- What we can learn about databases from a book library
- What are database indexes?
- Summary and next steps
A database is like a library. Books in a library are usually well organized, much like data in a database. Both provide great structures for storing vast amounts of information for later retrieval.
Continuing with the library analogy, libraries store large amounts of data. The more books you have, the longer the time it takes to find a book. The time taken to retrieve a book can be influenced by the strategy for storing and searching books in the library.
You may naively organize the books into one big shelf alphabetically according to the title, for example. One strategy for retrieving a book faster would be storing them in smaller bookshelves in alphabetical ranges.
This is similar to partitioning a table in a database. A table is split into smaller chunks by a partition key which is included in a query. A partition key is the column used to split the table. The database narrows down the data to be queried using the partition key.
Another strategy for faster book retrieval would be to ask more people to join your search and distribute their work in a structured way. Some databases support a similar pattern where the database server can assign more compute or CPU cores to execute a query. This is known as parallel query execution.
Another pattern you could use to retrieve a book is using a catalog. The catalog is an external, organized structure separate from the existing shelves containing the metadata of the books in the library. The metadata contains information such as titles, authors, subjects, publication dates, and the book's location. Databases provide similar structures called indexes.
The common denominator for all three strategies is retrieving data faster by reducing the scope of the search.
The rest of the article will explore database indexes: what they are and how they work, benefits, cost, the anatomy of a query, and how to identify slow database queries.
A database index is a smaller, secondary data structure used by the database server to store a subset of a table's data. Indexes are commonly used to improve the read performance of a given table.
Indexes contain key-value pairs:
- key: the column(s) that will be used to create an index
- value: and a pointer to the record in the specific table
You can index more than one column in a table. For example, if you have a table called
User with 3 columns:
lastName, you can create one index on the
The length of an index may be identical to the original column in the table but indexes have a smaller width.
There are different types of indexes, each one suitable for a different use case. Some of the general categories of indexes include but are not limited to:
- Index (default): a normal, non-unique index that does not enforce any constraints on your data
- Primary keys: used to uniquely identify a row in a table
- Unique indexes: used to enforce uniqueness of a value in a column, preventing duplicate values
- Full-text indexes: used on text columns and enables full-text search
There are more specialized types of indexes different databases support. Some of the other types include hash indexes, GIN indexes for PostgreSQL, and clustered indexes for SQL server.
The tutorial series will cover more specialized indexes that Prisma supports in future articles. Stay tuned!
When the database receives a query, it first creates a query plan. A query plan is a list of instructions the database needs to follow to execute a given query.
The query plan specifies how the database intends to retrieve records. The database chooses the most optimal query strategy.
You can prefix your SQL query with
EXPLAIN to see information about your database's query plan. Josh Berkus gave a presentation on "Explaining EXPLAIN" if you are interested in learning more about the PostgreSQL query planner.
One path the database might choose is look through each row in a table to find the records matching the filters. This is known as a sequential scan or full table scan.
If you're working with a large dataset, sequential scans will slow down your queries. The time taken to find a record is influenced by the size of your dataset. The bigger the dataset, the longer it takes to retrieve a record.
Another path the database might choose is query the index, matching the filter with the values available in the index. The database then "consults" the original table for more data such as more columns. This is known as an index scan.
The database might also choose to return the matching records from the index without even "consulting" the original table. This is known as an index-only scan. This can happen when the columns being returned and being filtered on already exist in the index. Here's an example that could make use of an index-only scan.
SELECT firstName from 'User' where firstName = 'Jimmy';
The universal consensus is that indexes are great for speeding up read performance. However, indexes come at a cost.
You will incur additional overhead for your write operations. This is because every write requires the index to be updated as well.
The other cost of indexes is that they require additional resources from the database server for maintenance. Indexes require additional storage, memory, and IO from the database server.
A general rule of thumb is to use indexes sparingly and on columns that are queried frequently. You should also select the appropriate index type based on your requirements.
In this article, you learned what a database index is, the different types of database indexes, the anatomy of a database query, and the cost of using database indexes to optimize your queries.
In the next article, you will learn how you can improve the performance of your existing query using indexes in your application with Prisma.