Maintaining the performance of your database systems is an essential part of optimizing the responsiveness of your database-backed applications. This involves managing server's hardware resources and software configuration, but it also pertains to the queries that you are asking it to execute.
In a previous guide, we examined how to identify low performing and slow queries in MySQL. In this guide, we will take the next step by discussing how to evaluate and optimize slow running queries. This will build off of the discussion in the previous article to give you a full strategy for identifying and fixing issues associated with poorly performing queries.
One of the first things you may want to do when trying to optimize a slow query is to profile the query to discover where exactly the query is taking the most time.
First, check if query profiling is enabled by typing:
+-------------+| @@profiling |+-------------+| 0 |+-------------+1 row in set, 1 warning (0.00 sec)
The 0 indicates that profiling is currently disabled. You can enable profiling by typing:
SET PROFILING = 1;
Once profiling is enabled, you can start to list the available profiles by typing:
+----------+------------+--------------------------+| Query_ID | Duration | Query |+----------+------------+--------------------------+| 1 | 3.00368075 | select sleep(3) || 2 | 0.00740700 | select * from mysql.user || 3 | 0.00075875 | select @@profiling |+----------+------------+--------------------------+3 rows in set, 1 warning (0.00 sec)
The output will show you the most recent statements that have been executed in the current session since profiling was enabled. Each query shows the total execution time and has a query number that can be used to dive in deeper.
To show the profiling information for the most recent query, type:
+----------------------+----------+| Status | Duration |+----------------------+----------+| starting | 0.000144 || checking permissions | 0.000020 || Opening tables | 0.000024 || init | 0.000013 || optimizing | 0.000047 || executing | 0.000032 || end | 0.000010 || query end | 0.000016 || closing tables | 0.000012 || freeing items | 0.000367 || cleaning up | 0.000074 |+----------------------+----------+11 rows in set, 1 warning (0.00 sec)
By default, the output will show you exactly how long the statement was in each stage of query processing. This can help you identify exactly which portion of the query is performing poorly.
To display the results associated with a different query, you can add the
FOR QUERY <N> clause along with the query ID provided by the
SHOW PROFILES output.
For instance, to get the default display for query 1, you can type:
SHOW PROFILE FOR QUERY 1;
+----------------------+----------+| Status | Duration |+----------------------+----------+| starting | 0.001419 || checking permissions | 0.000028 || Opening tables | 0.000030 || init | 0.000020 || optimizing | 0.000020 || executing | 0.000025 || User sleep | 3.000165 || end | 0.000043 || query end | 0.000019 || closing tables | 0.000012 || freeing items | 0.000540 || logging slow query | 0.001320 || cleaning up | 0.000041 |+----------------------+----------+13 rows in set, 1 warning (0.00 sec)
Here, the output clearly shows a long time spent with the
User sleep status, which directly manages the function that was executed.
You can also change the output to display different types of information. This is often helpful once you have viewed the default output in order to check more specific information.
For instance, you can see information about the first query's block input and output by typing:
SHOW PROFILE BLOCK IO FOR QUERY 1;
+----------------------+----------+--------------+---------------+| Status | Duration | Block_ops_in | Block_ops_out |+----------------------+----------+--------------+---------------+| starting | 0.001419 | 184 | 0 || checking permissions | 0.000028 | 0 | 0 || Opening tables | 0.000030 | 0 | 0 || init | 0.000020 | 0 | 0 || optimizing | 0.000020 | 0 | 0 || executing | 0.000025 | 0 | 0 || User sleep | 3.000165 | 0 | 0 || end | 0.000043 | 0 | 0 || query end | 0.000019 | 0 | 0 || closing tables | 0.000012 | 0 | 0 || freeing items | 0.000540 | 0 | 0 || logging slow query | 0.001320 | 64 | 8 || cleaning up | 0.000041 | 0 | 0 |+----------------------+----------+--------------+---------------+13 rows in set, 1 warning (0.00 sec)
In addition to the previous information, this shows times where MySQL had to interact with block data on the filesystem.
The available output variations you can specify include:
ALL: Displays all available information
BLOCK IO: Displays number of blocks of input and output the query generated
CONTEXT SWITCHES: Shows number of voluntary and involuntary context switches that occurred while executing the query
CPU: Shows the CPU time in userspace and system categories
IPC: Displays the number of interprocess messages sent and received
PAGE FAULTS: Shows the number of page faults, segmented into major and minor faults
SOURCE: Shows the actual functions being executed within MySQL along with their source file and line number to allow execution mapping
SWAPS: Shows the number of swaps the query required
To show all possible information associated with a query, type:
SHOW PROFILE ALL FOR QUERY 1\G
We trigger the alternate output formatting with the
\G statement termination sequence to make the results more readable since a lot of additional output is displayed.
Once you have used the profiling information to develop a good idea about what part of the query is slow, you can turn profiling back off to preserve resources:
SET PROFILING = 0;
The querying profile should help you understand the stages of query execution that are taking the longest to execute as well as what resources are associated with the query. You can then use MySQL's
EXPLAIN statement to understand the query optimizer's evaluation of the query.
EXPLAIN statement takes a
REPLACE statement and displays how the query optimizer would evaluate and execute the given query. The output shows how MySQL would join tables, select fields, sort and filter results, and so on. It also shows things like how many rows would be evaluated and what indexes would be consulted to speed up the process.
To get an idea of how this statement works, ask MySQL to explain how it would execute the query
SELECT * FROM INFORMATION_SCHEMA.VIEWS;:
EXPLAIN SELECT * FROM information_schema.views\G
*************************** 1. row ***************************id: 1select_type: SIMPLEtable: catpartitions: NULLtype: indexpossible_keys: PRIMARYkey: namekey_len: 194ref: NULLrows: 1filtered: 100.00Extra: Using index*************************** 2. row ***************************id: 1select_type: SIMPLEtable: vwpartitions: NULLtype: refpossible_keys: schema_id,type,view_client_collation_id,view_connection_collation_id,type_2key: typekey_len: 1ref: constrows: 100filtered: 100.00Extra: Using where*************************** 3. row ***************************id: 1select_type: SIMPLEtable: schpartitions: NULLtype: eq_refpossible_keys: PRIMARY,catalog_idkey: PRIMARYkey_len: 8ref: mysql.vw.schema_idrows: 1filtered: 100.00Extra: Using where*************************** 4. row ***************************id: 1select_type: SIMPLEtable: conn_collpartitions: NULLtype: eq_refpossible_keys: PRIMARYkey: PRIMARYkey_len: 8ref: mysql.vw.view_connection_collation_idrows: 1filtered: 100.00Extra: NULL*************************** 5. row ***************************id: 1select_type: SIMPLEtable: client_collpartitions: NULLtype: eq_refpossible_keys: PRIMARY,character_set_idkey: PRIMARYkey_len: 8ref: mysql.vw.view_client_collation_idrows: 1filtered: 100.00Extra: NULL*************************** 6. row ***************************id: 1select_type: SIMPLEtable: cspartitions: NULLtype: eq_refpossible_keys: PRIMARYkey: PRIMARYkey_len: 8ref: mysql.client_coll.character_set_idrows: 1filtered: 100.00Extra: NULL6 rows in set, 1 warning (0.01 sec)
The above output shows that that six separate queries are required to pull together the information for that query. If you look at the table definition, you can see why:
SHOW CREATE TABLE information_schema.views\G
Modified for formatting, the resulting table creation command looks like this:
CREATE algorithm=undefined definer=`mysql.infoschema`@`localhost` SQL security definer view `views`ASSELECT `cat`.`name` AS `table_catalog`,`sch`.`name` AS `table_schema`,`vw`.`name` AS `table_name`,IF((can_access_view(`sch`.`name`,`vw`.`name`,`vw`.`view_definer`,`vw`.`options`) = TRUE),`vw`.`view_definition_utf8`,'') AS `view_definition`,`vw`.`view_check_option` AS `check_option`,`vw`.`view_is_updatable` AS `is_updatable`,`vw`.`view_definer` AS `definer`,IF((`vw`.`view_security_type` = 'DEFAULT'),'DEFINER',`vw`.`view_security_type`) AS `security_type`,`cs`.`name` AS `character_set_client`,`conn_coll`.`name` AS `collation_connection`FROM (((((`mysql`.`tables` `vw`JOIN `mysql`.`schemata` `sch`ON ((`vw`.`schema_id` = `sch`.`id`)))JOIN `mysql`.`catalogs` `cat`ON ((`cat`.`id` = `sch`.`catalog_id`)))JOIN `mysql`.`collations` `conn_coll`ON ((`conn_coll`.`id` = `vw`.`view_connection_collation_id`)))JOIN `mysql`.`collations` `client_coll`ON ((`client_coll`.`id` = `vw`.`view_client_collation_id`)))JOIN `mysql`.`character_sets` `cs`ON ((`cs`.`id` = `client_coll`.`character_set_id`)))WHERE ((0 <> can_access_table(`sch`.`name`,`vw`.`name`))AND (`vw`.`type` = 'VIEW'))
There are five join statements, meaning that six separate tables will have to be queried to construct the results. This matches the number of entries returned in the
EXPLAIN statement we executed — one per table.
To interpret the results of the
EXPLAIN statement, you need to understand what each of the individual columns stand for. Of particular interest are the
type column tells you the way that MySQL implements any join operations within the query. The MySQL documentation has a good description of the different join types available and how they may affect performance.
For instance, a type of
const will be very fast because it means that MySQL is comparing a unique column against a constant value, which should return at most a single row.
eq_ref type is similarly the fastest join operation for combining to two tables as it indicates that the query uses a complete unique, non-null index value for comparison.
index type is used when a full table scan is required, but instead of scanning the actual table, the query can scan the index tree instead. This makes it relatively low performance in the scope of the query, but still somewhat optimized because it can scan the smaller index instead of the actual table.
ref join type means that MySQL was able to use the leading columns of an index to compare (when the entire index contains multiple columns not used for the comparison). This means that MySQL is able to use the partial index efficiently even though it doesn't match the entire comparison.
key column and its related columns (
ref) help you understand what indexes were available, what indexes were used, and how efficient the index usage was.
. . .possible_keys: PRIMARYkey: PRIMARYkey_len: 8ref: mysql.client_coll.character_set_id. . .
possible_keys column tells us what indexes were available for the tables being joined. It is helpful because it shows us what MySQL started with when trying to find an index appropriate for satisfying the query. This might be a place to look if you are joining your tables on non-primary keys to ensure that you have indexes available to accelerate the process.
key column itself tells you which index MySQL decided to use to run the query. In some cases, this might even include results that were not listed in the
possible_keys column. For instance, when an index contains all of the columns the query asks for, it can be scanned more efficiently than the associated table, even though the index was not appropriate for (and is not used for) the actual join.
key_length column indicates how much of the actual index was used. For multi-column indexes, this can help you understand how much of the stored index was used for this particular comparison. The
ref column also plays a part here by telling you what specifically is being compared against the specific index. This can help you determine whether it is a good idea to create a more optimized index.
filtered columns are useful because they tell use the scope of the table being examined.
rows: 1filtered: 100.00
rows column gives the number of rows MySQL thinks it will have to evaluate to perform each part of the query. The number here is affected by all of the previous columns we've discussed. It gives us an indication of how efficiently MySQL can pare down the entire table given the statement provided and indexes available.
filtered column estimates the percentage of the rows evaluated that will be returned after any row filtering conditions within the query are applied. If the value is 100, it indicates that no row filtering will occur. Values below 100 indicate that some of the rows evaluated won't be returned because of join conditions or other filtering.
Now that we've discussed some of the ways that MySQL allows you to evaluate the performance of individual statements, we can talk about how to improve that performance.
The exact steps you'll want to consider depend largely on what you found with your query profiling and
EXPLAIN investigations. We will cover some of the most common ways to optimize query performance below.
One of the most straightforward ways of increasing the execution speed of your queries is to ensure that there are appropriate indexes available for your tables. Your indexes should reflect the most common ways you query for data and may change over time as your usage evolves or additional queries are implemented.
To create a new index, you can use the
CREATE INDEX statement:
CREATE INDEX <index_name> ON <table> (<columns_to_index>)
There are many different types of indexes you can create depending on the data type of the columns you're indexing, the maximum length of the values, and the index structure you want to use. These options may be worth looking into if your query is already using an index and the index is not performing well. However, to start, you can keep it relatively simple by using mostly default values.
The first considerations you'll need to make are what columns you'd like to index on. If you regularly join on specific columns, it is important to make sure they are accounted for in your indexes. Indexes are used primarily in row lookups, so join operations, where various rows in different tables are matched, are very expensive without index.
You can create indexes with a single column if it fully encapsulates how you typically query the table. You can also index on multiple columns to make the indexing more useful. If indexing on multiple columns, the ordering is very important. The first column should be the primary way to seek matches while any additional columns should help you refine those matches.
For example, if you create an index for
last_name for the table
person, you'll almost certainly want to define the index with something like this:
CREATE INDEX last_and_first_names ON person (last_name, first_name);
last_name column is the first or left-most column in the index, meaning it will be fast looking up either
Alan Smith or all
Smith entries. It won't be effective for looking up all instances of
Alan as the
first_name however, which is probably fine unless we anticipate querying only by first name often.
While creating indexes can help you increase the read performance of your datasets, it is important to not over-index your tables. Indexes come at a cost to write speed, as each time new values are recorded in the table, all of the indexes associated with it must also be updated. Tables work best if their primary keys are well fitted with the querying patterns that'll be used against them as it allows you to benefit from strong index performance without needing a large number of supplemental indexes to accommodate different access patterns.
Another way that you can speed up queries is by limiting the number of candidate and returned rows in your statements when possible. This is a very context-specific recommendation, so it won't be appropriate in many cases, but if you can pare down the number of rows that MySQL has to evaluate, you can increase performance significantly.
You can do this with the
LIMIT clause. For example, to return only five results, you can use a format like this:
SELECT * FROM <table> LIMIT 5
In many cases (depending on what other clauses are included in the query), limiting the number of rows returned can help MySQL short circuit the querying process, stopping sooner than it would normally need to. This helps save time early on instead of executing over the entire dataset and then truncating the results later.
This is especially useful when joining multiple tables. When at all possible, it is best to limit the results of your query before the join occurs. This allows MySQL to evaluate a small amount of data to the rest of the joins instead of the entirety of the first table.
Another way to increase performance in your queries is to pay attention to what columns you are actually returning. This can help speed up the results in a number of different ways.
The primary way that limiting the columns you select affects performance is through minimizing network payload. By returning columns that you don't need, larger queries can substantially increase the network traffic generated by a query. Selecting only the columns you need to satisfy the logical constraints of a query helps prevent this problem by paring down what is transferred over the network.
Another way that being selective with the columns you return can impact performance is related to indexes. If MySQL is able to associate all of the columns you're returning with an index, it can potentially operate on that smaller, more organized dataset instead of the full table. Limiting the columns you are querying allows for this to occur when your indexes match your query patterns well.
One thing that might negatively affect your query's performance is over-reliance on wildcards. In SQL, the
% character is used in comparisons as a wildcard indicating that any value can be substituted. This is a very powerful way of finding data within your records, but it can have very big impacts on query performance.
This is especially true for leading or prefix wildcards. Leading wildcards are comparisons that check whether a column value matches a pattern that begins with a wildcard. For instance, checking whether a column matches
%Main St.. This performs especially bad because MySQL is not able to use indexes to discover matching results. Instead, it must look through the entire table for matches, even if that column has an index.
In this particular example, if you will be regularly querying by a street name to discover all addresses on that street, it might make more sense to split up the street name and street numbers into two separate columns. This would allow you to query for addresses on "Main St." using an index quickly and then filter those results as necessary if you are looking for a specific address. Creating a compound index with the street name first and the street number column second would make this even faster.
In this guide, we walked through some of the basic ways of evaluating individual queries and getting more information about what might be slowing them down. We discussed profiling queries and using MySQL's
EXPLAIN statement to get detailed information about how the query planner evaluates the query. We then took a look at some ways that you can avoid these slow downs by modifying your queries to align better with the way the database retrieves data.
Databases are complex pieces of software that are responsible for operating on large amounts of data that can be in many different formats. In addition to simply retrieving the necessary data, there's an expectation that the database should be able to return results quickly. There are countless optimizations built into MySQL's query planner and optimizer mechanisms to help minimize the amount of time it takes to answer queries.
However, the database is only able to automatically optimize in a limited sense. As the user of the database, you need to also be able to understand the data being accessed, your requirements for a query, and which queries are most important to optimize for. For this, it is important to understand how to evaluate your criteria and make adjustments when the query performance is not at the expected level.