MySQL Order By Limit Optimization and Priority Queue Thresholds

When executing SELECT statements combining ORDER BY and LIMIT in MySQL, developers may encounter nondeterministic result sets if the sorting column contains duplicate values. This behavior stems from internal optimization strategies employed by the query optimizer, specifically regarding when to utilize a priority queue versus a standard filesort.

Official documentation highlights this phenomenon using a sample table. Consider a scenario where a table named performance_metrics is created to store evaluation data.

CREATE TABLE `performance_metrics` (
  `record_id` int NOT NULL AUTO_INCREMENT,
  `tier` int DEFAULT NULL,
  `score_value` varchar(255) DEFAULT NULL,
  PRIMARY KEY (`record_id`)
) ENGINE=InnoDB;

Populating the table with initial dataset entries:

INSERT INTO `performance_metrics`(`record_id`, `tier`, `score_value`) VALUES 
(1, 1, '8.5'),
(2, 3, '9.0'),
(3, 2, '7.7'),
(4, 2, '7.5'),
(5, 1, '7.2'),
(6, 2, '7.5'),
(7, 3, '6.7');

Executing a query sorted by the tier column without a limit returns all rows ordered by the tier value. However, adding a LIMIT 5 clause may alter the sequence of records that share the same tier value. For instance, records with identical tier values might appear in a different order compared to the full sort result. This inconsistency occurs because the optimizer switches algorithms based on the estimated cost and row count.

To understand the tipping point where this behavior manifests, one can analyze the decision logic within the engine. Experiments indicate that the behavior changes when the total number of rows reaches a specific threshold relative to the limit value.

Consider a scenario where multiple rows with the same tier value are added. If the table contains 15 rows matching the criteria, the optimizer might choose a standard filesort. However, inserting a 16th row often triggers a switch to a priority queue optimization.

-- Adding rows to reach the threshold
INSERT INTO `performance_metrics`(`record_id`, `tier`, `score_value`) VALUES 
(8, 2, '7.2'),
(9, 2, '7.2'),
(10, 2, '7.2'),
(11, 2, '7.2'),
(12, 2, '7.2'),
(13, 2, '7.2'),
(14, 2, '7.2'),
(15, 2, '7.2');

With 15 rows, the execution plan typically favors a standard sort. Adding one more record changes the outcome:

INSERT INTO `performance_metrics`(`record_id`, `tier`, `score_value`) VALUES (16, 2, '7.2');

Upon executing SELECT * FROM performance_metrics ORDER BY tier LIMIT 5; after this insertion, the result set order may shift, bringing the newly inserted record into the top 5 results despite having the same sort key as others.

To diagnose this decision process, the optimizer_trace facility provides visibility into the internal logic. Enabling the trace reveals the filesort_priority_queue_optimization section.

SET optimizer_trace='enabled=on';
SELECT * FROM performance_metrics ORDER BY tier LIMIT 5;
SELECT * FROM `information_schema`.`OPTIMIZER_TRACE`;
SET optimizer_trace='enabled=off';

In the trace output, the chosen field within filesort_priority_queue_optimization indicates whether the priority queue was selected. When the row count is lower, the trace often shows chosen: false with the reason sort_is_cheaper. When the threshold is crossed, chosen becomes true.

This logic is defined in the MySQL source code, specifically within filesort.cc in the check_if_pq_applicable function. The decision relies on a comparison between the limit value and the total row count, scaled by a constant factor known as PQ_slowness.

The condition generally follows this formula:

if (param->max_rows < num_rows / PQ_slowness)

Here, max_rows corresponds to the LIMIT value, num_rows is the total qualifying rows, and PQ_slowness is a constant typically set to 3. This constant represents the performance ratio where quicksort is estimated to be three times faster than heap sort for full sorting. Consequently, if the limits small enough relative to the total rows (specifically when num_rows exceeds limit * 3), the engine opts for the priority queue (heap sort) to avoid sorting the entire dataset.

Using the previous example where LIMIT 5:

  • If num_rows is 15, 15 / 3 = 5. The condition 5 < 5 is false. Standard sort is used.
  • If num_rows is 16, 16 / 3 = 5.33. The condition 5 < 5.33 is true. Priority queue is used.

When the priority queue is engaged, MySQL utilizes a heap sort algorithm to maintain the top N elements. Heap sort is inherently unstable. Stability in sorting algorithms ensures that records with equal keys maintain their original relative order. Since heap sort does not guarantee this, records with identical tier values may be reordered during the extraction phase from the heap.

For example, if records with IDs 3, 4, and 16 all share the same tier value, the heap construction and extraction process might return them in the order 16, 3, 4 instead of their insertion order. This explains the nondeterministic appearance when the optimization switches.

Additionally, the optimizer_trace output includes a sort_mode field, which describes how the sort buffer is utilized. Common modes include:

  1. <sort_key, rowid>: The buffer stores the sort key and the primary key. After sorting, the engine performs a lookup to retrieve full row data.
  2. <sort_key, additional_fields>: The buffer stores the sort key and all required query columns, avoiding a lookup step.
  3. <sort_key, packed_additional_fields>: Similar to the second mode but optimizes storage by packing variable-length fields tightly.

Understanding these internal mechanisms clarifies why ORDER BY clauses should ideally include a unique column (such as the primary key) as a secondary sort key to ensure deterministic results across different execution plans and data volumes.

-- Ensuring deterministic order
SELECT * FROM performance_metrics ORDER BY tier, record_id LIMIT 5;

This approach forces a stable sort order regardless of whether the optimizer chooses a filesort or a priority queue implementation.

Tags: MySQL database Optimization sql internals

Posted on Fri, 08 May 2026 17:50:23 +0000 by Mikkki