Loose index scan in MySQL can really help optimizing “group by” queries in some cases (for example, if you have only min() and/or max() as your aggregate functions). For example, if you have this query (to find maximum delay for all US flights with departure on Sundays in 2010):
select max(DepDelayMinutes), carrier, dayofweek from ontime_2010 where dayofweek = 7 group by Carrier, dayofweek
the usual case will be adding a covered index on (dayofweek, Carrier, DepDelayMinutes). And MySQL will use this index fine (using index mean it will use the covered index):
mysql> explain select max(DepDelayMinutes), Carrier, dayofweek from ontime_2010 where dayofweek =7 group by Carrier, dayofweek\G *************************** 1. row *************************** id: 1 select_type: SIMPLE table: ontime_2010 type: ref possible_keys: covered key: covered key_len: 2 ref: const rows: 905138 Extra: Using where; Using index 1 row in set (0.00 sec)
However, as the dayofweek part has low number of unique values, mysql will have to scan a lots of index entries (estimated rows: 905138).
MySQL can use loose index scan. Unfortunately, a o lots of limitations apply:
- The query is over a single table.
- The GROUP BY names only columns that form a leftmost prefix of the index and no other columns.
- The only aggregate functions used in the select list (if any) are MIN() and MAX(), same column
- etc… (see docs for details)
As our example query is suitable for loose index scan, we can create another index:
mysql> alter table ontime_2010 add key lis1(Carrier, dayofweek, DepDelayMinutes); mysql> explain select max(DepDelayMinutes), Carrier, dayofweek from ontime_2010 where dayofweek =7 group by Carrier, dayofweek \G *************************** 1. row *************************** id: 1 select_type: SIMPLE table: ontime_2010 type: range possible_keys: DayOfWeek,covered key: lis1 key_len: 5 ref: NULL rows: 19 Extra: Using where; Using index for group-by 1 row in set (0.01 sec)
Here, Using index for group-by, means that MySQL uses loose index scan.
Also, and it is really great, it works with range on dayofweek too:
mysql> explain select max(DepDelayMinutes), Carrier, dayofweek from ontime_2010 where dayofweek > 3 group by Carrier, dayofweek \G *************************** 1. row *************************** id: 1 select_type: SIMPLE table: ontime_2010 type: range possible_keys: DayOfWeek,covered key: lis1 key_len: 5 ref: NULL rows: 19 Extra: Using where; Using index for group-by 1 row in set (0.00 sec)
And original covered index does not work with ranges in where clause:
mysql> explain select max(DepDelayMinutes), Carrier, dayofweek from ontime_2010 use index (covered) where dayofweek > 3 group by Carrier, dayofweek \G *************************** 1. row *************************** id: 1 select_type: SIMPLE table: ontime_2010 type: range possible_keys: covered key: covered key_len: 2 ref: NULL rows: 2416543 Extra: Using where; Using index; Using temporary; Using filesort
In the above example, MySQL uses index but still have to create temporary table and filesort.
Now speed comparison:
I’m using “ontime” flight performance statistics data from transtats.bts.gov
The table only consist of data for 2010.
Table size: 5M rows, ~2G in size. Table structure:
CREATE TABLE `ontime_2010` ( `YearD` int(11) DEFAULT NULL, `MonthD` tinyint(4) DEFAULT NULL, `DayofMonth` tinyint(4) DEFAULT NULL, `DayOfWeek` tinyint(4) DEFAULT NULL, `Carrier` char(2) DEFAULT NULL, `Origin` char(5) DEFAULT NULL, `DepDelayMinutes` int(11) DEFAULT NULL, ... more fields here ... ) ENGINE=InnoDB DEFAULT CHARSET=latin1
Results (cached index and data):
“where dayofweek = 7” (ref)
- Loose index scan: 0 sec
- Covered index: 0.6 sec
“where dayofweek > 3” (range)
- Loose index scan: 0 sec
- index (+ filesort): 5.53 sec
I will also present the findings from this article (among other things) during the upcoming webinar, Getting the Best MySQL Performance in Your Products: Part 3, Query Tuning, which will take place on Tuesday, Nov 23 (webinar is free, webinar registration)