With the standard data warehousing queries we have a fact table and dimension tables and we join them.
For example, the fact table (Table size: 5M rows, ~2G in size) from my previous Loose index scan vs. covered indexes in MySQL post:
CREATE TABLE `ontime_2010` ( `YearD` int(11) DEFAULT NULL, `MonthD` tinyint(4) DEFAULT NULL, `DayofMonth` tinyint(4) DEFAULT NULL, `DayOfWeek` tinyint(4) DEFAULT NULL, `Carrier` char(2) DEFAULT NULL, `Origin` char(5) DEFAULT NULL, `DepDelayMinutes` int(11) DEFAULT NULL, `AirlineID` int(11) DEFAULT NULL, `Cancelled` tinyint(4) DEFAULT NULL, ... more fields here ... ) ENGINE=InnoDB DEFAULT CHARSET=latin1
(this is not the best possible fact table as the data is not aggregated by I’ll use it for now).
And we have those dimensions tables:
CREATE TABLE `airlines` ( `AirlineID` int(11) NOT NULL DEFAULT '0', `AirlineName` varchar(255) DEFAULT NULL, PRIMARY KEY (`AirlineID`), KEY `AirlineName` (`AirlineName`) ) ENGINE=InnoDB DEFAULT CHARSET=latin1 CREATE TABLE `date_dayofweek` ( `code` int(11) NOT NULL DEFAULT '0', `description` varchar(255) DEFAULT NULL, PRIMARY KEY (`code`), KEY `description` (`description`) ) ENGINE=InnoDB DEFAULT CHARSET=latin1 1 row in set (0.00 sec) mysql> select * from date_dayofweek order by code; +------+-------------+ | code | description | +------+-------------+ | 1 | Monday | | 2 | Tuesday | | 3 | Wednesday | | 4 | Thursday | | 5 | Friday | | 6 | Saturday | | 7 | Sunday | | 9 | Unknown | +------+-------------+ 8 rows in set (0.00 sec)
So here is the example query (find sum of cancelled flights on Sundays for the given airline group by day):
select sum(Cancelled), FlightDate, AirlineName from ontime_2010 o, date_dayofweek dow, airlines a where o.dayofweek=dow.code and dow.description = 'Sunday' and a.AirlineID = o.AirlineID and a.AirlineName = 'Delta Air Lines Inc.: DL' group by FlightDate order by FlightDate desc limit 10\G
To fix the query we can add a covered index for ontime_2010, so that all fields for ontime_2010 table will be covered:
alter table ontime_2010 add key cov2(AirlineID, dayofweek, FlightDate, Cancelled);
However we will still have “temporary table and filesort”:
To avoid filesort we can re-write this query with "subqueries":mysql> explain select sum(Cancelled), FlightDate from ontime_2010 o, date_dayofweek dow, airlines a where o.dayofweek=dow.code and dow.description = 'Sunday' and a.AirlineID = o.AirlineID and a.AirlineName = 'Delta Air Lines Inc.: DL' group by FlightDate order by FlightDate desc limit 10\G *************************** 1. row *************************** id: 1 select_type: SIMPLE table: dow type: ref possible_keys: PRIMARY,description key: description key_len: 258 ref: const rows: 1 Extra: Using where; Using index; Using temporary; Using filesort *************************** 2. row *************************** id: 1 select_type: SIMPLE table: a type: ref possible_keys: PRIMARY,AirlineName key: AirlineName key_len: 258 ref: const rows: 1 Extra: Using where; Using index *************************** 3. row *************************** id: 1 select_type: SIMPLE table: o type: ref possible_keys: DayOfWeek,covered,AirlineID,cov2 key: cov2 key_len: 7 ref: ontime.a.AirlineID,ontime.dow.code rows: 24417 Extra: Using where; Using index 3 rows in set (0.00 sec)
mysql> explain select sum(Cancelled), FlightDate from ontime_2010 o where o.dayofweek= (select code from date_dayofweek where description = 'Sunday') and AirlineID = (select AirlineID from airlines where AirlineName = 'Delta Air Lines Inc.: DL') group by FlightDate limit 10\G *************************** 1. row *************************** id: 1 select_type: PRIMARY table: o type: ref possible_keys: DayOfWeek,covered,AirlineID,cov2 key: cov2 key_len: 7 ref: const,const rows: 152510 Extra: Using where; Using index *************************** 2. row *************************** id: 3 select_type: SUBQUERY table: airlines type: ref possible_keys: AirlineName key: AirlineName key_len: 258 ref: rows: 1 Extra: Using where; Using index *************************** 3. row *************************** id: 2 select_type: SUBQUERY table: date_dayofweek type: ref possible_keys: description key: description key_len: 258 ref: rows: 1 Extra: Using where; Using index 3 rows in set (0.00 sec)
As MySQL will use indexes when we have "field = (select .. )" and now all fields in the index belong to the single table, MySQL will use index and avoid filesort. Please note: this will not work with "field in (select ...)" and also make sure that the subselect part will return only 1 row.