Lets say we have a table storing mail messages and we need to show user’s mailbox: messages sent “from” and “to” the specified user.
Here is our table:
CREATE TABLE `internalmail` ( `mail_id` int(10) NOT NULL AUTO_INCREMENT, `senderaddress_id` int(10) NOT NULL, `recipientaddress_id` int(10) NOT NULL, `mail_timestamp` timestamp NULL DEFAULT NULL, ... message body, etc ... PRIMARY KEY (`mail_id`), ) ENGINE=InnoDB DEFAULT CHARSET=latin1
And our query:
select * from internalmail where (senderaddress_id = 247 or recipientaddress_id = 247 or recipientaddress_id = 0) and mail_timestamp > '2010-08-01 12:30:47' order by mail_timestamp desc
In this query we show all messages from and to user_id = 247 plus all messages to system user (user_id=0). We need to show only messages for the last 3 months and show the most recent messages first.
To speed up the query we can try creating indexes:
KEY `recipientaddress_id` (`recipientaddress_id`),
KEY `senderaddress_id` (`senderaddress_id`),
KEY `mail_timestamp` (`mail_timestamp`),
However, as the query uses “OR”, MySQL will use a filesort.
mysql> explain select * from internalmail where (senderaddress_id = 247 or recipientaddress_id = 247 or recipientaddress_id = 0) and mail_timestamp > '2010-08-01 12:30:47' order by mail_timestamp desc\G *************************** 1. row *************************** id: 1 select_type: SIMPLE table: internalmail type: ALL possible_keys: recipientaddress_id,senderaddress_id,mail_timestamp key: NULL key_len: NULL ref: NULL rows: 4843257 Extra: Using where; Using filesort 1 row in set (0.00 sec)
UPDATE: even if we will create combined indexes on (recipientaddress_id,mail_timestamp) and/or (senderaddress_id,mail_timestamp) those indexes will not be used, as the query contains “OR” in the where clause.
And original query runs for 3 seconds. To fix this query we can do 2 things:
- Rewrite query with UNION instead of OR
- Create combined indexes
First, we rewrite query with UNION:
(select * from internalmail where senderaddress_id = 247 and mail_timestamp > ‘2010-08-19 12:30:47’)
union
(select * from internalmail where recipientaddress_id = 247 and mail_timestamp > ‘2010-08-19 12:30:47’)
union
(select * from internalmail where recipientaddress_id = 0 and mail_timestamp > ‘2010-08-19 12:30:47’)
order by mail_timestamp desc;
Second, we create 2 indexes:
mysql> alter table internalmail add key send_dt(senderaddress_id, mail_timestamp);
mysql> alter table internalmail add key recieve_dt(recipientaddress_id, mail_timestamp);
After that, MySQL will be able to fully utilize index for each of the 3 queries in union:
mysql> explain (select * from internalmail where senderaddress_id = 247 and mail_timestamp > '2010-08-19 12:30:47') union (select * from internalmail where recipientaddress_id = 247 and mail_timestamp > '2010-08-19 12:30:47') union (select * from internalmail where recipientaddress_id = 0 and mail_timestamp > '2010-08-19 12:30:47') order by mail_timestamp desc\G *************************** 1. row *************************** id: 1 select_type: PRIMARY table: internalmail type: range possible_keys: senderaddress_id,mail_timestamp,send_dt key: send_dt key_len: 9 ref: NULL rows: 5 Extra: Using where *************************** 2. row *************************** id: 2 select_type: UNION table: internalmail type: range possible_keys: recipientaddress_id,mail_timestamp,recieve_dt key: recieve_dt key_len: 9 ref: NULL rows: 11 Extra: Using where *************************** 3. row *************************** id: 3 select_type: UNION table: internalmail type: range possible_keys: recipientaddress_id,mail_timestamp,recieve_dt key: recieve_dt key_len: 9 ref: NULL rows: 1 Extra: Using where *************************** 4. row *************************** id: NULL select_type: UNION RESULT table:type: ALL possible_keys: NULL key: NULL key_len: NULL ref: NULL rows: NULL Extra: Using filesort 4 rows in set (0.00 sec)
Although this query has to perform a final filesort it is much faster: now it runs in 0 sec compared to 3 seconds originally.
select * from internalmail
where (i_senderaddress = 247 or i_recipientaddress = 247 or i_recipientaddress = 0)
and dt_timestamp > ‘2010-08-01 12:30:47’
order by dt_timestamp desc
In this case, will these two combined index: (dt_timestamp, i_senderaddress) and (dt_timestamp, i_recipientaddress ) work?
As the query contains “OR” in the query those indexes will not be used. See update.
(select * from internalmail where senderaddress_id = 247 and mail_timestamp > ‘2010-08-19 12:30:47′)
union all
(select * from internalmail where recipientaddress_id IN (0, 247) and mail_timestamp > ‘2010-08-19 12:30:47′)