November 2010
M T W T F S S
« Apr   May »
1234567
891011121314
15161718192021
22232425262728
2930  

Converting queries with OR to Union to ulitize indexes

Lets say we have a table storing mail messages and we need to show user’s mailbox: messages sent “from” and “to” the specified user.

Here is our table:

CREATE TABLE `internalmail` (
  `mail_id` int(10) NOT NULL AUTO_INCREMENT,
  `senderaddress_id` int(10) NOT NULL,
  `recipientaddress_id` int(10) NOT NULL,
  `mail_timestamp` timestamp NULL DEFAULT NULL,
... message body, etc ...
  PRIMARY KEY (`mail_id`),
) ENGINE=InnoDB DEFAULT CHARSET=latin1

And our query:

select * from internalmail
 where (senderaddress_id = 247 or recipientaddress_id = 247 or recipientaddress_id = 0) 
and mail_timestamp > '2010-08-01 12:30:47' 
order by mail_timestamp desc 

In this query we show all messages from and to user_id = 247 plus all messages to system user (user_id=0). We need to show only messages for the last 3 months and show the most recent messages first.

To speed up the query we can try creating indexes:

KEY `recipientaddress_id` (`recipientaddress_id`),
KEY `senderaddress_id` (`senderaddress_id`),
KEY `mail_timestamp` (`mail_timestamp`),

However, as the query uses “OR”, MySQL will use a filesort.

mysql> explain select * from internalmail 
where (senderaddress_id = 247 or recipientaddress_id = 247 or recipientaddress_id = 0) 
and mail_timestamp > '2010-08-01 12:30:47' 
order by mail_timestamp desc\G
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: internalmail
         type: ALL
possible_keys: recipientaddress_id,senderaddress_id,mail_timestamp
          key: NULL
      key_len: NULL
          ref: NULL
         rows: 4843257
        Extra: Using where; Using filesort
1 row in set (0.00 sec)

UPDATE: even if we will create combined indexes on (recipientaddress_id,mail_timestamp) and/or (senderaddress_id,mail_timestamp) those indexes will not be used, as the query contains “OR” in the where clause.

And original query runs for 3 seconds. To fix this query we can do 2 things:

  1. Rewrite query with UNION instead of OR
  2. Create combined indexes

First, we rewrite query with UNION:

(select * from internalmail where senderaddress_id = 247 and mail_timestamp > ‘2010-08-19 12:30:47’)
union
(select * from internalmail where recipientaddress_id = 247 and mail_timestamp > ‘2010-08-19 12:30:47’)
union
(select * from internalmail where recipientaddress_id = 0 and mail_timestamp > ‘2010-08-19 12:30:47’)
order by mail_timestamp desc;

Second, we create 2 indexes:

mysql> alter table internalmail add key send_dt(senderaddress_id, mail_timestamp);
mysql> alter table internalmail add key recieve_dt(recipientaddress_id, mail_timestamp);

After that, MySQL will be able to fully utilize index for each of the 3 queries in union:

mysql> explain 
(select * from internalmail where senderaddress_id = 247  and mail_timestamp > '2010-08-19 12:30:47')  
union 
(select * from internalmail where  recipientaddress_id = 247  and mail_timestamp > '2010-08-19 12:30:47')  
union 
(select * from internalmail where  recipientaddress_id = 0  and mail_timestamp > '2010-08-19 12:30:47')  
order by mail_timestamp desc\G
*************************** 1. row ***************************
           id: 1
  select_type: PRIMARY
        table: internalmail
         type: range
possible_keys: senderaddress_id,mail_timestamp,send_dt
          key: send_dt
      key_len: 9
          ref: NULL
         rows: 5
        Extra: Using where
*************************** 2. row ***************************
           id: 2
  select_type: UNION
        table: internalmail
         type: range
possible_keys: recipientaddress_id,mail_timestamp,recieve_dt
          key: recieve_dt
      key_len: 9
          ref: NULL
         rows: 11
        Extra: Using where
*************************** 3. row ***************************
           id: 3
  select_type: UNION
        table: internalmail
         type: range
possible_keys: recipientaddress_id,mail_timestamp,recieve_dt
          key: recieve_dt
      key_len: 9
          ref: NULL
         rows: 1
        Extra: Using where
*************************** 4. row ***************************
           id: NULL
  select_type: UNION RESULT
        table: 
         type: ALL
possible_keys: NULL
          key: NULL
      key_len: NULL
          ref: NULL
         rows: NULL
        Extra: Using filesort
4 rows in set (0.00 sec)

Although this query has to perform a final filesort it is much faster: now it runs in 0 sec compared to 3 seconds originally.

3 comments to Converting queries with OR to Union to ulitize indexes

  • huarong

    select * from internalmail
    where (i_senderaddress = 247 or i_recipientaddress = 247 or i_recipientaddress = 0)
    and dt_timestamp > ‘2010-08-01 12:30:47’
    order by dt_timestamp desc

    In this case, will these two combined index: (dt_timestamp, i_senderaddress) and (dt_timestamp, i_recipientaddress ) work?

  • akira

    (select * from internalmail where senderaddress_id = 247 and mail_timestamp > ‘2010-08-19 12:30:47′)
    union all
    (select * from internalmail where recipientaddress_id IN (0, 247) and mail_timestamp > ‘2010-08-19 12:30:47′)