Blog Post

Converting OR to Union

,

When I looked at indexing for queries containing predicates combined with OR, it became clear that the are some restrictive requirements for indexes for the optimiser to consider using the indexes for seek operations.

  • Each predicate (or set of predicates) combined with an OR must have a separate index
  • All of those indexes must be covering, or the row count of the concatenated result set low enough to make key lookups an option, as the optimiser does not apparent to consider the possibility of doing key lookups for a subset of the predicates before concatenating the result sets.

So what can be done if it’s not possible to meet those requirements?

The standard trick is to convert the query with ORs into multiple queries combined with UNION. The idea is that since OR predicates are evaluated separately and the result sets concatenated, we can do that manually by writing the queries separately and concatenating them using UNION or UNION ALL. (UNION ALL can only be safely used if the predicates are known to be mutually exclusive)

CREATE TABLE Persons (
PersonID INT IDENTITY PRIMARY KEY,
FirstName    VARCHAR(30),
Surname VARCHAR(30),
Country CHAR(3),
RegistrationDate DATE
)

CREATE INDEX idx_Persons_FirstName ON dbo.Persons (FirstName) INCLUDE (Surname)
CREATE INDEX idx_Persons_Surname ON dbo.Persons (Surname) INCLUDE (FirstName)
GO

-- Data population using SQLDataGenerator

SELECT FirstName, Surname
FROM dbo.Persons
WHERE FirstName = 'Daniel' OR Surname = 'Barnes'

SELECT FirstName, Surname
FROM dbo.Persons
WHERE FirstName = 'Daniel'
UNION
SELECT FirstName, Surname
FROM dbo.Persons
WHERE Surname = 'Barnes'

In this case, the OR can be replaced with a UNION and the results are the same. The Union form is slightly less efficient according to the execution plan’s costings (60% compared to the OR at 40%), and the two queries have the same general form, with two index seeks and some form of concatenation and remove duplicates.

OrResult1
OrExecPlan1

So in that case it worked fine, although the original form was a little more efficient

Some care does need to be taken however, as the query with OR and the query with UNION may not always be equivalent, and it has to do with the elimination of duplicate rows.

In an OR, if a row qualifies for both of the predicates, it’s only returned once. That should be obvious, it’s how things should work, we don’t want to see the row multiple times just because it qualifies for more than one of the OR predicates. If we change that to UNION ALL then the row will be returned twice, it appears in both queries that are concatenated, and UNION ALL means combine without eliminating duplicates.

SELECT FirstName, Surname
FROM dbo.Persons
WHERE FirstName = 'Herman' OR Surname = 'Anderson'

SELECT FirstName, Surname
FROM dbo.Persons
WHERE FirstName = 'Herman'
UNION ALL
SELECT FirstName, Surname
FROM dbo.Persons
WHERE Surname = 'Anderson'

OrResult2a

In that example, Herman Anderson appears once in the results of the OR query and twice in the results of the UNION ALL. That’s because it qualifies for both predicates. The OR eliminated the duplication, the UNION ALL does not.

So change that UNION ALL to UNION so that the elimination of duplicate rows is done, the row appears only once and life is good again. Or is it?

SELECT FirstName, Surname
FROM dbo.Persons
WHERE FirstName = 'Alfred' OR Surname = 'Hickman'

SELECT FirstName, Surname
FROM dbo.Persons
WHERE FirstName = 'Alfred'
UNION
SELECT FirstName, Surname
FROM dbo.Persons
WHERE Surname = 'Hickman'
ORDER BY FirstName, Surname

OrResult2b

This time, Alfred Hickman appears twice in the results from the OR, but only once in the output from the UNION

The difference comes in how the duplicates are eliminated. With an OR, SQL does the elimination of duplicates based on the key value regardless of what may be in the select list. With an UNION, SQL does the elimination of duplicates based on the select list, regardless of what the key value may be and in the above example there were two rows in the table with the value ‘Alfred Hickman’. So with UNION you can lose rows if they are duplicated in the table.

The solution’s fairly simple, if converting an OR into a UNION, ensure that the key column(s) are in the select list, then the duplicate elimination done by the UNION will only remove rows that were part of both result sets, instead of also removing ones that really do appear twice in the table.

So in conclusion, if you’re replacing a query using OR with a query using UNION, be careful with the finer details around duplicates. If you know the conditions are mutually exclusive, use UNION ALL. If you don’t, use UNION and ensure that the table’s key column(s) are present in the select list so that the UNION doesn’t remove rows that you don’t want it to remove.

Rate

You rated this post out of 5. Change rating

Share

Share

Rate

You rated this post out of 5. Change rating