Recently I had to create a simple query to lookup a single row in a table containing ranges that do not overlap. Consider this table design and example data:
CREATE TABLE MyRanges ( id INT IDENTITY(1,1) PRIMARY KEY, RangeFrom INT NOT NULL, RangeTo INT NOT NULL, Value VARCHAR(20) NOT NULL ) GO
INSERT INTO MyRanges (RangeFrom, RangeTo, Value) VALUES (1, 5, 'The first range'), (6, 7, 'The second range'), (8, 11, 'The third range'), (12, 25, 'The fourth range'), (26, 30, 'The fifth range')
The query I had to create was simply to return the Value column where a given input value were within that range. So given the value 10, I would expect to get the value “The third range” back. My first attempt looked like this:
DECLARE @input INT = 10 SELECT Value FROM MyRanges WHERE @input BETWEEN RangeFrom AND RangeTo
This gave me exactly what I needed, and was as simple as could be. But I also needed to consider the performance, since the query would be executed a lot. The query above simply performed a clustered index scan, which was expected. I then added a covering index on RangeFrom, RangeTo that included the Value. To my surprise it could only use the index for a scan, and not a lookup. That made the performance vary on different input values.
The problem is that I’m actually performing 2 range scans, one on each Range columns, which is more obvious if we rewrite the where clause to this:
WHERE RangeFrom <= @input
AND RangeTo >= @input
I also tried to create an index on each column, but that did not help me much either.
My big problem was, that the performance varied on different input values. If I executed the query with a very low input value, it performed reasonable, but when I used a high input value, it almost scanned the entire table. My goal was to have consistent performance and only an index lookup instead of an index scan with hundreds of logical io’s.
The solution to my specific problem was to rethink the way I queried my data. In my example I know that the ranges do not overlap, and that there are no holes in the ranges. So every possible query will return one and only one row. But SQL Server has no clue about these things because I haven’t told it.
Instead I can rewrite my query using this knowledge, and I came up with this instead:
DECLARE @input INT = 10 SELECT TOP 1 Value FROM MyRanges WHERE RangeFrom <= @input ORDER BY RangeFrom DESC
I dropped my previous index, and went with this one instead:
CREATE NONCLUSTERED INDEX idx1 ON MyRanges (RangeFrom) INCLUDE (value)
(Don’t mind the naming convention…)
Now my query simply performs exactly one index lookup with a consistent 2 logical reads. It doesn’t get much better than that
In my case the missing piece was the fact that the ranges did not overlap and there were no holes in the entire range from min to max. Other cases could be event tables, where you log when an event has happened, and when the issue have been resolved. The input value could then be a point in time, and you want to know how many open events existed at that specific time.
The obvius way would be similar to what I first tried above:
SELECT * FROM MyEventTable WHERE @input BETWEEN EventStart AND EventEnd
This has the exact same problem as my example, but the solution is different. In this case my rules above dont apply, because multiple events could be active at the same time, and other times no events are active. But perhaps you know that the time difference between the two date columns is no more than 10 minutes.
The goal is to convert the query from a 2D query to a 1 dimensional query, so let’s change the predicate to this:
SELECT * FROM MyEventTable WHERE EventCreated BETWEEN DATEADD(mi, -10, @Input) AND @Input
This is a one dimensional query that can utilize an index on the Eventcreated column to narrow down the rows. But this will also return events that are no longer active. But we can simply add both predicates to help the optimizer to narrow down rows efficiently as well as returning the correct data:
SELECT * FROM MyEventTable WHERE EventCreated BETWEEN DATEADD(mi, -10, @Input) AND @Input AND @input BETWEEN EventStart AND EventEnd
Im sure there are plenty of cases where you can benefit a lot by rethinking the way you approach your data. Converting 2D queries to 1D queries is certainly one that boost performance a lot.