• Another approach to try is to unpack the ranges into their constituent values and then join to those values on equality.
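
    For example, a single range row with Low = 5 and High = 8 unpacks into the four values 5, 6, 7, 8, each of which can be matched to a line number with a plain = predicate. Here's a minimal sketch of the idea (the literal range is made up; only the Low/High column names mirror Chris' setup):

    -- Unpack one made-up range (Low = 5, High = 8) into its constituent values.
    SELECT r.Low + v.n AS unpacked_value        -- yields 5, 6, 7, 8
    FROM (VALUES (5, 8)) r(Low, High)
    CROSS APPLY (VALUES (0),(1),(2),(3)) v(n)
    WHERE r.Low + v.n <= r.High;                -- guard so we never go past High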

    Sometimes that's slower, but at least with Chris' sample data setup, it is a bit faster on my machine than doing the join with order enforced, a bit lighter on CPU, and a lot lighter on reads.

    query                  runs duration  cpu      reads
    ---------------------- ---- --------- -------- -------
    Test: UNPACKED QUERY   10   8299557   8290538  816890
    Test: RANGE JOIN QUERY 10   10561492  10555492 4371355

    You pay some penalty for doing the unpacking and sorting the results, but at least on my machine with Chris' data that's outweighed by getting to use a merge join.

    Here's the code I ran after using Chris' sample setup:

    GO --End Chris' setup batch

    EXEC(
    '--Test: UNPACKED QUERY
    -- Results are assigned to throwaway variables so that returning rows
    -- to the client does not distort the timings.
    DECLARE @bigint_bucket BIGINT,
            @varchar_bucket VARCHAR(MAX);

    SELECT @bigint_bucket  = Low + RN,
           @bigint_bucket  = Low,
           @bigint_bucket  = High,
           @bigint_bucket  = LINENUMBER,
           @varchar_bucket = other_fields
    FROM #Ranges R
    CROSS APPLY (SELECT TOP ((High - Low) + 1)
                        RN = ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) - 1
                 FROM (VALUES(0),(0),(0),(0),(0),(0),(0),(0),(0),(0)) n1(n)
                 CROSS JOIN (VALUES(0),(0),(0),(0),(0),(0),(0),(0),(0),(0)) n2(n)
                 CROSS JOIN (VALUES(0),(0),(0),(0),(0),(0),(0),(0),(0),(0)) n3(n)
                ) tally
    INNER JOIN #Lines_100 L
        ON L.LINENUMBER = Low + RN;
    ');

    EXEC(
    '--Test: RANGE JOIN QUERY
    DECLARE @bigint_bucket BIGINT,
            @varchar_bucket VARCHAR(MAX);

    SELECT @bigint_bucket  = LINENUMBER,
           @bigint_bucket  = low,
           @bigint_bucket  = High,
           @varchar_bucket = Other_fields
    FROM #Ranges r
    INNER LOOP JOIN #Lines_100 l
        ON  l.linenumber >= Low
        AND l.linenumber <= High;
    ');
    GO 10 --Run each query 10 times.

    -- Pull the aggregate timings for both test batches out of the plan cache.
    SELECT query    = SUBSTRING(qt.text, 3, CHARINDEX('QUERY', qt.text) + 2),
           runs     = qs.execution_count,
           duration = total_elapsed_time,
           cpu      = total_worker_time,
           reads    = total_logical_reads
    FROM sys.dm_exec_query_stats qs
    CROSS APPLY sys.dm_exec_sql_text(qs.sql_handle) qt
    WHERE qt.text LIKE '--Test%'
    ORDER BY total_elapsed_time ASC;

    Cheers!

    EDIT: Fixed some of the code formatting. I'm on a new SSMS install and haven't switched the tab/space settings yet, so it didn't paste very well.

    I should also note that, as written, the unpacked query will only handle ranges of up to 1,000 values (the three cross-joined sets of ten rows yield at most 10 × 10 × 10 = 1,000 tally rows). If ranges are larger, the tally section will have to be adjusted, as sketched below. Performance is fairly sensitive to that, which is likely one reason it does relatively well on Chris' sample data with its very small ranges.
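
    For instance, to cover ranges of up to 10,000 values the tally needs a fourth cross-joined set of ten rows. Here's a standalone sketch of that adjusted tally (the @Low/@High values are made up, just to show it produces the right number of row numbers):

    -- Adjusted tally: four cross joins of ten rows give 10 * 10 * 10 * 10 = 10,000 rows,
    -- so TOP ((High - Low) + 1) can unpack any range up to 10,000 values wide.
    DECLARE @Low BIGINT = 1, @High BIGINT = 2500;   -- made-up sample range

    SELECT TOP ((@High - @Low) + 1)
           RN = ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) - 1
    FROM (VALUES(0),(0),(0),(0),(0),(0),(0),(0),(0),(0)) n1(n)
    CROSS JOIN (VALUES(0),(0),(0),(0),(0),(0),(0),(0),(0),(0)) n2(n)
    CROSS JOIN (VALUES(0),(0),(0),(0),(0),(0),(0),(0),(0),(0)) n3(n)
    CROSS JOIN (VALUES(0),(0),(0),(0),(0),(0),(0),(0),(0),(0)) n4(n);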

    I'll probably try again in a bit with larger ranges to see where the break-even point is. I imagine it'll be pretty low.

    EDIT 2: As suspected, the cost of sorting gets prohibitive pretty quickly as the range size increases. The break-even point is at an average range size of 5 on my machine. With the average range size of 18 you mentioned, the only way to really make something like this feasible would be if the Ranges table is not updated frequently.

    In that case, you could probably maintain the unpacked list either as an indexed view (although the unpacking would have to be done differently in that case) or as a separate table, and just join the Line table to that.
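
    A rough sketch of the separate-table variant, assuming permanent Ranges and Lines tables with the same Low/High/linenumber columns as Chris' temp tables (the dbo.Ranges, dbo.Lines, dbo.RangeValues and RangeID names are illustrative, not from the original setup):

    -- Hypothetical pre-unpacked table; it has to be rebuilt (or otherwise maintained)
    -- whenever the Ranges table changes, so this only pays off if that is infrequent.
    CREATE TABLE dbo.RangeValues
    (
        RangeID       BIGINT NOT NULL,   -- key of the source row in dbo.Ranges
        UnpackedValue BIGINT NOT NULL,   -- one row per value from Low through High
        CONSTRAINT PK_RangeValues PRIMARY KEY CLUSTERED (UnpackedValue, RangeID)
    );

    -- Populate it with the same tally-based unpacking as the test query above.
    INSERT dbo.RangeValues (RangeID, UnpackedValue)
    SELECT r.RangeID, r.Low + t.RN
    FROM dbo.Ranges r
    CROSS APPLY (SELECT TOP ((r.High - r.Low) + 1)
                        RN = ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) - 1
                 FROM (VALUES(0),(0),(0),(0),(0),(0),(0),(0),(0),(0)) n1(n)
                 CROSS JOIN (VALUES(0),(0),(0),(0),(0),(0),(0),(0),(0),(0)) n2(n)
                 CROSS JOIN (VALUES(0),(0),(0),(0),(0),(0),(0),(0),(0),(0)) n3(n)
                ) t;

    -- The line table then joins on plain equality against the pre-unpacked values.
    SELECT l.*
    FROM dbo.Lines l
    INNER JOIN dbo.RangeValues rv
        ON rv.UnpackedValue = l.linenumber;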

    Even that might not be much faster, though.