How to improve the efficiency of a conditional join

Question

How to improve the efficiency of a conditional join

kyle.marvin

SSC Rookie

Points: 30
More actions
October 5, 2016 at 11:50 am

#330728

If this has been answered before, please point me in the right direction; I'm just not sure what terms I would use to search for this scenario.
I have a table that holds freight rates. Pulling the correct rate depends on the value of up to 8 different optional fields. As an example, I'll just use 3 of those fields for readability. Is there a more efficient way to do the following?
SELECT COALESCE(fr1.FreightRate, fr2.FreightRate, fr3.FreightRate, 0)
FROM dbo.Sales s
LEFT JOIN dbo.FreightRate fr1 ON s.SourceKey = fr1.SourceKey
AND s.CompanyKey = fr1.CompanyKey
AND s.PackingType = fr1.PackingType
LEFT JOIN dbo.FreightRate fr2 ON s.SourceKey = fr2.SourceKey
AND s.CompanyKey = fr2.CompanyKey
LEFT JOIN dbo.FreightRate fr3 ON s.SourceKey = fr3.SourceKey
I could create a table that blows out FreightRate into every possible combination of values, but that seems like a lot of overhead when there are 8 different variables. There is a hierarchy in place, so I don't have to check the FreightRate table for every possible combination of columns. For example, I don't have to check SourceKey and PackingType together without also including CompanyKey, and I don't have to check CompanyKey on its own.

Viewing 4 posts - 1 through 4 (of 4 total)

You must be logged in to reply to this topic. Login to reply

drew.allen SSC Guru Points: 77014 More actions · Answer 1

It's really hard to provide suggestions without sample data and expected results based on that sample. (See the first link in my signature about how to provide that.)

The main problem I see is that you are joining the same table multiple times. You might be able to get away with only doing it once with a DENSE_RANK, ROW_NUMBER, or CROSS APPLY/TOP(1).

WITH CTE AS (

SELECT *, DENSE_RANK() OVER(PARTITION BY s.sale_id ORDER BY

CASE

WHEN s.CompanyKey = fr1.CompanyKey AND s.PackingType = fr1.PackingType THEN 1

WHEN s.CompanyKey = fr1.CompanyKey THEN 2

ELSE 3

END) AS dr

FROM dbo.Sales s

LEFT JOIN dbo.FreightRate fr ON s.SourceKey = fr.SourceKey

)

SELECT *

FROM CTE

WHERE dr = 1

I used a DENSE_RANK here, because it wasn't clear how many rows might be matched on each of your criteria. DENSE_RANK will return all of the rows, whereas ROW_NUMBER will only return the first.

Drew

J. Drew Allen
Business Intelligence Analyst
Philadelphia, PA

ScottPletcher SSC Guru Points: 101204 More actions · Answer 2

The most important thing is to:

Cluster the dbo.FreightRate table on ( SourceKey, CompanyKey, PackingType ).

But you could adjust the query slightly also:

SELECT COALESCE(fr1.FreightRate, fr2.FreightRate, fr3.FreightRate, 0)

FROM dbo.Sales s

LEFT JOIN dbo.FreightRate fr1 ON s.SourceKey = fr1.SourceKey

AND s.CompanyKey = fr1.CompanyKey

AND s.PackingType = fr1.PackingType

LEFT JOIN dbo.FreightRate fr2 ON fr1.SourceKey IS NULL

AND s.SourceKey = fr2.SourceKey

AND s.CompanyKey = fr2.CompanyKey

LEFT JOIN dbo.FreightRate fr3 ON fr1.SourceKey IS NULL

AND fr2.SourceKey IS NULL

AND s.SourceKey = fr3.SourceKey

SQL DBA,SQL Server MVP(07, 08, 09) "It's a dog-eat-dog world, and I'm wearing Milk-Bone underwear." "Norm", on "Cheers". Also from "Cheers", from "Carla": "You need to know 3 things about Tortelli men: Tortelli men draw women like flies; Tortelli men treat women like flies; Tortelli men's brains are in their flies".

kyle.marvin SSC Rookie Points: 30 More actions · Answer 3

Thanks for the ideas. The dense rank option worked really well. My original code took about 15 minutes to update the sales table that has about 400,000 rows, and the ranked CTE got that down to just under 3 minutes. For an incremental update of about 1500 rows, it only took 7 seconds. I added a second CTE to get the average when there were multiple hits at the same level, but otherwise basically took Drew's code. Thank you, sir.