How to improve the efficiency of a conditional join

  • If this has been answered before, please point me in the right direction; I'm just not sure what terms I would use to search for this scenario.

    I have a table that holds freight rates. Pulling the correct rate depends on the value of up to 8 different optional fields. As an example, I'll just use 3 of those fields for readability. Is there a more efficient way to do the following?

    SELECT COALESCE(fr1.FreightRate, fr2.FreightRate, fr3.FreightRate, 0)

    FROM dbo.Sales s

    LEFT JOIN dbo.FreightRate fr1 ON s.SourceKey = fr1.SourceKey

    AND s.CompanyKey = fr1.CompanyKey

    AND s.PackingType = fr1.PackingType

    LEFT JOIN dbo.FreightRate fr2 ON s.SourceKey = fr2.SourceKey

    AND s.CompanyKey = fr2.CompanyKey

    LEFT JOIN dbo.FreightRate fr3 ON s.SourceKey = fr3.SourceKey

    I could create a table that blows out FreightRate into every possible combination of values, but that seems like a lot of overhead when there are 8 different variables. There is a hierarchy in place, so I don't have to check the FreightRate table for every possible combination of columns. For example, I don't have to check SourceKey and PackingType together without also including CompanyKey, and I don't have to check CompanyKey on its own.

  • It's really hard to provide suggestions without sample data and expected results based on that sample. (See the first link in my signature about how to provide that.)

    The main problem I see is that you are joining the same table multiple times. You might be able to get away with only doing it once with a DENSE_RANK, ROW_NUMBER, or CROSS APPLY/TOP(1).

    WITH CTE AS (

    SELECT *, DENSE_RANK() OVER(PARTITION BY s.sale_id ORDER BY

    CASE

    WHEN s.CompanyKey = fr1.CompanyKey AND s.PackingType = fr1.PackingType THEN 1

    WHEN s.CompanyKey = fr1.CompanyKey THEN 2

    ELSE 3

    END) AS dr

    FROM dbo.Sales s

    LEFT JOIN dbo.FreightRate fr ON s.SourceKey = fr.SourceKey

    )

    SELECT *

    FROM CTE

    WHERE dr = 1

    I used a DENSE_RANK here, because it wasn't clear how many rows might be matched on each of your criteria. DENSE_RANK will return all of the rows, whereas ROW_NUMBER will only return the first.

    Drew

    J. Drew Allen
    Business Intelligence Analyst
    Philadelphia, PA

  • The most important thing is to:

    Cluster the dbo.FreightRate table on ( SourceKey, CompanyKey, PackingType ).

    But you could adjust the query slightly also:

    SELECT COALESCE(fr1.FreightRate, fr2.FreightRate, fr3.FreightRate, 0)

    FROM dbo.Sales s

    LEFT JOIN dbo.FreightRate fr1 ON s.SourceKey = fr1.SourceKey

    AND s.CompanyKey = fr1.CompanyKey

    AND s.PackingType = fr1.PackingType

    LEFT JOIN dbo.FreightRate fr2 ON fr1.SourceKey IS NULL

    AND s.SourceKey = fr2.SourceKey

    AND s.CompanyKey = fr2.CompanyKey

    LEFT JOIN dbo.FreightRate fr3 ON fr1.SourceKey IS NULL

    AND fr2.SourceKey IS NULL

    AND s.SourceKey = fr3.SourceKey

    SQL DBA,SQL Server MVP(07, 08, 09) "It's a dog-eat-dog world, and I'm wearing Milk-Bone underwear." "Norm", on "Cheers". Also from "Cheers", from "Carla": "You need to know 3 things about Tortelli men: Tortelli men draw women like flies; Tortelli men treat women like flies; Tortelli men's brains are in their flies".

  • Thanks for the ideas. The dense rank option worked really well. My original code took about 15 minutes to update the sales table that has about 400,000 rows, and the ranked CTE got that down to just under 3 minutes. For an incremental update of about 1500 rows, it only took 7 seconds. I added a second CTE to get the average when there were multiple hits at the same level, but otherwise basically took Drew's code. Thank you, sir.

Viewing 4 posts - 1 through 4 (of 4 total)

You must be logged in to reply to this topic. Login to reply