Join two tables with complex derived keys that don't match

  • Guys,

    I need to join two tables together - Master and Detail

    Master Table:
    ID =7000776601*1*6994
    Code=P*W*S
    Instances=6994ý12510ý12874ý13245ý13609ý13973ý14337ý14701ý15009ý15072ý15436ý15800ý16164ý16528ý16899

    Details Table
    ID = 7000776601*1*6994*P*W*S*15436
    ChargeEffectiveDate = 15436ý15436ý15436ý15436ý15436ý15436ý15436ý15072

    So the Details table key is the Master.ID + Master.Code + one of the values in Master.Instances
    There are multiple details records (there should be one for each value in Instances but I haven't validated this)
    There will always be the same number of elements in the compound key string, but the length of these items can change so I can't just do LEFT(23)

    What is the best strategy for doing a 1:n match on these.  The two strategies I have thought of ARE:
    1) REVERSE() the details key then find the position of the 1st *, take the RIGHT() from the star and then REVERSE() back again but this will be non-SARGable and they are fairly big tables so a table scan is a big performance hit
    2) run the Master.Instances through Jeff Moden's DelimitedSplit8K into a temp table, Join the Master to the TT on a 1:n JOIN and then join the TT to the Details on a 1:1 JOIN.
    This uses more disk IO but is indexable

    Is there any opinion on which is the best strategy or can you propose another?
    FWIW I don't care about the instance, I just need to get all the Detail records for the Master.

  • This isn't what an RDBMS is designed for, so performance won't be great.  But you can try this (feel free to correct any syntax errors that may have slipped in):

    -- If elements in Instances are always five characters
    ON Details.ID = Master.ID + Master.Code
    AND Master.Instances LIKE '%' + RIGHT(Details.ID,5) + '%'

    -- If elements in Instances are of variable length
    ON Details.ID = Master.ID + Master.Code
    AND Master.Instances LIKE '%' + RIGHT(Details.ID,LEN(Details.ID)-LEN(Master.ID+Master.Code)) + '%'

    You might also experiment with PATINDEX instead of LIKE on the second line, to see which performs better.

    John

  • This isn't going to be a pretty join no. I don't see any reason to split into a Temporary table though, do the job in one go:

    USE Sandbox;
    GO
    --CREATE DDL
    CREATE TABLE MasterTable
      (ID varchar(30),
      Code varchar(10),
      Instances varchar(400));
    GO
    CREATE TABLE Detail
      (ID varchar(50),
      ChargeEffectiveDate varchar(100));
    GO
    --Sample DAta
    INSERT INTO MasterTable
    VALUES ('7000776601*1*6994','P*W*S','6994ý12510ý12874ý13245ý13609ý13973ý14337ý14701ý15009ý15072ý15436ý15800ý16164ý16528ý16899');
    GO
    INSERT INTO Detail
    VALUES ('7000776601*1*6994*P*W*S*15436','15436ý15436ý15436ý15436ý15436ý15436ý15436ý15072');
    GO
    --Check data
    SELECT *
    FROM MasterTable;
    SELECT *
    FROM Detail;
    GO
    --Solution
    SELECT *
    FROM MasterTable M
      CROSS APPLY dbo.DelimitedSplit8K(M.Instances,'ý') DS
      JOIN Detail D ON M.ID + '*' + M.Code + '*' + DS.Item = D.ID;
    GO
    --Clean up
    DROP TABLE MasterTable;
    DROP TABLE Detail;
    GO

    As you can see, this does the JOIN in one parse, using Jeff's DelmitedSplit8K to separate the different parts out. This isn't going to be particularly quick on large tables though.

    Thom~

    Excuse my typos and sometimes awful grammar. My fingers work faster than my brain does.
    Larnu.uk

  • My thinking behind the Temp Table is that I can build a suitable index on it, otherwise we are likely to be doing table scans.

  • aaron.reese - Thursday, November 2, 2017 5:20 AM

    My thinking behind the Temp Table is that I can build a suitable index on it, otherwise we are likely to be doing table scans.

    But you'd need a table scan to get all the data into the temp table, wouldn't you, plus a scan of the temp table to build the index?

    John

  • aaron.reese - Thursday, November 2, 2017 5:20 AM

    My thinking behind the Temp Table is that I can build a suitable index on it, otherwise we are likely to be doing table scans.

    You'll be doing that anyway when you put the data in the temporary table.

    Thom~

    Excuse my typos and sometimes awful grammar. My fingers work faster than my brain does.
    Larnu.uk

  • aaron.reese - Thursday, November 2, 2017 4:33 AM

    Guys,

    I need to join two tables together - Master and Detail

    Master Table:
    ID =7000776601*1*6994
    Code=P*W*S
    Instances=6994ý12510ý12874ý13245ý13609ý13973ý14337ý14701ý15009ý15072ý15436ý15800ý16164ý16528ý16899

    Details Table
    ID = 7000776601*1*6994*P*W*S*15436
    ChargeEffectiveDate = 15436ý15436ý15436ý15436ý15436ý15436ý15436ý15072

    So the Details table key is the Master.ID + Master.Code + one of the values in Master.Instances
    There are multiple details records (there should be one for each value in Instances but I haven't validated this)
    There will always be the same number of elements in the compound key string, but the length of these items can change so I can't just do LEFT(23)

    What is the best strategy for doing a 1:n match on these.  The two strategies I have thought of ARE:
    1) REVERSE() the details key then find the position of the 1st *, take the RIGHT() from the star and then REVERSE() back again but this will be non-SARGable and they are fairly big tables so a table scan is a big performance hit
    2) run the Master.Instances through Jeff Moden's DelimitedSplit8K into a temp table, Join the Master to the TT on a 1:n JOIN and then join the TT to the Details on a 1:1 JOIN.
    This uses more disk IO but is indexable

    Is there any opinion on which is the best strategy or can you propose another?
    FWIW I don't care about the instance, I just need to get all the Detail records for the Master.

    You missed the third and best long term option.  Normalize your tables.  You're running into issues, because your tables aren't normalized.  Those issues go away if you properly normalize your tables.  I understand that you may be working with a third party product and not able to modify the table structure.

    Drew

    J. Drew Allen
    Business Intelligence Analyst
    Philadelphia, PA

  • drew.allen - Thursday, November 2, 2017 8:08 AM

    aaron.reese - Thursday, November 2, 2017 4:33 AM

    Guys,

    I need to join two tables together - Master and Detail

    Master Table:
    ID =7000776601*1*6994
    Code=P*W*S
    Instances=6994ý12510ý12874ý13245ý13609ý13973ý14337ý14701ý15009ý15072ý15436ý15800ý16164ý16528ý16899

    Details Table
    ID = 7000776601*1*6994*P*W*S*15436
    ChargeEffectiveDate = 15436ý15436ý15436ý15436ý15436ý15436ý15436ý15072

    So the Details table key is the Master.ID + Master.Code + one of the values in Master.Instances
    There are multiple details records (there should be one for each value in Instances but I haven't validated this)
    There will always be the same number of elements in the compound key string, but the length of these items can change so I can't just do LEFT(23)

    What is the best strategy for doing a 1:n match on these.  The two strategies I have thought of ARE:
    1) REVERSE() the details key then find the position of the 1st *, take the RIGHT() from the star and then REVERSE() back again but this will be non-SARGable and they are fairly big tables so a table scan is a big performance hit
    2) run the Master.Instances through Jeff Moden's DelimitedSplit8K into a temp table, Join the Master to the TT on a 1:n JOIN and then join the TT to the Details on a 1:1 JOIN.
    This uses more disk IO but is indexable

    Is there any opinion on which is the best strategy or can you propose another?
    FWIW I don't care about the instance, I just need to get all the Detail records for the Master.

    You missed the third and best long term option.  Normalize your tables.  You're running into issues, because your tables aren't normalized.  Those issues go away if you properly normalize your tables.  I understand that you may be working with a third party product and not able to modify the table structure.

    Drew

    Normalize my tables - Why didn't I think of that  😛

    Unfortunately, not my tables 🙁
    These are 3rd party application tables that are legacy from pre-RDBMS days.  They are slowly migrating them to 3NF but currently work of the basis of  If It Aint Broke...

  • It seems as if you'd be scanning both tables anyway, since you're reading all the master and details (?).

    Maybe just?:

    FROM Details d
    INNER JOIN Master m ON d.ID LIKE m.ID + '*' + m.Code + '%'

    Hopefully (presumably?) both tables are clustered on ID, so you might be able to force a MERGE join without a sort being required if SQL doesn't generate such a plan on its own (then again, maybe not, as SQL must be ultra-cautious about using a MERGE join w/o a sort for fear of bad results).

    Edit:
    The Details table should have had a composite key of ( master_id, details_id ).  That would make joining a breeze, but wouldn't give up any flexibility in the actual values in the keys.

    SQL DBA,SQL Server MVP(07, 08, 09) A socialist is someone who will give you the shirt off *someone else's* back.

  • aaron.reese - Friday, November 3, 2017 6:41 AM

    drew.allen - Thursday, November 2, 2017 8:08 AM

    aaron.reese - Thursday, November 2, 2017 4:33 AM

    Guys,

    I need to join two tables together - Master and Detail

    Master Table:
    ID =7000776601*1*6994
    Code=P*W*S
    Instances=6994ý12510ý12874ý13245ý13609ý13973ý14337ý14701ý15009ý15072ý15436ý15800ý16164ý16528ý16899

    Details Table
    ID = 7000776601*1*6994*P*W*S*15436
    ChargeEffectiveDate = 15436ý15436ý15436ý15436ý15436ý15436ý15436ý15072

    So the Details table key is the Master.ID + Master.Code + one of the values in Master.Instances
    There are multiple details records (there should be one for each value in Instances but I haven't validated this)
    There will always be the same number of elements in the compound key string, but the length of these items can change so I can't just do LEFT(23)

    What is the best strategy for doing a 1:n match on these.  The two strategies I have thought of ARE:
    1) REVERSE() the details key then find the position of the 1st *, take the RIGHT() from the star and then REVERSE() back again but this will be non-SARGable and they are fairly big tables so a table scan is a big performance hit
    2) run the Master.Instances through Jeff Moden's DelimitedSplit8K into a temp table, Join the Master to the TT on a 1:n JOIN and then join the TT to the Details on a 1:1 JOIN.
    This uses more disk IO but is indexable

    Is there any opinion on which is the best strategy or can you propose another?
    FWIW I don't care about the instance, I just need to get all the Detail records for the Master.

    You missed the third and best long term option.  Normalize your tables.  You're running into issues, because your tables aren't normalized.  Those issues go away if you properly normalize your tables.  I understand that you may be working with a third party product and not able to modify the table structure.

    Drew

    Normalize my tables - Why didn't I think of that  😛

    Unfortunately, not my tables 🙁
    These are 3rd party application tables that are legacy from pre-RDBMS days.  They are slowly migrating them to 3NF but currently work of the basis of  If It Aint Broke...

    Ummm...  I'm pretty sure it actually IS broke.   That kind of database with any kind of data volume = recipe for disaster.  It wouldn't take much to totally ruin this database.

    Steve (aka sgmunson) 🙂 🙂 🙂
    Rent Servers for Income (picks and shovels strategy)

  • I would agree that it is broke, but its not as broke as some other parts of the system 🙂

    In reality most of the data is historical and therefore the poor design does not impact significantly on day-to-day performance. When they get round to migrating it, I hope they also partition the data.

Viewing 11 posts - 1 through 10 (of 10 total)

You must be logged in to reply to this topic. Login to reply