Join two tables with complex derived keys that don't match

Question

Join two tables with complex derived keys that don't match

aaron.reese

SSChampion

Points: 13498
More actions
November 2, 2017 at 4:33 am

#329402

Guys,
I need to join two tables together - Master and Detail
Master Table:
ID =7000776601*1*6994 Code=P*W*S Instances=6994ý12510ý12874ý13245ý13609ý13973ý14337ý14701ý15009ý15072ý15436ý15800ý16164ý16528ý16899
Details Table
ID = 7000776601*1*6994*P*W*S*15436 ChargeEffectiveDate = 15436ý15436ý15436ý15436ý15436ý15436ý15436ý15072
So the Details table key is the Master.ID + Master.Code + one of the values in Master.Instances
There are multiple details records (there should be one for each value in Instances but I haven't validated this)
There will always be the same number of elements in the compound key string, but the length of these items can change so I can't just do LEFT(23)
What is the best strategy for doing a 1:n match on these. The two strategies I have thought of ARE:
1) REVERSE() the details key then find the position of the 1st *, take the RIGHT() from the star and then REVERSE() back again but this will be non-SARGable and they are fairly big tables so a table scan is a big performance hit
2) run the Master.Instances through Jeff Moden's DelimitedSplit8K into a temp table, Join the Master to the TT on a 1:n JOIN and then join the TT to the Details on a 1:1 JOIN.
This uses more disk IO but is indexable
Is there any opinion on which is the best strategy or can you propose another?
FWIW I don't care about the instance, I just need to get all the Detail records for the Master.

Viewing 11 posts - 1 through 10 (of 10 total)

You must be logged in to reply to this topic. Login to reply

John Mitchell-245523 SSC Guru Points: 148809 More actions · Answer 1

This isn't what an RDBMS is designed for, so performance won't be great. But you can try this (feel free to correct any syntax errors that may have slipped in):

-- If elements in Instances are always five characters ON Details.ID = Master.ID + Master.Code AND Master.Instances LIKE '%' + RIGHT(Details.ID,5) + '%'

-- If elements in Instances are of variable length ON Details.ID = Master.ID + Master.Code AND Master.Instances LIKE '%' + RIGHT(Details.ID,LEN(Details.ID)-LEN(Master.ID+Master.Code)) + '%'

You might also experiment with PATINDEX instead of LIKE on the second line, to see which performs better.

John

Thom A SSC Guru Points: 99126 More actions · Answer 2

This isn't going to be a pretty join no. I don't see any reason to split into a Temporary table though, do the job in one go:

USE Sandbox;
GO
--CREATE DDL
CREATE TABLE MasterTable
  (ID varchar(30),
  Code varchar(10),
  Instances varchar(400));
GO
CREATE TABLE Detail
  (ID varchar(50),
  ChargeEffectiveDate varchar(100));
GO
--Sample DAta
INSERT INTO MasterTable
VALUES ('7000776601*1*6994','P*W*S','6994ý12510ý12874ý13245ý13609ý13973ý14337ý14701ý15009ý15072ý15436ý15800ý16164ý16528ý16899');
GO
INSERT INTO Detail
VALUES ('7000776601*1*6994*P*W*S*15436','15436ý15436ý15436ý15436ý15436ý15436ý15436ý15072');
GO
--Check data
SELECT *
FROM MasterTable;
SELECT *
FROM Detail;
GO
--Solution
SELECT *
FROM MasterTable M
  CROSS APPLY dbo.DelimitedSplit8K(M.Instances,'ý') DS
   JOIN Detail D ON M.ID + '*' + M.Code + '*' + DS.Item = D.ID;
GO
--Clean up
DROP TABLE MasterTable;
DROP TABLE Detail;
GO

As you can see, this does the JOIN in one parse, using Jeff's DelmitedSplit8K to separate the different parts out. This isn't going to be particularly quick on large tables though.

Thom~

Excuse my typos and sometimes awful grammar. My fingers work faster than my brain does.
Larnu.uk

aaron.reese SSChampion Points: 13498 More actions · Answer 3

My thinking behind the Temp Table is that I can build a suitable index on it, otherwise we are likely to be doing table scans.

John Mitchell-245523 SSC Guru Points: 148809 More actions · Answer 4

aaron.reese - Thursday, November 2, 2017 5:20 AM
My thinking behind the Temp Table is that I can build a suitable index on it, otherwise we are likely to be doing table scans.

But you'd need a table scan to get all the data into the temp table, wouldn't you, plus a scan of the temp table to build the index?

John

Thom A SSC Guru Points: 99126 More actions · Answer 5

aaron.reese - Thursday, November 2, 2017 5:20 AM
My thinking behind the Temp Table is that I can build a suitable index on it, otherwise we are likely to be doing table scans.

You'll be doing that anyway when you put the data in the temporary table.

Thom~

Excuse my typos and sometimes awful grammar. My fingers work faster than my brain does.
Larnu.uk

drew.allen SSC Guru Points: 76948 More actions · Answer 6

aaron.reese - Thursday, November 2, 2017 4:33 AM
Guys,
I need to join two tables together - Master and Detail
Master Table:
ID =7000776601*1*6994 Code=P*W*S Instances=6994Ã½12510Ã½12874Ã½13245Ã½13609Ã½13973Ã½14337Ã½14701Ã½15009Ã½15072Ã½15436Ã½15800Ã½16164Ã½16528Ã½16899
Details Table
ID = 7000776601*1*6994*P*W*S*15436 ChargeEffectiveDate = 15436Ã½15436Ã½15436Ã½15436Ã½15436Ã½15436Ã½15436Ã½15072
So the Details table key is the Master.ID + Master.Code + one of the values in Master.Instances
There are multiple details records (there should be one for each value in Instances but I haven't validated this)
There will always be the same number of elements in the compound key string, but the length of these items can change so I can't just do LEFT(23)
What is the best strategy for doing a 1:n match on these. The two strategies I have thought of ARE:
1) REVERSE() the details key then find the position of the 1st *, take the RIGHT() from the star and then REVERSE() back again but this will be non-SARGable and they are fairly big tables so a table scan is a big performance hit
2) run the Master.Instances through Jeff Moden's DelimitedSplit8K into a temp table, Join the Master to the TT on a 1:n JOIN and then join the TT to the Details on a 1:1 JOIN.
This uses more disk IO but is indexable
Is there any opinion on which is the best strategy or can you propose another?
FWIW I don't care about the instance, I just need to get all the Detail records for the Master.

You missed the third and best long term option. Normalize your tables. You're running into issues, because your tables aren't normalized. Those issues go away if you properly normalize your tables. I understand that you may be working with a third party product and not able to modify the table structure.

Drew

J. Drew Allen
Business Intelligence Analyst
Philadelphia, PA

aaron.reese SSChampion Points: 13498 More actions · Answer 7

drew.allen - Thursday, November 2, 2017 8:08 AM
aaron.reese - Thursday, November 2, 2017 4:33 AM
Guys,
I need to join two tables together - Master and Detail
Master Table:
ID =7000776601*1*6994 Code=P*W*S Instances=6994Ã½12510Ã½12874Ã½13245Ã½13609Ã½13973Ã½14337Ã½14701Ã½15009Ã½15072Ã½15436Ã½15800Ã½16164Ã½16528Ã½16899
Details Table
ID = 7000776601*1*6994*P*W*S*15436 ChargeEffectiveDate = 15436Ã½15436Ã½15436Ã½15436Ã½15436Ã½15436Ã½15436Ã½15072
So the Details table key is the Master.ID + Master.Code + one of the values in Master.Instances
There are multiple details records (there should be one for each value in Instances but I haven't validated this)
There will always be the same number of elements in the compound key string, but the length of these items can change so I can't just do LEFT(23)
What is the best strategy for doing a 1:n match on these. The two strategies I have thought of ARE:
1) REVERSE() the details key then find the position of the 1st *, take the RIGHT() from the star and then REVERSE() back again but this will be non-SARGable and they are fairly big tables so a table scan is a big performance hit
2) run the Master.Instances through Jeff Moden's DelimitedSplit8K into a temp table, Join the Master to the TT on a 1:n JOIN and then join the TT to the Details on a 1:1 JOIN.
This uses more disk IO but is indexable
Is there any opinion on which is the best strategy or can you propose another?
FWIW I don't care about the instance, I just need to get all the Detail records for the Master.
You missed the third and best long term option. Normalize your tables. You're running into issues, because your tables aren't normalized. Those issues go away if you properly normalize your tables. I understand that you may be working with a third party product and not able to modify the table structure.
Drew

Normalize my tables - Why didn't I think of that 😛

Unfortunately, not my tables 🙁
These are 3rd party application tables that are legacy from pre-RDBMS days. They are slowly migrating them to 3NF but currently work of the basis of If It Aint Broke...

ScottPletcher SSC Guru Points: 100950 More actions · Answer 8

It seems as if you'd be scanning both tables anyway, since you're reading all the master and details (?).

Maybe just?:

FROM Details d
INNER JOIN Master m ON d.ID LIKE m.ID + '*' + m.Code + '%'

Hopefully (presumably?) both tables are clustered on ID, so you might be able to force a MERGE join without a sort being required if SQL doesn't generate such a plan on its own (then again, maybe not, as SQL must be ultra-cautious about using a MERGE join w/o a sort for fear of bad results).

Edit:
The Details table should have had a composite key of ( master_id, details_id ). That would make joining a breeze, but wouldn't give up any flexibility in the actual values in the keys.

SQL DBA,SQL Server MVP(07, 08, 09) A socialist is someone who will give you the shirt off *someone else's* back.

sgmunson SSC Guru Points: 110639 More actions · Answer 9

aaron.reese - Friday, November 3, 2017 6:41 AM
drew.allen - Thursday, November 2, 2017 8:08 AM
aaron.reese - Thursday, November 2, 2017 4:33 AM
Guys,
I need to join two tables together - Master and Detail
Master Table:
ID =7000776601*1*6994 Code=P*W*S Instances=6994Ã½12510Ã½12874Ã½13245Ã½13609Ã½13973Ã½14337Ã½14701Ã½15009Ã½15072Ã½15436Ã½15800Ã½16164Ã½16528Ã½16899
Details Table
ID = 7000776601*1*6994*P*W*S*15436 ChargeEffectiveDate = 15436Ã½15436Ã½15436Ã½15436Ã½15436Ã½15436Ã½15436Ã½15072
So the Details table key is the Master.ID + Master.Code + one of the values in Master.Instances
There are multiple details records (there should be one for each value in Instances but I haven't validated this)
There will always be the same number of elements in the compound key string, but the length of these items can change so I can't just do LEFT(23)
What is the best strategy for doing a 1:n match on these. The two strategies I have thought of ARE:
1) REVERSE() the details key then find the position of the 1st *, take the RIGHT() from the star and then REVERSE() back again but this will be non-SARGable and they are fairly big tables so a table scan is a big performance hit
2) run the Master.Instances through Jeff Moden's DelimitedSplit8K into a temp table, Join the Master to the TT on a 1:n JOIN and then join the TT to the Details on a 1:1 JOIN.
This uses more disk IO but is indexable
Is there any opinion on which is the best strategy or can you propose another?
FWIW I don't care about the instance, I just need to get all the Detail records for the Master.
You missed the third and best long term option. Normalize your tables. You're running into issues, because your tables aren't normalized. Those issues go away if you properly normalize your tables. I understand that you may be working with a third party product and not able to modify the table structure.
Drew
Normalize my tables - Why didn't I think of that 😛
Unfortunately, not my tables 🙁
These are 3rd party application tables that are legacy from pre-RDBMS days. They are slowly migrating them to 3NF but currently work of the basis of If It Aint Broke...

Ummm... I'm pretty sure it actually IS broke. That kind of database with any kind of data volume = recipe for disaster. It wouldn't take much to totally ruin this database.

Steve (aka sgmunson) 🙂 🙂 🙂
Rent Servers for Income (picks and shovels strategy)

aaron.reese SSChampion Points: 13498 More actions · Answer 10

I would agree that it is broke, but its not as broke as some other parts of the system 🙂

In reality most of the data is historical and therefore the poor design does not impact significantly on day-to-day performance. When they get round to migrating it, I hope they also partition the data.