SQL Clone
SQLServerCentral is supported by Redgate
 
Log in  ::  Register  ::  Not logged in
 
 
 


Join operation over String non-index fields


Join operation over String non-index fields

Author
Message
Paul Hernández
Paul Hernández
Old Hand
Old Hand (376 reputation)Old Hand (376 reputation)Old Hand (376 reputation)Old Hand (376 reputation)Old Hand (376 reputation)Old Hand (376 reputation)Old Hand (376 reputation)Old Hand (376 reputation)

Group: General Forum Members
Points: 376 Visits: 661
Hi all,

I have to perform some joins between a Fact table (with millions of records) and some contextual tables. My problem is that some tables have an auto incremental field as the PK, but this key is not useful for join operations. The business key is stored in a string field; moreover, this field is not defined as unique and is a String field. When the value of the field is unknown you get a "U". In the following image an example of this situation can be seen:



My first approach was to create new tables with the business key as the index and an ETL to load this table, but the managers of the project have discard this option and I need a clever solution to overcome this drawback.

Any comments or suggestions will be appreciated.

Kind Regards,

Paul

Paul Hernández
ChrisM@Work
ChrisM@Work
SSCoach
SSCoach (16K reputation)SSCoach (16K reputation)SSCoach (16K reputation)SSCoach (16K reputation)SSCoach (16K reputation)SSCoach (16K reputation)SSCoach (16K reputation)SSCoach (16K reputation)

Group: General Forum Members
Points: 16119 Visits: 19534
Hi Paul

Apart from the fact that a simple join between the two tables on konsumfeld
SELECT ...
FROM Tableb b
LEFT JOIN Tablea a ON a.konsumfeld = b.konsumfeld


would eliminate rows from tableb where konsumfeld = 'U', I can't see what the problem is. Why would you want to make another copy of the table in order to index konsumfeld?

“Write the query the simplest way. If through testing it becomes clear that the performance is inadequate, consider alternative query forms.” - Gail Shaw

For fast, accurate and documented assistance in answering your questions, please read this article.
Understanding and using APPLY, (I) and (II) Paul White
Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden
Exploring Recursive CTEs by Example Dwain Camps
Paul Hernández
Paul Hernández
Old Hand
Old Hand (376 reputation)Old Hand (376 reputation)Old Hand (376 reputation)Old Hand (376 reputation)Old Hand (376 reputation)Old Hand (376 reputation)Old Hand (376 reputation)Old Hand (376 reputation)

Group: General Forum Members
Points: 376 Visits: 661
Hi Chris,

Thanks so much for your quick answer.

My problem is I have to perform 3 Joins with 2 tables with the same situation and 1 table with a numeric key. I don't have control over the source tables, but the business keys always hold numeric values and they are defined as Strings. For a fact table with 482.866 records the query takes about 6 minutes to execute. I don't know how powerful the server is but is this a reasonable amount of time? This fact table contains only delta data (records for 1 day) but imagine the historical table that has more than 165 million of rows.

Paul Hernández
ChrisM@Work
ChrisM@Work
SSCoach
SSCoach (16K reputation)SSCoach (16K reputation)SSCoach (16K reputation)SSCoach (16K reputation)SSCoach (16K reputation)SSCoach (16K reputation)SSCoach (16K reputation)SSCoach (16K reputation)

Group: General Forum Members
Points: 16119 Visits: 19534
You're welcome Paul.

paul.pauldj54 (7/30/2012)
...the business keys always hold numeric values and they are defined as Strings...


A column containing values with leading zero's can only be a string.

...For a fact table with 482.866 records the query takes about 6 minutes to execute. I don't know how powerful the server is but is this a reasonable amount of time? ...

I'd say this is slow. Can you post the actual execution plan of the query? Save it as a .sqlplan file and post it here as an attachment.

Is the business key column the same datatype in all of the tables?

“Write the query the simplest way. If through testing it becomes clear that the performance is inadequate, consider alternative query forms.” - Gail Shaw

For fast, accurate and documented assistance in answering your questions, please read this article.
Understanding and using APPLY, (I) and (II) Paul White
Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden
Exploring Recursive CTEs by Example Dwain Camps
Paul Hernández
Paul Hernández
Old Hand
Old Hand (376 reputation)Old Hand (376 reputation)Old Hand (376 reputation)Old Hand (376 reputation)Old Hand (376 reputation)Old Hand (376 reputation)Old Hand (376 reputation)Old Hand (376 reputation)

Group: General Forum Members
Points: 376 Visits: 661
Hi Chris,

Attached you can find the execution plan (I've never used this feature, it seems to be very cool).

If you look at the first figure (from my 1st post) there is a column called "KUNDE_KEY" which is a business key, and a numerical field.

You are absolutely right, leading-zero values must be strings, I haven't notice :-D

Now I'll dig deep into execution plans Alien

Thanks for your time,

Paul

Paul Hernández
Attachments
AggrData_Exc_Plan.sqlplan (9 views, 111.00 KB)
ChrisM@Work
ChrisM@Work
SSCoach
SSCoach (16K reputation)SSCoach (16K reputation)SSCoach (16K reputation)SSCoach (16K reputation)SSCoach (16K reputation)SSCoach (16K reputation)SSCoach (16K reputation)SSCoach (16K reputation)

Group: General Forum Members
Points: 16119 Visits: 19534
Hi Paul, can you post the Actual plan please, rather than the estimated plan? There are often differences between the two - for instance, the estimated plan you've just posted indicates that your output from this query is 56 million rows.

“Write the query the simplest way. If through testing it becomes clear that the performance is inadequate, consider alternative query forms.” - Gail Shaw

For fast, accurate and documented assistance in answering your questions, please read this article.
Understanding and using APPLY, (I) and (II) Paul White
Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden
Exploring Recursive CTEs by Example Dwain Camps
Paul Hernández
Paul Hernández
Old Hand
Old Hand (376 reputation)Old Hand (376 reputation)Old Hand (376 reputation)Old Hand (376 reputation)Old Hand (376 reputation)Old Hand (376 reputation)Old Hand (376 reputation)Old Hand (376 reputation)

Group: General Forum Members
Points: 376 Visits: 661
Hi Chris,

Attached the Actual Excecution Plan,

For other readers, I found this link very useful to understan execution plans from SQL SERVER: http://www.simple-talk.com/sql/performance/execution-plan-basics/

Kind Regards,

Paul

Paul Hernández
Attachments
AggrData_Exc_Plan.sqlplan (8 views, 124.00 KB)
ChrisM@Work
ChrisM@Work
SSCoach
SSCoach (16K reputation)SSCoach (16K reputation)SSCoach (16K reputation)SSCoach (16K reputation)SSCoach (16K reputation)SSCoach (16K reputation)SSCoach (16K reputation)SSCoach (16K reputation)

Group: General Forum Members
Points: 16119 Visits: 19534
Hi Paul, thanks for the plan.

Points:
1. No clustered indexes on any of the 4 tables, a suitable clustered index on DeltaFAKT_DETAIL would probably help the expensive SORT in the plan. I'd start with the following columns in that clustered index, in the order shown:
FD.FILIALE_KEY, -- join column
FD.BRANCHE, -- join column
FD.KONSUMFELD, -- join column
FD.KUNDE_KEY,
FD.EK_DATUM,
FD.ABT,
FD.UN_3
I'd also change the GROUP BY to match this order:
GROUP BY
FD.FILIALE_KEY, --
FD.BRANCHE, --
FD.KONSUMFELD, --
FD.KUNDE_KEY,
FD.EK_DATUM,
FD.ABT,
FD.UN_3,
FI.FILIALE,
FI.FILIALE_KTEXT,
FI.FILIALE_VERKAUFSREGION,
FI.FILIALE_SAP_KEY,
BR.BRANCHE,
BR.BRANCHE_BEZ,
BR.BRANCHE_KBEZ,
KON.KONSUMFELD,
KON.KONSUMFELD_BEZ,
KON.KONSUMFELD_KBEZ;

Indexes on the other tables would promote more efficient joins, but their rowcounts are so low that the improvement might not be noticeable.
2. There's a 1:1 join between FILIALE and DeltaFAKT_DETAIL; however the 482,866 row product of this join, once joined with BRANCHE, has 10x the number of rows. Is this correct? Is there a join predicate missing here?
3. Likewise with the join to table KONSUMFELD; the rowcount goes up by a factor of 10.
4. The final output set of 537,859 rows is sufficiently similar to the rowcount of DeltaFAKT_DETAIL to raise an eyebrow.

I'd begin by checking that the join predicates are correct.

“Write the query the simplest way. If through testing it becomes clear that the performance is inadequate, consider alternative query forms.” - Gail Shaw

For fast, accurate and documented assistance in answering your questions, please read this article.
Understanding and using APPLY, (I) and (II) Paul White
Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden
Exploring Recursive CTEs by Example Dwain Camps
Paul Hernández
Paul Hernández
Old Hand
Old Hand (376 reputation)Old Hand (376 reputation)Old Hand (376 reputation)Old Hand (376 reputation)Old Hand (376 reputation)Old Hand (376 reputation)Old Hand (376 reputation)Old Hand (376 reputation)

Group: General Forum Members
Points: 376 Visits: 661
Chris, you're simply amazing!!!

I've learned a lot from you today.

I'll study carefully your last reply, implement the changes and then write back to the forum to share my results.

Kind Regards,

Paul

Paul Hernández
ChrisM@Work
ChrisM@Work
SSCoach
SSCoach (16K reputation)SSCoach (16K reputation)SSCoach (16K reputation)SSCoach (16K reputation)SSCoach (16K reputation)SSCoach (16K reputation)SSCoach (16K reputation)SSCoach (16K reputation)

Group: General Forum Members
Points: 16119 Visits: 19534
That's very kind Paul but I'm just a jobbing TSQL developer! Thanks anyway.

“Write the query the simplest way. If through testing it becomes clear that the performance is inadequate, consider alternative query forms.” - Gail Shaw

For fast, accurate and documented assistance in answering your questions, please read this article.
Understanding and using APPLY, (I) and (II) Paul White
Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden
Exploring Recursive CTEs by Example Dwain Camps
Go


Permissions

You can't post new topics.
You can't post topic replies.
You can't post new polls.
You can't post replies to polls.
You can't edit your own topics.
You can't delete your own topics.
You can't edit other topics.
You can't delete other topics.
You can't edit your own posts.
You can't edit other posts.
You can't delete your own posts.
You can't delete other posts.
You can't post events.
You can't edit your own events.
You can't edit other events.
You can't delete your own events.
You can't delete other events.
You can't send private messages.
You can't send emails.
You can read topics.
You can't vote in polls.
You can't upload attachments.
You can download attachments.
You can't post HTML code.
You can't edit HTML code.
You can't post IFCode.
You can't post JavaScript.
You can post emoticons.
You can't post or upload images.

Select a forum

































































































































































SQLServerCentral


Search