Click here to monitor SSC
SQLServerCentral is supported by Red Gate Software Ltd.
 
Log in  ::  Register  ::  Not logged in
 
 
 
        
Home       Members    Calendar    Who's On


Add to briefcase 12»»

Join operation over String non-index fields Expand / Collapse
Author
Message
Posted Monday, July 30, 2012 2:04 AM
SSC-Enthusiastic

SSC-EnthusiasticSSC-EnthusiasticSSC-EnthusiasticSSC-EnthusiasticSSC-EnthusiasticSSC-EnthusiasticSSC-EnthusiasticSSC-Enthusiastic

Group: General Forum Members
Last Login: Wednesday, October 1, 2014 5:14 AM
Points: 119, Visits: 486
Hi all,

I have to perform some joins between a Fact table (with millions of records) and some contextual tables. My problem is that some tables have an auto incremental field as the PK, but this key is not useful for join operations. The business key is stored in a string field; moreover, this field is not defined as unique and is a String field. When the value of the field is unknown you get a "U". In the following image an example of this situation can be seen:



My first approach was to create new tables with the business key as the index and an ETL to load this table, but the managers of the project have discard this option and I need a clever solution to overcome this drawback.

Any comments or suggestions will be appreciated.

Kind Regards,

Paul


Paul Hernández
http://hernandezpaul.wordpress.com/
https://twitter.com/paul_eng
Post #1337096
Posted Monday, July 30, 2012 2:44 AM


SSCertifiable

SSCertifiableSSCertifiableSSCertifiableSSCertifiableSSCertifiableSSCertifiableSSCertifiableSSCertifiableSSCertifiable

Group: General Forum Members
Last Login: Today @ 10:52 AM
Points: 6,826, Visits: 14,062
Hi Paul

Apart from the fact that a simple join between the two tables on konsumfeld
SELECT ...
FROM Tableb b
LEFT JOIN Tablea a ON a.konsumfeld = b.konsumfeld

would eliminate rows from tableb where konsumfeld = 'U', I can't see what the problem is. Why would you want to make another copy of the table in order to index konsumfeld?


“Write the query the simplest way. If through testing it becomes clear that the performance is inadequate, consider alternative query forms.” - Gail Shaw

For fast, accurate and documented assistance in answering your questions, please read this article.
Understanding and using APPLY, (I) and (II) Paul White
Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden
Exploring Recursive CTEs by Example Dwain Camps
Post #1337107
Posted Monday, July 30, 2012 3:36 AM
SSC-Enthusiastic

SSC-EnthusiasticSSC-EnthusiasticSSC-EnthusiasticSSC-EnthusiasticSSC-EnthusiasticSSC-EnthusiasticSSC-EnthusiasticSSC-Enthusiastic

Group: General Forum Members
Last Login: Wednesday, October 1, 2014 5:14 AM
Points: 119, Visits: 486
Hi Chris,

Thanks so much for your quick answer.

My problem is I have to perform 3 Joins with 2 tables with the same situation and 1 table with a numeric key. I don't have control over the source tables, but the business keys always hold numeric values and they are defined as Strings. For a fact table with 482.866 records the query takes about 6 minutes to execute. I don't know how powerful the server is but is this a reasonable amount of time? This fact table contains only delta data (records for 1 day) but imagine the historical table that has more than 165 million of rows.


Paul Hernández
http://hernandezpaul.wordpress.com/
https://twitter.com/paul_eng
Post #1337132
Posted Monday, July 30, 2012 4:04 AM


SSCertifiable

SSCertifiableSSCertifiableSSCertifiableSSCertifiableSSCertifiableSSCertifiableSSCertifiableSSCertifiableSSCertifiable

Group: General Forum Members
Last Login: Today @ 10:52 AM
Points: 6,826, Visits: 14,062

You're welcome Paul.

paul.pauldj54 (7/30/2012)
...the business keys always hold numeric values and they are defined as Strings...


A column containing values with leading zero's can only be a string.

...For a fact table with 482.866 records the query takes about 6 minutes to execute. I don't know how powerful the server is but is this a reasonable amount of time? ...

I'd say this is slow. Can you post the actual execution plan of the query? Save it as a .sqlplan file and post it here as an attachment.

Is the business key column the same datatype in all of the tables?






“Write the query the simplest way. If through testing it becomes clear that the performance is inadequate, consider alternative query forms.” - Gail Shaw

For fast, accurate and documented assistance in answering your questions, please read this article.
Understanding and using APPLY, (I) and (II) Paul White
Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden
Exploring Recursive CTEs by Example Dwain Camps
Post #1337142
Posted Monday, July 30, 2012 5:39 AM
SSC-Enthusiastic

SSC-EnthusiasticSSC-EnthusiasticSSC-EnthusiasticSSC-EnthusiasticSSC-EnthusiasticSSC-EnthusiasticSSC-EnthusiasticSSC-Enthusiastic

Group: General Forum Members
Last Login: Wednesday, October 1, 2014 5:14 AM
Points: 119, Visits: 486
Hi Chris,

Attached you can find the execution plan (I've never used this feature, it seems to be very cool).

If you look at the first figure (from my 1st post) there is a column called "KUNDE_KEY" which is a business key, and a numerical field.

You are absolutely right, leading-zero values must be strings, I haven't notice

Now I'll dig deep into execution plans

Thanks for your time,

Paul


Paul Hernández
http://hernandezpaul.wordpress.com/
https://twitter.com/paul_eng


  Post Attachments 
AggrData_Exc_Plan.sqlplan (6 views, 111.24 KB)
Post #1337185
Posted Monday, July 30, 2012 5:44 AM


SSCertifiable

SSCertifiableSSCertifiableSSCertifiableSSCertifiableSSCertifiableSSCertifiableSSCertifiableSSCertifiableSSCertifiable

Group: General Forum Members
Last Login: Today @ 10:52 AM
Points: 6,826, Visits: 14,062
Hi Paul, can you post the Actual plan please, rather than the estimated plan? There are often differences between the two - for instance, the estimated plan you've just posted indicates that your output from this query is 56 million rows.

“Write the query the simplest way. If through testing it becomes clear that the performance is inadequate, consider alternative query forms.” - Gail Shaw

For fast, accurate and documented assistance in answering your questions, please read this article.
Understanding and using APPLY, (I) and (II) Paul White
Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden
Exploring Recursive CTEs by Example Dwain Camps
Post #1337191
Posted Monday, July 30, 2012 7:45 AM
SSC-Enthusiastic

SSC-EnthusiasticSSC-EnthusiasticSSC-EnthusiasticSSC-EnthusiasticSSC-EnthusiasticSSC-EnthusiasticSSC-EnthusiasticSSC-Enthusiastic

Group: General Forum Members
Last Login: Wednesday, October 1, 2014 5:14 AM
Points: 119, Visits: 486
Hi Chris,

Attached the Actual Excecution Plan,

For other readers, I found this link very useful to understan execution plans from SQL SERVER: http://www.simple-talk.com/sql/performance/execution-plan-basics/

Kind Regards,

Paul


Paul Hernández
http://hernandezpaul.wordpress.com/
https://twitter.com/paul_eng


  Post Attachments 
AggrData_Exc_Plan.sqlplan (6 views, 124.66 KB)
Post #1337241
Posted Monday, July 30, 2012 8:33 AM


SSCertifiable

SSCertifiableSSCertifiableSSCertifiableSSCertifiableSSCertifiableSSCertifiableSSCertifiableSSCertifiableSSCertifiable

Group: General Forum Members
Last Login: Today @ 10:52 AM
Points: 6,826, Visits: 14,062
Hi Paul, thanks for the plan.

Points:
1. No clustered indexes on any of the 4 tables, a suitable clustered index on DeltaFAKT_DETAIL would probably help the expensive SORT in the plan. I'd start with the following columns in that clustered index, in the order shown:
FD.FILIALE_KEY, -- join column
FD.BRANCHE, -- join column
FD.KONSUMFELD, -- join column
FD.KUNDE_KEY,
FD.EK_DATUM,
FD.ABT,
FD.UN_3
I'd also change the GROUP BY to match this order:
GROUP BY
FD.FILIALE_KEY, --
FD.BRANCHE, --
FD.KONSUMFELD, --
FD.KUNDE_KEY,
FD.EK_DATUM,
FD.ABT,
FD.UN_3,
FI.FILIALE,
FI.FILIALE_KTEXT,
FI.FILIALE_VERKAUFSREGION,
FI.FILIALE_SAP_KEY,
BR.BRANCHE,
BR.BRANCHE_BEZ,
BR.BRANCHE_KBEZ,
KON.KONSUMFELD,
KON.KONSUMFELD_BEZ,
KON.KONSUMFELD_KBEZ;

Indexes on the other tables would promote more efficient joins, but their rowcounts are so low that the improvement might not be noticeable.
2. There's a 1:1 join between FILIALE and DeltaFAKT_DETAIL; however the 482,866 row product of this join, once joined with BRANCHE, has 10x the number of rows. Is this correct? Is there a join predicate missing here?
3. Likewise with the join to table KONSUMFELD; the rowcount goes up by a factor of 10.
4. The final output set of 537,859 rows is sufficiently similar to the rowcount of DeltaFAKT_DETAIL to raise an eyebrow.

I'd begin by checking that the join predicates are correct.


“Write the query the simplest way. If through testing it becomes clear that the performance is inadequate, consider alternative query forms.” - Gail Shaw

For fast, accurate and documented assistance in answering your questions, please read this article.
Understanding and using APPLY, (I) and (II) Paul White
Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden
Exploring Recursive CTEs by Example Dwain Camps
Post #1337270
Posted Monday, July 30, 2012 8:46 AM
SSC-Enthusiastic

SSC-EnthusiasticSSC-EnthusiasticSSC-EnthusiasticSSC-EnthusiasticSSC-EnthusiasticSSC-EnthusiasticSSC-EnthusiasticSSC-Enthusiastic

Group: General Forum Members
Last Login: Wednesday, October 1, 2014 5:14 AM
Points: 119, Visits: 486
Chris, you're simply amazing!!!

I've learned a lot from you today.

I'll study carefully your last reply, implement the changes and then write back to the forum to share my results.

Kind Regards,

Paul


Paul Hernández
http://hernandezpaul.wordpress.com/
https://twitter.com/paul_eng
Post #1337288
Posted Monday, July 30, 2012 9:03 AM


SSCertifiable

SSCertifiableSSCertifiableSSCertifiableSSCertifiableSSCertifiableSSCertifiableSSCertifiableSSCertifiableSSCertifiable

Group: General Forum Members
Last Login: Today @ 10:52 AM
Points: 6,826, Visits: 14,062
That's very kind Paul but I'm just a jobbing TSQL developer! Thanks anyway.

“Write the query the simplest way. If through testing it becomes clear that the performance is inadequate, consider alternative query forms.” - Gail Shaw

For fast, accurate and documented assistance in answering your questions, please read this article.
Understanding and using APPLY, (I) and (II) Paul White
Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden
Exploring Recursive CTEs by Example Dwain Camps
Post #1337313
« Prev Topic | Next Topic »

Add to briefcase 12»»

Permissions Expand / Collapse