Triangle Join - Can I do it right or a replacement?

  • Hi.

    I'm trying to make a summarization that will be cumulative but will use a tally table called MonthTable to show the twelve months of the year. I've read the countless articles (including the wonderful ones by Jeff Moden) about triangle joins and how they can be a bottleneck in a server's memory and processor.

    I'm trying to do the following:

    WITH MonthTable AS

    (

    SELECT '01' AS Mes UNION ALL

    SELECT '02' AS Mes UNION ALL

    SELECT '03' AS Mes UNION ALL

    SELECT '04' AS Mes UNION ALL

    SELECT '05' AS Mes UNION ALL

    SELECT '06' AS Mes UNION ALL

    SELECT '07' AS Mes UNION ALL

    SELECT '08' AS Mes UNION ALL

    SELECT '09' AS Mes UNION ALL

    SELECT '10' AS Mes UNION ALL

    SELECT '11' AS Mes UNION ALL

    SELECT '12' AS Mes

    ),

    ExampleTable AS

    (

    SELECT CAST('2011-01-15' AS DATE) AS DATA, 150 AS TOTAL UNION ALL

    SELECT CAST('2011-04-20' AS DATE) AS DATA, 80 AS TOTAL UNION ALL

    SELECT CAST('2011-05-03' AS DATE) AS DATA, 100 AS TOTAL

    )

    SELECT EXE.ANO,

    MES.MES,

    SUM(EXE.TOTAL) AS TOTAL

    FROM MonthTable MES

    LEFT JOIN (SELECT YEAR(DATA) AS ANO,

    MONTH(DATA) AS MES,

    TOTAL

    FROM ExampleTable

    WHERE YEAR(DATA) = '2011') EXE

    ON MES.MES >= EXE.MES

    GROUP BY EXE.ANO, MES.MES

    ORDER BY MES.MES;

    It's just an example and, while the MonthTable will always be like that, the ExampleTable may have thousands or millions of rows.

    I ask you if it's such a problem since I'm not joining two large table but one large table with a month tally table that will always have 12 rows.

    If it's a not recommended way of doing this, do you have any suggestions of a replacement code that's better?

    I thank you for any help in this since it's part of a task I need to do for my job.

    Best regards,

    Andre Guerreiro Neto

    Database Analyst
    http://www.softplan.com.br
    MCITPx1/MCTSx2/MCSE/MCSA

  • At least, in the derived table I'm limiting the year range to a specific one (2011, in the example).

    Of course I can always replace the YEAR and MONTH functions to date ranges (BETWEEN '20110101' AND '20111231') so the optimizer will use the appropriate indexes but I haven't tested the code with a lot of data.

    After reading Jeff Moden's articles and trying to see the big picture in the processing I've got scared of trying this out. 🙂

    Best regards,

    Andre Guerreiro Neto

    Database Analyst
    http://www.softplan.com.br
    MCITPx1/MCTSx2/MCSE/MCSA

  • Why not put your derived table into a CTE as well? Or a temp table? It doesn't address the triangluar join yet, but that's the first thing I noticed. I hate derived tables...

    Jared
    CE - Microsoft

  • I dont see a triangular join here. Its pretty straight forward join. Or maybe im overlooking something here :blink:

  • ColdCoffee (3/20/2012)


    I dont see a triangular join here. Its pretty straight forward join. Or maybe im overlooking something here :blink:

    Triangular join because the join condition is >=

    Jared
    CE - Microsoft

  • SQLKnowItAll (3/20/2012)


    Why not put your derived table into a CTE as well? Or a temp table? It doesn't address the triangluar join yet, but that's the first thing I noticed. I hate derived tables...

    I will probably use an indexed temp table for the large amount of data in the derived table.

    I prefer using CTEs and temp tables as well because the code gets easier to read.

    The only thing I must take care when creating a temp table is that after populating it, an index will be needed for retrieving a lot of data efficiently, which doesn't happen when you access the table directly in the derived table and the table has a proper index.

    Even so I'm scared that it may blow up the server somehow. 🙂

    Or is it not so much of a problem the way it's written?

    Thank you for the advices.

    Best regards,

    Andre Guerreiro Neto

    Database Analyst
    http://www.softplan.com.br
    MCITPx1/MCTSx2/MCSE/MCSA

  • SQLKnowItAll (3/20/2012)


    ColdCoffee (3/20/2012)


    I dont see a triangular join here. Its pretty straight forward join. Or maybe im overlooking something here :blink:

    Triangular join because the join condition is >=

    Also, look at the potential performance problem in the original code. Look for the arrow with a rowcount of 29. Now, add just 3 more entries to the "ExampleTable" CTE and look at the same arrow. It'll be 47. It's not a full Triangular Join but it's definitely a many-to-many that's just not going to scale well.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Jeff Moden (3/20/2012)


    Also, look at the potential performance problem in the original code. Look for the arrow with a rowcount of 29. Now, add just 3 more entries to the "ExampleTable" CTE and look at the same arrow. It'll be 47. It's not a full Triangular Join but it's definitely a many-to-many that's just not going to scale well.

    You are definitely right, Jeff!

    Unfortunately that's the only technique I know instead of using the new SQL Server 2012 features dealing with window functions.

    Is there any way for me to achieve the same result?

    Thank you for your input.

    Best regards,

    Andre Guerreiro Neto

    Database Analyst
    http://www.softplan.com.br
    MCITPx1/MCTSx2/MCSE/MCSA

  • codebyo (3/20/2012)


    SQLKnowItAll (3/20/2012)


    Why not put your derived table into a CTE as well? Or a temp table? It doesn't address the triangluar join yet, but that's the first thing I noticed. I hate derived tables...

    I will probably use an indexed temp table for the large amount of data in the derived table.

    I prefer using CTEs and temp tables as well because the code gets easier to read.

    The only thing I must take care when creating a temp table is that after populating it, an index will be needed for retrieving a lot of data efficiently, which doesn't happen when you access the table directly in the derived table and the table has a proper index.

    Even so I'm scared that it may blow up the server somehow. 🙂

    Or is it not so much of a problem the way it's written?

    Thank you for the advices.

    Andre,

    Let's take all the fear out of this. Does the "ExampleTable" CTE accurately represent the columns we need to work with from a real table? Can there be more than one of the same date, for example?

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Jeff Moden (3/20/2012)


    Andre,

    Let's take all the fear out of this. Does the "ExampleTable" CTE accurately represent the columns we need to work with from a real table? Can there be more than one of the same date, for example?

    Actually the ExampleTable is very close to the final table because it's meant to be a "result table" with only a few calculations for BI reports. There will be only one year at a time involved in the ExampleTable query because we need all the months of a given year. And there will be only one row per date because we only need the total value of a given day.

    The problem for me resides in the date gaps as shown in the three rows of the ExampleTable. 🙂

    Thank you for your answer.

    Best regards,

    Andre Guerreiro Neto

    Database Analyst
    http://www.softplan.com.br
    MCITPx1/MCTSx2/MCSE/MCSA

  • This is basically a running totals query, and I believe that's where Jeff's going with his solution so I won't work the same process twice. 😉

    However, there's another concern with your query, and that's the non-SARGability of the year.

    WHERE YEAR(DATA) = '2011'

    will not use indexes. However:

    WHERE Data >= '20110101' AND Data < '20120101'

    will use an index on Data properly. Notice how the calculation is not being done on the column being reviewed.


    - Craig Farrell

    Never stop learning, even if it hurts. Ego bruises are practically mandatory as you learn unless you've never risked enough to make a mistake.

    For better assistance in answering your questions[/url] | Forum Netiquette
    For index/tuning help, follow these directions.[/url] |Tally Tables[/url]

    Twitter: @AnyWayDBA

  • codebyo (3/20/2012)


    Jeff Moden (3/20/2012)


    Actually the ExampleTable is very close to the final table because it's meant to be a "result table" with only a few calculations for BI reports. There will be only one year at a time involved in the ExampleTable query because we need all the months of a given year. And there will be only one row per date because we only need the total value of a given day.

    The problem for me resides in the date gaps as shown in the three rows of the ExampleTable. 🙂

    Thank you for your answer.

    Please understand my confusion on all of that. Previously you stated...

    It's just an example and, while the MonthTable will always be like that, the ExampleTable may have thousands or millions of rows.

    Are you now saying that the "ExampleTable" will only have 12 (or fewer) rows in it? 1 for each month???

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Evil Kraig F (3/20/2012)


    This is basically a running totals query, and I believe that's where Jeff's going with his solution so I won't work the same process twice. 😉

    However, there's another concern with your query, and that's the non-SARGability of the year.

    WHERE YEAR(DATA) = '2011'

    will not use indexes. However:

    WHERE Data >= '20110101' AND Data < '20120101'

    will use an index on Data properly. Notice how the calculation is not being done on the column being reviewed.

    Yes, you are correct!

    I mentioned that issue in the second post of this topic and I will change the query.

    Thanks a lot for the analysis because it confirmed what I was thinking and allowed me to know that I'm in the right path. 🙂

    Best regards,

    Andre Guerreiro Neto

    Database Analyst
    http://www.softplan.com.br
    MCITPx1/MCTSx2/MCSE/MCSA

  • Jeff Moden (3/20/2012)


    Please understand my confusion on all of that. Previously you stated...

    It's just an example and, while the MonthTable will always be like that, the ExampleTable may have thousands or millions of rows.

    Are you now saying that the "ExampleTable" will only have 12 (or fewer) rows in it? 1 for each month???

    Sorry for the confusion. 🙂

    The ExampleTable may have one row per day of the year. It's just an abstract example. I'm not sure how the final table will look like because I'm depending on the ERD guys to show us the final models. When I said thousands or millions I was referring to a different case that would benefit from the same answer maybe. I shouldn't have mentioned it.

    There will be a lot of tables with precalculated values.

    Example: Table ProductSummary

    Columns: Date, Product, Vendor, Total

    Rule: One row per day at maximum.

    Example: Table CustomerPurchase

    Columns: Date, Customer, Group, Product, Total

    Rule: One row per day at maximum.

    There will always be one row per day at maximum but there can be gaps in dates.

    It's the final query that will show only twelve rows with the result (cumulative count or sum) from the many rows from tables like ProductSummary or CustomerPurchase.

    I'm not at work right now but I can code a better example tomorrow if needed.

    Thank you for your reply and sorry again for the confusion.

    Best regards,

    Andre Guerreiro Neto

    Database Analyst
    http://www.softplan.com.br
    MCITPx1/MCTSx2/MCSE/MCSA

  • Example: Table ProductSummary

    Columns: Date, Product, Vendor, Total

    Rule: One row per day at maximum.

    So, one row per date per product per vendor, correct?

    And you want running totals by day and not month?

    Sorry for all the questions but I like to get these things right before I start writing code/

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

Viewing 15 posts - 1 through 15 (of 27 total)

You must be logged in to reply to this topic. Login to reply