Combine Multiple Queries into a CTE?

  • Hello all,

    I am looking to build a query with a result set that combines multiple queries to appear as one result set where I can then do a calculation in a separate column in that result set. I believe I will need to use a CTE but am not sure how I can build it with multiple queries.

    Below is some sample data which has been refined for the concept I'm trying to achieve.

    CREATE Table [dbo].[tempCustomerLedgerEntry]

    (

    [PostingDate] [DateTime] NOT NULL,

    [SalesLCY] [Decimal] (38, 20) NOT NULL,

    [SalespersonCode] [varchar] (20) NOT NULL,

    [OrderNum][int]NOT NULL

    )

    Insert Into [dbo].[tempCustomerLedgerEntry]

    (

    [PostingDate],[SalesLCY],[SalespersonCode],[OrderNum]

    )

    VALUES

    ( '2012-01-09', '1000.09', 'KC', '1234')

    ,( '2012-01-19', '1109.05', 'KC', '1235')

    ,( '2012-02-14', '5500.98', 'KC', '1237')

    ,( '2012-03-18', '10542.88', 'KC', '1238')

    ,( '2013-01-19', '99000.86', 'KC', '1239')

    ,( '2013-02-15', '105000.89','KC', '1240')

    Select * from [dbo].[tempCustomerLedgerEntry]

    CREATE TABLE [dbo].[tempCreditMemo]

    (

    [PostingDate] [DATETIME] NOT NULL,

    [CreditAmount] [decimal](38, 20) NOT NULL,

    [CreditMemoNum] [Int] NOT NULL,

    [OrderNum][Int] NOT NULL

    )

    Insert into [dbo].[tempCreditMemo]

    (

    [PostingDate],[CreditAmount],[CreditMemoNum], [OrderNum]

    )

    VALUES

    ( '2012-01-09', '5.01', '8821', '1234')

    ,( '2012-01-20', '10.00','8822', '1235')

    ,( '2012-02-14', '200.01','8823', '1237')

    ,( '2012-03-25', '300.10','8824', '1238')

    ,( '2013-01-29', '100.52','8825', '1239')

    ,( '2013-02-25', '10.99', '8826', '1240')

    select * from [dbo].[tempCreditMemo]

    CREATE TABLE [dbo].[tempSalesInvoice]

    (

    [PostingDate] [DATETIME] NOT NULL,

    [SaleAmount] [decimal](38, 20) NOT NULL,

    [SalesInvNum] [Int] NOT NULL,

    [OrderNum][Int] NOT NULL

    )

    Insert into [dbo].[tempSalesInvoice]

    (

    [PostingDate],[SaleAmount],[SalesInvNum], [OrderNum]

    )

    VALUES

    ( '2012-01-09', '1500.01', '6624', '1234')

    ,( '2012-01-20', '1200.00','6625', '1235')

    ,( '2012-02-14', '5750.00','6626', '1237')

    ,( '2012-03-25', '10542.88','6627', '1238')

    ,( '2013-01-29', '99000.86','6628', '1239')

    ,( '2013-02-25', '105000.89', '6629', '1240')

    select * from [dbo].[tempSalesInvoice]

    select * from [dbo].[tempCreditMemo]

    Select * from [dbo].[tempCustomerLedgerEntry]

    /*

    Truncate table [dbo].[tempCustomerLedgerEntry]

    go

    Truncate table [dbo].[tempCreditMemo]

    go

    Truncate table [dbo].[tempSalesInvoice]

    go

    */

    --Below are the result sets I am looking to combine

    --I will also add an additional column the performs

    --the calculation of Amount1-(Amount3-Amount2) as ProfitCalc

    select Year(PostingDate) as PostingYear

    , SUM(SalesLCY) as Amount1

    from [dbo].[tempCustomerLedgerEntry]

    group by Year(PostingDate)

    order by Year(PostingDate) desc

    select Year(PostingDate) as PostingYear

    , SUM(CreditAmount) as Amount2

    from [dbo].[tempCreditMemo]

    group by Year(PostingDate)

    order by Year(PostingDate) desc

    select Year(PostingDate) as PostingYear

    , SUM(SaleAmount) as Amount3

    from [dbo].[tempSalesInvoice]

    group by Year(PostingDate)

    order by Year(PostingDate) desc

    The final result would look like the following below.

    Posting Year Amount1 Amount2 Amount3 ProfitCalc

    2013 204001.75 111.51 204001.75 111.51

    2012 18153 515.12 18992.89 -839.89

    Please let me know your thoughts on the best way to run such a query. Thanks.

  • If I'm understanding this correctly, it looks like you probably want to create a single CTE that holds the contents of all 3 of your select statements. You can combine the results of multiple select statements with UNION ALL. Then you can simply do your final grouping and selecting from the CTE.

    Does that make sense? Is that the type of solution you are looking for? Do you need to see example code? Please let me know. Thanks! 🙂


    "If I had been drinking out of that toilet, I might have been killed." -Ace Ventura

  • You can actually just do the UNION ALL stuff and your calculations without a CTE too.

    Best,
    Kevin G. Boles
    SQL Server Consultant
    SQL MVP 2007-2012
    TheSQLGuru on googles mail service

  • Yes that does make sense and I will try to create the code tonight. If I run into any problems I'll let you know.

  • TheSQLGuru (11/14/2013)


    You can actually just do the UNION ALL stuff and your calculations without a CTE too.

    I did originally try with union but I was running into an issue. I'll give it another look soon. Thanks for the reply.

  • Be sure you are clear on the difference between UNION, and UNION ALL. Give a holler if you need any more help. 🙂


    "If I had been drinking out of that toilet, I might have been killed." -Ace Ventura

  • autoexcrement (11/14/2013)


    Be sure you are clear on the difference between UNION, and UNION ALL. Give a holler if you need any more help. 🙂

    In case you aren't UNION will do a SORT/DISTINCT under the covers to remove duplicates - a VERY expensive operation. UNION ALL will not do this. In cases where you can't possibly have dupes or you don't care about them, ALWAYS explicitly user UNION ALL!!

    Best,
    Kevin G. Boles
    SQL Server Consultant
    SQL MVP 2007-2012
    TheSQLGuru on googles mail service

  • UNIONing your queries won't quite get you to where you want to go.

    What you want is often called a cross-tab, and here's a way to do it in T-SQL, using your DDL and sample data:

    ; WITH cte AS (

    SELECT YEAR(PostingDate) AS PostingYear

    , SUM(SalesLCY) AS Amount

    , 1 AS sourceTable

    FROM [dbo].[tempCustomerLedgerEntry]

    GROUP BY YEAR(PostingDate)

    UNION ALL

    SELECT YEAR(PostingDate) AS PostingYear

    , SUM(CreditAmount) AS Amount

    , 2 AS sourceTable

    FROM [dbo].[tempCreditMemo]

    GROUP BY Year(PostingDate)

    UNION ALL

    SELECT Year(PostingDate) AS PostingYear

    , SUM(SaleAmount) AS Amount

    , 3 AS sourceTable

    FROM [dbo].[tempSalesInvoice]

    GROUP BY Year(PostingDate)

    )

    SELECT PostingYear

    ,MAX(CASE WHEN sourceTable = 1 THEN Amount ELSE NULL END) AS Amount1

    ,MAX(CASE WHEN sourceTable = 2 THEN Amount ELSE NULL END) AS Amount2

    ,MAX(CASE WHEN sourceTable = 3 THEN Amount ELSE NULL END) AS Amount3

    ,MAX(CASE WHEN sourceTable = 1 THEN Amount ELSE NULL END) - (MAX(CASE WHEN sourceTable = 3 THEN Amount ELSE NULL END) - MAX(CASE WHEN sourceTable = 2 THEN Amount ELSE NULL END)) AS ProfitCalc

    FROM cte

    GROUP BY PostingYear

    ORDER BY PostingYear DESC

    You'll have to do some conversion to decimal types with smaller scale (places to the right of the decimal point) if you don't want all those trailing zeros in the results, which leads to an important question:

    Why does your database store what seem to be currency values with only two places to the right of the decimal as decimal(38,20)? Do you actually have values that have 20 significant digits to the right of the decimal point, or are these values all dollars and cents? You could store a value of six times the U.S. GDP for 2012 ($15,684 billion dollars) in dollars and cents as decimal(16, 2). The advantage of this (in addition to not ending up with a bunch of non-significant zeros to the right of the decimal point in the results of simple arithmetic) is that a decimal(16,2) value requires only 9 bytes of storage while a decimal(38,20) value requires 17 bytes. That's eight unnecessary bytes for every value, which adds up to a LOT of wasted disk space and memory!

    Jason Wolfkill

  • Thanks for posting WolfKillJ!

    Your code worked really well. Correct me if I'm wrong about the logic behind it.

    You create the CTE object to start. In each of the original queries you add a column named source table to act as a place holder to later be used in the final aggregate functions. Union All was chosen since we wanted all values and not just distinct values filtered. Then you close the CTE and select from it. In the select list you used PostingYear to later be grouped on. Then the Amount columns from each individual query are used in the MAX aggregate function. The Case statements choose the place holder column to pick the unique amount column and then alias that column to differentiate it in the result set. Then the last column is the calculation where the MAX aggregate is used with CASE to put in the business logic. Finally the group by postingyear and order by postingyear for sorting.

    I am a little confused by the case statement. Why couldn't a normal select be used and then have the columns in the final group by?

    To answer your questions about the DDL. The decimal places chosen was done over 10 years ago and I don't know why. this is a database I now manage and I can only attribute it to convenience for the DEV team or lack of thought to the design from a storage and performance perspective.

    There are no values that I have come across which have 20 significant digits to the right of the decimal. We do work with multiple currencies other than USD but I've yet to see a real life need for it. Would it be a good idea to truncate 17 or 18 of the digits?

    There are many other tables and columns that have this issue. Our ERP is Navision and its history goes back to being developed in the Denmark before Microsoft bought it.

    It is heavily customizable but developers may not have control over the data types they create for new columns. One day I will try to convert some of them on the backend of our test database just to see if the front end will work. All of the points you make about the data type hit home. I have so many issues with performance and moving tables around to different filegroups just to accommodate the space issues. Problems with inefficient design are endemic to this database in almost every aspect you can imagine. Now I'm part of a team that is building a datawarehouse and I'm discovering a lot more of these issues. --End Rant!

    Thanks again for your post! It was great to see a CTE in action with some real data.

    wolfkillj (11/15/2013)


    UNIONing your queries won't quite get you to where you want to go.

    What you want is often called a cross-tab, and here's a way to do it in T-SQL, using your DDL and sample data:

    ; WITH cte AS (

    SELECT YEAR(PostingDate) AS PostingYear

    , SUM(SalesLCY) AS Amount

    , 1 AS sourceTable

    FROM [dbo].[tempCustomerLedgerEntry]

    GROUP BY YEAR(PostingDate)

    UNION ALL

    SELECT YEAR(PostingDate) AS PostingYear

    , SUM(CreditAmount) AS Amount

    , 2 AS sourceTable

    FROM [dbo].[tempCreditMemo]

    GROUP BY Year(PostingDate)

    UNION ALL

    SELECT Year(PostingDate) AS PostingYear

    , SUM(SaleAmount) AS Amount

    , 3 AS sourceTable

    FROM [dbo].[tempSalesInvoice]

    GROUP BY Year(PostingDate)

    )

    SELECT PostingYear

    ,MAX(CASE WHEN sourceTable = 1 THEN Amount ELSE NULL END) AS Amount1

    ,MAX(CASE WHEN sourceTable = 2 THEN Amount ELSE NULL END) AS Amount2

    ,MAX(CASE WHEN sourceTable = 3 THEN Amount ELSE NULL END) AS Amount3

    ,MAX(CASE WHEN sourceTable = 1 THEN Amount ELSE NULL END) - (MAX(CASE WHEN sourceTable = 3 THEN Amount ELSE NULL END) - MAX(CASE WHEN sourceTable = 2 THEN Amount ELSE NULL END)) AS ProfitCalc

    FROM cte

    GROUP BY PostingYear

    ORDER BY PostingYear DESC

    You'll have to do some conversion to decimal types with smaller scale (places to the right of the decimal point) if you don't want all those trailing zeros in the results, which leads to an important question:

    Why does your database store what seem to be currency values with only two places to the right of the decimal as decimal(38,20)? Do you actually have values that have 20 significant digits to the right of the decimal point, or are these values all dollars and cents? You could store a value of six times the U.S. GDP for 2012 ($15,684 billion dollars) in dollars and cents as decimal(16, 2). The advantage of this (in addition to not ending up with a bunch of non-significant zeros to the right of the decimal point in the results of simple arithmetic) is that a decimal(16,2) value requires only 9 bytes of storage while a decimal(38,20) value requires 17 bytes. That's eight unnecessary bytes for every value, which adds up to a LOT of wasted disk space and memory!

  • kwoznica (11/16/2013)


    Thanks for posting WolfKillJ!

    Your code worked really well. Correct me if I'm wrong about the logic behind it.

    You create the CTE object to start. In each of the original queries you add a column named source table to act as a place holder to later be used in the final aggregate functions. Union All was chosen since we wanted all values and not just distinct values filtered. Then you close the CTE and select from it. In the select list you used PostingYear to later be grouped on. Then the Amount columns from each individual query are used in the MAX aggregate function. The Case statements choose the place holder column to pick the unique amount column and then alias that column to differentiate it in the result set. Then the last column is the calculation where the MAX aggregate is used with CASE to put in the business logic. Finally the group by postingyear and order by postingyear for sorting.

    I am a little confused by the case statement. Why couldn't a normal select be used and then have the columns in the final group by?

    To answer your questions about the DDL. The decimal places chosen was done over 10 years ago and I don't know why. this is a database I now manage and I can only attribute it to convenience for the DEV team or lack of thought to the design from a storage and performance perspective.

    There are no values that I have come across which have 20 significant digits to the right of the decimal. We do work with multiple currencies other than USD but I've yet to see a real life need for it. Would it be a good idea to truncate 17 or 18 of the digits?

    There are many other tables and columns that have this issue. Our ERP is Navision and its history goes back to being developed in the Denmark before Microsoft bought it.

    It is heavily customizable but developers may not have control over the data types they create for new columns. One day I will try to convert some of them on the backend of our test database just to see if the front end will work. All of the points you make about the data type hit home. I have so many issues with performance and moving tables around to different filegroups just to accommodate the space issues. Problems with inefficient design are endemic to this database in almost every aspect you can imagine. Now I'm part of a team that is building a datawarehouse and I'm discovering a lot more of these issues. --End Rant!

    Thanks again for your post! It was great to see a CTE in action with some real data.

    Glad it works for you, kwoznica.

    It may help you make sense of the logic in my query if you replace the SELECT . . . FROM cte GROUP BY PostingYear ORDER BY PostingYear with a simple SELECT * FROM cte and examine the results.

    You'll see that you get a row for each unique PostingYear/sourceTable combination with the Amount column consisting of the SUM total for that PostingYear/sourceTable:

    PostingYearAmountsourceTable

    201218153.000000000000000000001

    2013204001.750000000000000000001

    2012515.120000000000000000002

    2013111.510000000000000000002

    201218992.890000000000000000003

    2013204001.750000000000000000003

    Since you need a single row per postingYear, you'll need to pivot these rows to columns. You could use the PIVOT relational operator, but I find the PIVOT syntax to be confusing and the concept itself limiting, which is why I prefer the MAX()-based cross-tab construct. This technique relies on the fact that T-SQL aggregate functions ignore NULLs. To see why the CASE expression is necessary, run this SELECT from the CTE:

    SELECT PostingYear

    ,CASE WHEN sourceTable = 1 THEN Amount ELSE NULL END AS Amount1

    ,CASE WHEN sourceTable = 2 THEN Amount ELSE NULL END AS Amount2

    ,CASE WHEN sourceTable = 3 THEN Amount ELSE NULL END AS Amount3

    FROM cte

    and examine the results:

    PostingYearAmount1Amount2Amount3

    201218153.00000000000000000000NULLNULL

    2013204001.75000000000000000000NULLNULL

    2012NULL515.12000000000000000000NULL

    2013NULL111.51000000000000000000NULL

    2012NULLNULL18992.89000000000000000000

    2013NULLNULL204001.75000000000000000000

    You'll see that there is only one non-NULL value for each Amount column per PostingYear. Grouping the results by PostingYear will give you a single row per PostingYear and taking the MAX() of the Amount columns will give you the one non-NULL value for each column, and of course, the ProfitCalc column just does the math on the same values. Et voila:

    PostingYear Amount1 Amount2 Amount3 ProfitCalc

    2013 204001.75000000000000000000 111.51000000000000000000204001.75000000000000000000 111.51000000000000000000

    201218153.00000000000000000000515.1200000000000000000018992.89000000000000000000-324.77000000000000000000

    Let me know if you still have questions. Sorry about the lousy formatting of the results - it's a pain trying to make them look right in this forum.

    As an aside, I often break down complex T-SQL in this way - "unwrapping" the code in steps to examine intermediate result sets and the results of functions and expressions that are fed into other functions and expressions - when troubleshooting or just trying to understand the logic.

    Jason Wolfkill

  • kwoznica (11/16/2013)


    To answer your questions about the DDL. The decimal places chosen was done over 10 years ago and I don't know why. this is a database I now manage and I can only attribute it to convenience for the DEV team or lack of thought to the design from a storage and performance perspective.

    There are no values that I have come across which have 20 significant digits to the right of the decimal. We do work with multiple currencies other than USD but I've yet to see a real life need for it. Would it be a good idea to truncate 17 or 18 of the digits?

    There are many other tables and columns that have this issue. Our ERP is Navision and its history goes back to being developed in the Denmark before Microsoft bought it.

    It is heavily customizable but developers may not have control over the data types they create for new columns. One day I will try to convert some of them on the backend of our test database just to see if the front end will work. All of the points you make about the data type hit home. I have so many issues with performance and moving tables around to different filegroups just to accommodate the space issues. Problems with inefficient design are endemic to this database in almost every aspect you can imagine. Now I'm part of a team that is building a datawarehouse and I'm discovering a lot more of these issues. --End Rant!

    Thanks again for your post! It was great to see a CTE in action with some real data.

    One of the great things about building a data warehouse is that you can design it to use appropriate data types even where the underlying OLTP/ERP systems don't and manage the conversions in the ETL process. I'd encourage you to push as hard as you can to keep bad data architecture choices from your ERP system from creeping into the data warehouse. After all, the idea of a data warehouse is to improve reporting and analysis - replicating poor design from the underlying system frustrates this goal. I would also suggest that you push for some requirements analysis that might allow you to implement some even better design choices - for example, in many cases, dropping the fractional part of currency values makes no appreciable difference in aggregations, so unless someone really needs the fractional part, storing those values as integers can save even more storage and memory.

    Jason Wolfkill

Viewing 11 posts - 1 through 10 (of 10 total)

You must be logged in to reply to this topic. Login to reply