Refactor subqueries into CTEs

Question

Refactor subqueries into CTEs

SSCertifiable

Points: 5969

April 23, 2021 at 1:54 pm

I've been tasked with turning this SQL into a stored procedure to populate an Aggregation table (one that stores data that's not particularly mutable - refreshed monthly). As you can see, it uses multiple nested sub-queries.

SELECT [newdata].[EntityId],
 [E].[Name] AS Entity,
 [newdata].[PlanningFamilyId],
 [PF].[Name] AS [PlanningFamily],
 [newdata].[SKU] AS [Sku],
 [newdata].[PercentageRate] AS [SplitRate]
    FROM (SELECT  [splits_data].[EntityId],
   [splits_data].[PlanningFamilyId],
   [splits_data].[SKU],
   [splits_data].[PercentageRate]
  FROM (SELECT [skutotal].[EntityId],
     [skutotal].[PlanningFamilyId],
     [skutotal].[SKU],
    CASE
   WHEN([familytotal].[Total]) = 0
   THEN 0
   ELSE 100 / ([familytotal].[Total]) * ([skutotal].[Total])
    END AS [PercentageRate]
 FROM (
SELECT [alldata].[EntityId],
  [alldata].[PlanningFamilyId],
   COALESCE([Corporate].[CorporateProduct], [alldata].[SKU]) AS [SKU],
   SUM([alldata].[Period1]) AS [Period1],
   SUM([alldata].[Period2]) AS [Period2],
   SUM([alldata].[Period3]) AS [Period3],
   SUM([alldata].[Period4]) AS [Period4],
   SUM([alldata].[Period5]) AS [Period5],
   SUM([alldata].[Period6]) AS [Period6],
   SUM([alldata].[Period7]) AS [Period7],
   SUM([alldata].[Period8]) AS [Period8],
   SUM([alldata].[Period9]) AS [Period9],
   SUM([alldata].[Period10]) AS [Period10],
   SUM([alldata].[Period11]) AS [Period11],
   SUM([alldata].[Period12]) AS [Period12],
   SUM([alldata].[Period1] + [alldata].[Period2] + [alldata].[Period3] + [alldata].[Period4] + [alldata].[Period5] + [alldata].[Period6] + [alldata].[Period7] + [alldata].[Period8] + [alldata].[Period9] + [alldata].[Period10] + [alldata].[Period11] + [alldata].[Period12]) AS [Total]
    FROM (SELECT [EntityId],
 [PlanningFamilyId],
 [SKU],
 [pivottab].[Period1],
 [pivottab].[Period2],
 [pivottab].[Period3],
 [pivottab].[Period4],
 [pivottab].[Period5],
 [pivottab].[Period6],
 [pivottab].[Period7],
 [pivottab].[Period8],
 [pivottab].[Period9],
 [pivottab].[Period10],
 [pivottab].[Period11],
 [pivottab].[Period12]
   FROM (SELECT     [EntityId],
 [PlanYear],
 [PlanMonth],
 [vwActualsSales].[PlanningFamilyId],
 [Product] AS [SKU],
 SUM([QTY]) AS [Qty]
FROM [DP].[vwActualsSales]
WHERE [InvoiceDateAdjusted] BETWEEN DATEADD([mm], DATEDIFF([mm], 0, GETDATE()) - 6, 0) AND 
DATEADD([dd], -1, DATEADD([mm], DATEDIFF([mm], 0, GETDATE()) - 0, 0))
AND [QTY] > 0
AND [Obsolete] <> 'Y'
AND [Inst_Impl] = 'P'
AND EntityId IN ( @EntityId )
AND PlanningFamilyId IN ( @PlanningFamilyId )
GROUP BY [EntityId],
   [PlanYear],
   [PlanMonth],
   [vwActualsSales].[PlanningFamilyId],
   [Product]
) AS [sourcetab]
   CROSS APPLY (SELECT CASE
  WHEN [sourcetab].[PlanMonth] = 1
 THEN [Qty]
  ELSE 0
   END AS [Period1],
   CASE
  WHEN [sourcetab].[PlanMonth] = 2
 THEN [Qty]
  ELSE 0
   END AS [Period2],
   CASE
  WHEN [sourcetab].[PlanMonth] = 3
 THEN [Qty]
  ELSE 0
   END AS [Period3],
   CASE
  WHEN [sourcetab].[PlanMonth] = 4
 THEN [Qty]
  ELSE 0
   END AS [Period4],
   CASE
  WHEN [sourcetab].[PlanMonth] = 5
 THEN [Qty]
  ELSE 0
   END AS [Period5],
   CASE
  WHEN [sourcetab].[PlanMonth] = 6
 THEN [Qty]
  ELSE 0
   END AS [Period6],
   CASE
  WHEN [sourcetab].[PlanMonth] = 7
 THEN [Qty]
  ELSE 0
   END AS [Period7],
   CASE
  WHEN [sourcetab].[PlanMonth] = 8
 THEN [Qty]
  ELSE 0
   END AS [Period8],
   CASE
  WHEN [sourcetab].[PlanMonth] = 9
 THEN [Qty]
  ELSE 0
   END AS [Period9],
   CASE
  WHEN [sourcetab].[PlanMonth] = 10
 THEN [Qty]
  ELSE 0
   END AS [Period10],
   CASE
  WHEN [sourcetab].[PlanMonth] = 11
 THEN [Qty]
  ELSE 0
   END AS [Period11],
   CASE
  WHEN [sourcetab].[PlanMonth] = 12
 THEN [Qty]
  ELSE 0
   END AS [Period12]) AS [pivottab]
) AS [alldata]
LEFT JOIN [SERVERNAME].[MYCOMPANY_DW].[dbo].[d_inv_product_alias] AS [Corporate]
ON [Corporate].[LocalProduct] = [alldata].[SKU]
    WHERE [alldata].[PlanningFamilyId] <> ''
   AND [alldata].[PlanningFamilyId] IS NOT NULL
    GROUP BY [alldata].[EntityId],
   [alldata].[PlanningFamilyId],
  COALESCE([Corporate].[CorporateProduct], [alldata].[SKU])
) AS [skutotal]
 INNER JOIN (SELECT  [alldata].[EntityId],
 [alldata].[PlanningFamilyId],
 SUM([alldata].[Period1]) AS [Period1],
 SUM([alldata].[Period2]) AS [Period2],
 SUM([alldata].[Period3]) AS [Period3],
 SUM([alldata].[Period4]) AS [Period4],
 SUM([alldata].[Period5]) AS [Period5],
 SUM([alldata].[Period6]) AS [Period6],
 SUM([alldata].[Period7]) AS [Period7],
 SUM([alldata].[Period8]) AS [Period8],
 SUM([alldata].[Period9]) AS [Period9],
 SUM([alldata].[Period10]) AS [Period10],
 SUM([alldata].[Period11]) AS [Period11],
 SUM([alldata].[Period12]) AS [Period12],
 SUM([alldata].[Period1] + [alldata].[Period2] + [alldata].[Period3] + [alldata].[Period4] + [alldata].[Period5] + [alldata].[Period6] + [alldata].[Period7] + [alldata].[Period8] + [alldata].[Period9] + [alldata].[Period10] + [alldata].[Period11] + [alldata].[Period12]) AS [Total]
   FROM (SELECT      [EntityId],
 [PlanningFamilyId],
 [SKU],
 [pivottab].[Period1],
 [pivottab].[Period2],
 [pivottab].[Period3],
 [pivottab].[Period4],
 [pivottab].[Period5],
 [pivottab].[Period6],
 [pivottab].[Period7],
 [pivottab].[Period8],
 [pivottab].[Period9],
 [pivottab].[Period10],
 [pivottab].[Period11],
 [pivottab].[Period12]
   FROM (SELECT     [EntityId],
 [PlanYear],
 [PlanMonth],
 [vwActualsSales].[PlanningFamilyId],
 [Product] AS [SKU],
 SUM([QTY]) AS [Qty]
FROM [DP].[vwActualsSales]
WHERE [InvoiceDateAdjusted] BETWEEN DATEADD([mm], DATEDIFF([mm], 0, GETDATE()) - 6, 0) AND 
DATEADD([dd], -1, DATEADD([mm], DATEDIFF([mm], 0, GETDATE()) - 0, 0))
AND [QTY] > 0
AND [Obsolete] <> 'Y'
AND [Inst_Impl] = 'P'
AND EntityId IN ( @EntityId )
AND PlanningFamilyId IN ( @PlanningFamilyId )
GROUP BY [EntityId],
   [PlanYear],
   [PlanMonth],
   [vwActualsSales].[PlanningFamilyId],
   [Product]
) AS [sourcetab]
 CROSS APPLY (SELECT CASE
WHEN [sourcetab].[PlanMonth] = 1
THEN [Qty]
ELSE 0
 END AS [Period1],
 CASE
WHEN [sourcetab].[PlanMonth] = 2
THEN [Qty]
ELSE 0
 END AS [Period2],
 CASE
WHEN [sourcetab].[PlanMonth] = 3
THEN [Qty]
ELSE 0
 END AS [Period3],
 CASE
WHEN [sourcetab].[PlanMonth] = 4
THEN [Qty]
ELSE 0
 END AS [Period4],
 CASE
WHEN [sourcetab].[PlanMonth] = 5
THEN [Qty]
ELSE 0
 END AS [Period5],
 CASE
WHEN [sourcetab].[PlanMonth] = 6
THEN [Qty]
ELSE 0
 END AS [Period6],
 CASE
WHEN [sourcetab].[PlanMonth] = 7
THEN [Qty]
ELSE 0
 END AS [Period7],
 CASE
WHEN [sourcetab].[PlanMonth] = 8
THEN [Qty]
ELSE 0
 END AS [Period8],
 CASE
WHEN [sourcetab].[PlanMonth] = 9
THEN [Qty]
ELSE 0
 END AS [Period9],
 CASE
WHEN [sourcetab].[PlanMonth] = 10
THEN [Qty]
ELSE 0
 END AS [Period10],
 CASE
WHEN [sourcetab].[PlanMonth] = 11
THEN [Qty]
ELSE 0
 END AS [Period11],
 CASE
WHEN [sourcetab].[PlanMonth] = 12
THEN [Qty]
ELSE 0
 END AS [Period12]) AS [pivottab]) AS [alldata]
   WHERE [alldata].[PlanningFamilyId] <> ''
 AND [alldata].[PlanningFamilyId] IS NOT NULL
   GROUP BY [alldata].[EntityId],
  [alldata].[PlanningFamilyId]
) AS [familytotal]
ON [skutotal].[EntityId] = [familytotal].[EntityId] AND 
   [skutotal].[PlanningFamilyId] = [familytotal].[PlanningFamilyId]
) AS [splits_data]

) AS [newdata]
    INNER JOIN [DP].[PlanningFamily] PF
   ON RTRIM(PF.[PlanningFamilyId]) = RTRIM([newdata].[PlanningFamilyId])
  AND PF.[IsRevenue] = 0
    INNER JOIN DP.Brand B ON B.BrandId = PF.BrandId
    INNER JOIN DP.Entity E ON E.EntityId = [newdata].EntityId
ORDER BY Entity, PlanningFamily, Sku

I don't like this design personally (and I really don't like the tool that was used for the code formatting!) and I'd like to refactor it to use CTEs. Without delving into the schema design, and I realise this might be a finger-in-the-air exercise, but does it seem like doing this would impact performance? Mainly I'm concerned with readability and maintenance, and as this will likely only be executed occasionally (or monthly via a SQL job) then performance isn't a huge issue.

This topic was modified 5 years, 3 months ago by edwardwill.

Viewing 6 posts - 1 through 6 (of 6 total)

You must be logged in to reply to this topic. Login to reply

Mr. Brian Gale One Orange Chip Points: 25026 More actions · Answer 1

From my experience, nested selects vs CTE's have had very similar performance.

If you have a SQL formatting tool (such as RedGate SQL Prompt or ApexSQL Refactor), I would start by getting those configured to how you like the code to be (or what your coding standards indicate the code should be) as a first step. Working with well formatted code makes things a lot easier to refactor later on. Plus, it ensures the code is formatted consistently. Yours looks pretty consistent, but doesn't hurt to run a tool like that.

Next, I would rename your table aliases UNLESS those names make sense to you. For me though, I never have single or 2 character aliases as it means I am doing a lot of scrolling in my code to see what it means.

After that, refactoring a nested select into a CTE should be pretty easy to do.

BUT, like all things SQL, I would test this. Take the query and time how long it takes to do 3 runs of it on a test system. Then convert that to use CTE's and do 3 runs again and see how much the performance changed.

Another thing that can help performance is if you are using linked servers (not sure if you are, I didn't see any when I eyeballed your code, but I may have missed some), if you pull them into a table variable or temp table first. Linked server tables in SQL 2017 and older (I believe... 2012 and older for sure, but I think it was only fixed in 2019) estimate 1 row coming across the link.

The above is all just my opinion on what you should do.
As with all advice you find on a random internet forum - you shouldn't blindly follow it. Always test on a test server to see if there is negative side effects before making changes to live!
I recommend you NEVER run "random code" you found online on any system you care about UNLESS you understand and can verify the code OR you don't care if the code trashes your system.

edwardwill SSCertifiable Points: 5969 More actions · Answer 2

Thank you Brian. I believe that the developer who wrote the SQL uses ApexSQL Refactor, but although it's consistent, I find the way the code is laid out hard to follow (I realise these things are a matter of subjectivity, and there are no company standards). I also agree about the aliasing with single letters - again, that's standard in this company, but I can use my own aliases (or none at all, which is my preference except where there are multiple uses of the same table). I just find navigating my way down six layers of nesting rather wearisome!

Mr. Brian Gale One Orange Chip Points: 25026 More actions · Answer 3

I agree with you 110%!

I almost always use CTE's over nested selects. I find them easier to read and easier to test and debug. "SELECT * FROM <cte name>" after the CTE is easy for me to validate and verify and I can even add filters on to it or join it to another table or query with ease AND without breaking the original logic while I am doing testing. Once I am happy the CTE is valid and working as designed, I can comment it out and work on the next CTE (if they are unrelated) or start working on the rest of the query.

Nested selects I find when I am debugging them I get hit with copy-paste errors where I accidentally pull in hard-coded filtering I did on the data.

With respect to having no company standards, it may not be a bad idea to implement some or at least suggest it to the DBA/Developer team. It seems like extra overhead, but it adds a LOT to the readability and maintainability of the code. Plus you can tackle things like "loops should be avoided in all code. If a loop is required it MUST be well documented as to why it is needed and why other methods didn't work and, where applicable, what other methods were tested."

Now, with ApexSQL Refactor (or RedGate SQL Prompt), if you have a license for it, you can go into the settings and spend some time getting it set up EXACTLY how you like, and then you can refactor the code to your preference. Plus I believe you can have a second configuration that is the "company standard" configuration and once you have your code working, you run the "company standard" config against it and it is ready for review and to be pushed up the chain eventually to production. Then when you need to change code, you grab the code, run your config against it and you can change it with your standard.

Or, since you have no company standards at this time, you can push for your standards to be company standards. I know one thing I prefer is tabs over spaces for example. Some people agree with me, others think I am a monster for doing this. Some code editors handle tabs great, others completely screw up the formatting when you use tabs. But with the formatting tools, you can have it automatically switch between tabs and spaces in your code. My group has a "use whichever you feel like, but make sure it is consistent in the code" approach, while another department that I sometimes help with is in the "spaces" camp and tabs in code result in code verification failure.

I think my biggest issue with nested selects though is when they are nested deeply and brackets are inconsistent. Sometimes on new lines all on their own, sometimes at the end of the line.

The above is all just my opinion on what you should do.
As with all advice you find on a random internet forum - you shouldn't blindly follow it. Always test on a test server to see if there is negative side effects before making changes to live!
I recommend you NEVER run "random code" you found online on any system you care about UNLESS you understand and can verify the code OR you don't care if the code trashes your system.

Jeffrey Williams SSC Guru Points: 90347 More actions · Answer 4

To be honest - I think you are going to find that converting this to use CTE's isn't going to be easy nor worth the actual time spent. The problem here is that the original developer built this code as SELECT ... FROM (SELECT ... FROM (SELECT ... FROM ...

Embedded in this is a PIVOT and linked server - and unfortunately, that PIVOT uses the same derived table pattern.

My recommendation would be a total refactor of this - instead of trying to move pieces to CTE's. Since this is going to be a stored procedure there is no reason or need to do everything in a single-query. And since this is going to be used to build a local table on a schedule - it means breaking this out to build temp tables in a divide & conquer approach is much more feasible.

In fact - it might even be better to build a local version of the table(s) from the linked server instead of temp tables, which would make that data not only available for this procedure - but available for other procedures. That could be part of this procedure or a completely separate process - depends on how often that data needs to be refreshed.

It might also make sense to either 1) build a view to replace the PIVOT or 2) build a pivoted permanent table on a schedule. Not only would that satisfy the requirement for this procedure - but again, makes that data available to other processes (which I am almost certain is repeated in other code).

I am sure there are many more opportunities here...but that would be where I would start.

Jeffrey Williams
“We are all faced with a series of great opportunities brilliantly disguised as impossible situations.”

― Charles R. Swindoll

How to post questions to get better answers faster
Managing Transaction Logs

edwardwill SSCertifiable Points: 5969 More actions · Answer 5

Thank you both for the time you've taken. I'm attracted by Jeffrey Williams's solution as it seems to break the unwieldy code into more atomic chunks, which follows good SE practice. But I'll have to run this past the owner of the application ...