Query Optimization Advice

  • daniness - Friday, December 1, 2017 6:14 AM

    ScottPletcher - Thursday, November 30, 2017 12:36 PM

    Are you willing to try changing the Clustering on table FctCoverageLengthDetail as I suggested earlier?  I'm just curious as to how much it would help.

    Hi ScottPletcher,
    So I tried your suggestion of the clustered index on the FctCoverageLengthDetail table, but I don't believe it improved performance, as the query took about 29 minutes to run, returning 15 M or so records. Thank you for the feedback though :-).

    Just to clarify... this code is supposed to return 15 MILLION rows?  To Tableau????

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • I haven't seen a query quite as spectacular as this one before but have come across similar creations in the past.
    It looks like something that has been abused over the years to cater for changing requirements, without anybody deciding to just stop and start again.
    If it was written like this from day one then the author needs to be checked for substance abuse. They also had no consideration for those who would follow in their footsteps and I'm willing to bet that there is no documentation.
    As time goes by this will take more time and resources to maintain, as it will never get simpler in its current format.

    The execution plan shows you that almost every table or index is being scanned, and the reasons behind this have already been mentioned, with an attempt to throw what appears to be quite complex logic all into one query. I have come across people who believe one massive query is more effective than a lot of little queries and they are generally wrong - at the very least it can be harder to maintain (as you are rather unfortunately discovering now) and that in itself becomes expensive.

    Your best course of action is to approach whoever you have to, to get the time and resources to re-write this. Tell them that some of the best people out there have looked at it and screamed - and some of them get paid a lot of money to fix problems.

    You need to know what it is trying to do, so you have the output from this query as a start. I assume the output is correct, it's just the time it takes that is the issue.

    It needs to be broken down into smaller steps, such as identifying where data could be manipulated in discrete steps, probably into temporary tables. These temporary tables could be built up to create data in the format required for a final query, which provides the final results. Breaking this down into such steps would make it a lot clearer and easier to maintain.

    Time and clear thinking is the only answer to this one. Our apologies for not being able to quickly fix the situation for you but this really is a bit of a monster.
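
    To make that staging idea concrete, here is a minimal sketch of the pattern - a rough outline only, assuming the *DateKey columns are YYYYMMDD strings; the key column and temp table names are placeholders invented for illustration, not the OP's actual schema:

        -- Step 1: stage only the rows and columns the final result actually needs
        SELECT CoverageKey,                                              -- placeholder key column
               CONVERT(date, CoverageEffectiveDateKey) AS CoverageEffectiveDate,
               CONVERT(date, OriginalMaturityDateKey)  AS OriginalMaturityDate,
               CoverageLengthMonthNumber
        INTO   #CoverageBase
        FROM   dbo.FctCoverageLengthDetail;

        -- Step 2: index the temp table to support the later joins and filters
        CREATE CLUSTERED INDEX cx_CoverageBase ON #CoverageBase (CoverageKey);

        -- Step 3: the final query works against the small, pre-shaped temp table
        -- instead of re-deriving every date expression inline
        SELECT CoverageKey, CoverageLengthMonthNumber
        FROM   #CoverageBase
        WHERE  CoverageLengthMonthNumber <= DATEDIFF(DAY, CoverageEffectiveDate, OriginalMaturityDate) / 30 + 5;

    Each intermediate step can then be timed and checked against the expected output on its own.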

  • Jeff Moden - Friday, December 1, 2017 7:40 AM

    daniness - Friday, December 1, 2017 6:14 AM

    ScottPletcher - Thursday, November 30, 2017 12:36 PM

    Are you willing to try changing the Clustering on table FctCoverageLengthDetail as I suggested earlier?  I'm just curious as to how much it would help.

    Hi ScottPletcher,
    So I tried your suggestion of the clustered index on the FctCoverageLengthDetail table, but I don't believe it improved performance, as the query took about 29 minutes to run, returning 15 M or so records. Thank you for the feedback though :-).

    Just to clarify... this code is supposed to return 15 MILLION rows?  To Tableau????

    Yes, it does.  The OP posted the execution plan for the query.  It is a mess and definitely needs a serious rewrite.

  • Hi Jeff,

    Thanks for your reply. Yes, this code has been returning about 15 million rows...which seems to be the norm for the business here. The business users here do use Tableau. I'm thinking this particular user has been tweaking the sql code to customize the report. Please let me know if you need any further clarification. Thanks.

  • Lynn Pettis - Wednesday, November 29, 2017 2:23 PM

    Okay, you seriously need to figure out what it is they are expecting and figure out a better way to get it.  You are scanning the same index several times, with each scan returning 15+ million rows of data.  You are also scanning at least one table several times.
    Also, what is this doing (break it down into its pieces):

        AND [CoverageLengthMonthNumber] <= round((DATEDIFF(DAY,CAST(LEFT(RIGHT([CoverageEffectiveDateKey],4),2) + '/' + RIGHT([CoverageEffectiveDateKey],2) + '/'
               + LEFT([CoverageEffectiveDateKey],4) AS DATE),CAST(LEFT(RIGHT(OriginalMaturityDateKey,4),2) + '/' + RIGHT(OriginalMaturityDateKey,2)
               + '/' + LEFT(OriginalMaturityDateKey,4) AS DATE)))/(365/12),0)+5 --(1:45m 7.9M)

    And by the way, this won't be a trivial task.  I don't have any idea where to start, but then that is because I only see the query, not the data or the current results (again 15.5+ million rows of data).

    Decided to break this down into pieces and just "construct" fake data for it so I could see what it was doing, and it appears to be doing something utterly unnecessary.   Data that would make this "work" would require CoverageEffectiveDateKey to be an 8-character date string in YYYYMMDD format,
    and then it appears to be deciding that a date difference of some number of months should be computed by dividing the date difference in days by 365/12.   Honestly, I can't come up with a business reason to do this, but it's possible.   Seems to me that would get messed up by leap years, and return different results for any data in there that spans such a time frame.   Also, given that this is "integer math", 365/12 will always yield 30, so why ask the server to compute that value at all?   Also, given that the column value has to be in YYYYMMDD format, that ridiculously long expression can be considerably simplified by just casting or converting to date, without the LEFT and RIGHT functions or string concatenation at all.

    Here's the simplified version:
    AND CoverageLengthMonthNumber <= ROUND(DATEDIFF(day, CONVERT(date, CoverageEffectiveDateKey), CONVERT(date, OriginalMaturityDateKey)) / 30, 0) + 5    --(1:45m 7.9M)
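
    Just to see those two points in isolation, a throwaway check with made-up date values (not the OP's data) shows both behaviours:

    SELECT 365/12 AS IntegerDivision,                   -- integer math: always 30
           CONVERT(date, '20170115') AS EffectiveDate,  -- a YYYYMMDD string converts directly, no string surgery
           DATEDIFF(DAY, CONVERT(date, '20170115'), CONVERT(date, '20220714')) / 30 AS ApproxMonths;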

    Steve (aka sgmunson) 🙂 🙂 🙂
    Rent Servers for Income (picks and shovels strategy)

  • daniness - Friday, December 1, 2017 7:46 AM

    Hi Jeff,

    Thanks for your reply. Yes, this code has been returning about 15 million rows...which seems to be the norm for the business here. The business users here do use Tableau. I'm thinking this particular user has been tweaking the sql code to customize the report. Please let me know if you need any further clarification. Thanks.

    Is the query generated by Tableau or is someone writing the query and putting it in Tableau?

    Can you convert it to a stored procedure and start breaking down the pieces using temp tables?  That might help get rid of the calculations and criteria using functions.
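
    If the temp tables collide with Tableau, a stored procedure wrapper is one way to keep the staged logic. A bare-bones sketch - the procedure name and staged columns here are invented for illustration, not an existing object:

        CREATE PROCEDURE dbo.usp_CoverageLengthDetailReport
        AS
        BEGIN
            SET NOCOUNT ON;  -- suppress extra rowcount messages that can confuse reporting clients

            -- stage the heavy, repeated calculations once, one discrete step at a time...
            SELECT CoverageLengthMonthNumber,
                   CONVERT(date, CoverageEffectiveDateKey) AS CoverageEffectiveDate
            INTO   #Stage
            FROM   dbo.FctCoverageLengthDetail;

            -- ...then hand back a single, simple result set for the report to consume
            SELECT CoverageLengthMonthNumber, CoverageEffectiveDate
            FROM   #Stage;
        END;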

  • BrainDonor - Friday, December 1, 2017 7:45 AM

    I haven't seen a query quite as spectacular as this one before but have come across similar creations in the past.
    It looks like something that has been abused over the years to cater for changing requirements, without anybody deciding to just stop and start again.
    If it was written like this from day one then the author needs to be checked for substance abuse. They also had no consideration for those who would follow in their footsteps and I'm willing to bet that there is no documentation.
    As time goes by this will take more time and resources to maintain, as it will never get simpler in its current format.

    The execution plan shows you that almost every table or index is being scanned, and the reasons behind this have already been mentioned, with an attempt to throw what appears to be quite complex logic all into one query. I have come across people who believe one massive query is more effective than a lot of little queries and they are generally wrong - at the very least it can be harder to maintain (as you are rather unfortunately discovering now) and that in itself becomes expensive.

    Your best course of action is to approach whoever you have to, to get the time and resources to re-write this. Tell them that some of the best people out there have looked at it and screamed - and some of them get paid a lot of money to fix problems.

    You need to know what it is trying to do, so you have the output from this query as a start. I assume the output is correct, it's just the time it takes that is the issue.

    It needs to be broken down into smaller steps, such as identifying where data could be manipulated in discrete steps, probably into temporary tables. These temporary tables could be built up to create data in the format required for a final query, which provides the final results. Breaking this down into such steps would make it a lot clearer and easier to maintain.

    Time and clear thinking is the only answer to this one. Our apologies for not being able to quickly fix the situation for you but this really is a bit of a monster.

    Hi Steve,

    Thank you for all your wisdom. I really appreciate it. I definitely do agree...this query is of behemoth proportions. I do believe that it's been tailored over time, depending on what the need was. The output is good, but it's the duration of time it takes to run which is less than ideal. The thing with your suggestion of using temp tables is that I've heard Tableau doesn't play well with them (https://community.tableau.com/thread/144662)... I guess this is why I've seen some use of derived tables in the code? The article states that perhaps using stored procedures would be a good workaround? Sigh...it's times like this that I wish I had earned an MS SQL cert, so that I'd be stronger in this subject matter...:ermm: Thank you for all of your suggestions! 🙂

  • daniness - Friday, December 1, 2017 8:01 AM

    BrainDonor - Friday, December 1, 2017 7:45 AM

    I haven't seen a query quite as spectacular as this one before but have come across similar creations in the past.
    It looks like something that has been abused over the years to cater for changing requirements, without anybody deciding to just stop and start again.
    If it was written like this from day one then the author needs to be checked for substance abuse. They also had no consideration for those who would follow in their footsteps and I'm willing to bet that there is no documentation.
    As time goes by this will take more time and resources to maintain, as it will never get simpler in its current format.

    The execution plan shows you that almost every table or index is being scanned, and the reasons behind this have already been mentioned, with an attempt to throw what appears to be quite complex logic all into one query. I have come across people who believe one massive query is more effective than a lot of little queries and they are generally wrong - at the very least it can be harder to maintain (as you are rather unfortunately discovering now) and that in itself becomes expensive.

    Your best course of action is to approach whoever you have to, to get the time and resources to re-write this. Tell them that some of the best people out there have looked at it and screamed - and some of them get paid a lot of money to fix problems.

    You need to know what it is trying to do, so you have the output from this query as a start. I assume the output is correct, it's just the time it takes that is the issue.

    It needs to be broken down into smaller steps, such as identifying where data could be manipulated in discrete steps, probably into temporary tables. These temporary tables could be built up to create data in the format required for a final query, which provides the final results. Breaking this down into such steps would make it a lot clearer and easier to maintain.

    Time and clear thinking is the only answer to this one. Our apologies for not being able to quickly fix the situation for you but this really is a bit of a monster.

    Hi Steve,

    Thank you for all your wisdom. I really appreciate it. I definitely do agree...this query is of behemoth proportions. I do believe that it's been tailored over time, depending on what the need was. The output is good, but it's the duration of time it takes to run which is less than ideal. The thing with your suggestion of using temp tables is that I've heard Tableau doesn't play well with them (https://community.tableau.com/thread/144662)... I guess this is why I've seen some use of derived tables in the code? The article states that perhaps using stored procedures would be a good workaround? Sigh...it's times like this that I wish I had earned an MS SQL cert, so that I'd be stronger in this subject matter...:ermm: Thank you for all of your suggestions! 🙂

    Not to burst your bubble on earning an MS SQL cert, but just getting a cert wouldn't really help you fix this query in and of itself.  Experience writing queries, starting with simple queries and working up to more complex queries, is what really helps.  I have no certs myself, and given time and an understanding of the data and the required results I could rewrite this query, as could many of us who have been working with SQL Server for a while.  This is just something that I don't think we could do in a forum environment.

    I would start by looking at the current result set and map the data back to the tables where the data resides.  Then the next step is to solve the puzzles that will turn the source into the destination.  That means understanding and documenting the requirements.  Definitely break the process down into simpler pieces as you work through the requirements.

  • Sergiy - Thursday, November 30, 2017 4:07 PM

    From somewhere inside the query:

      FROM [dbo].[FctCoverageLengthDetail]
      WHERE CASE
                WHEN ([CoverageLengthMonthNumber] = 13 or [CoverageLengthMonthNumber] = 25 or [CoverageLengthMonthNumber] = 37 or [CoverageLengthMonthNumber] = 49
                      or [CoverageLengthMonthNumber] = 61 or [CoverageLengthMonthNumber] = 73 or [CoverageLengthMonthNumber] = 85 or [CoverageLengthMonthNumber] = 97
                      or [CoverageLengthMonthNumber] = 109 or [CoverageLengthMonthNumber] = 121)
                 AND [CoverageLengthMonthNumber] < (round((DATEDIFF(DAY, CAST(LEFT(RIGHT([CoverageEffectiveDateKey],4),2) + '/' + RIGHT([CoverageEffectiveDateKey],2)
                      + '/' + LEFT([CoverageEffectiveDateKey],4) AS DATE), CAST(LEFT(RIGHT([CoverageEndDateKey],4),2) + '/' + RIGHT([CoverageEndDateKey],2) + '/'
                      + LEFT([CoverageEndDateKey],4) AS DATE))) / (365/12), 0))
                 AND (round((DATEDIFF(DAY, CAST(LEFT(RIGHT([CoverageEffectiveDateKey],4),2) + '/' + RIGHT([CoverageEffectiveDateKey],2) + '/'
                      + LEFT([CoverageEffectiveDateKey],4) AS DATE), CAST(LEFT(RIGHT([CoverageEndDateKey],4),2) + '/' + RIGHT([CoverageEndDateKey],2) + '/'
                      + LEFT([CoverageEndDateKey],4) AS DATE))) / (365/12), 0))
                      - [CoverageLengthMonthNumber] < 13
                THEN [CoverageLengthMonthNumber]
                ELSE 1
            END is not null

    To me it looks like the only way the expression can be NULL is when CoverageLengthMonthNumber is null.
    So, you can safely ditch the whole horrible CASE statement in the WHERE clause and replace it with

    WHERE [CoverageLengthMonthNumber] is not null

    This is only one randomly picked point of improvement.

    Hi Sergiy,

    Thanks for your reply. It makes sense to me, as all the fields involved are set up as "not null". I tried your suggestion of replacing that long WHERE block with

    WHERE [CoverageLengthMonthNumber] is not null, but the query took around 32 minutes. Still looking at other possibilities...thanks for your feedback :).

  • As I said - it was only 1 randomly chosen piece of the query, picked only because I stopped scrolling through the query to take a sip of coffee.

    Every other piece of that query needs to be modified in a similar way.

    Not to mention -

    And you need to start by addressing dates.

    Are they stored as varchar?

    In what way are they used for record range selection?

    Are they indexed?

    Which one of the dates makes up the clustered index?

    And so on.
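
    A couple of quick, read-only checks answer most of those questions (only the table name is taken from the thread):

        -- are the *DateKey columns really character strings, and are they nullable?
        SELECT COLUMN_NAME, DATA_TYPE, CHARACTER_MAXIMUM_LENGTH, IS_NULLABLE
        FROM INFORMATION_SCHEMA.COLUMNS
        WHERE TABLE_NAME = 'FctCoverageLengthDetail'
          AND COLUMN_NAME LIKE '%DateKey%';

        -- which columns are indexed, and which index is the clustered one?
        EXEC sp_helpindex 'dbo.FctCoverageLengthDetail';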

    As Lynn said - figure out the logic of the query, and then redo it to implement that logic.

    _____________
    Code for TallyGenerator

    This is probably applying a bandaid to a shotgun wound, but - have you considered creating persisted computed columns for all of the repeated date operations you're doing on the fly?  There are repeating snippets of code that are converting strings to dates, and trying to manually perform date operations on the raw strings, none of which can be optimized where it sits in the code.  Persisted computed columns CAN be included in indexes, which might help marginally.  Of course if we keep this as a single query, trying to swallow the entire herd of elephants all at once, there's no shot at meaningfully helping this along.
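
    As a sketch of that idea - assuming CoverageEffectiveDateKey really is an 8-character YYYYMMDD string, as the LEFT/RIGHT logic implies, and with the column and index names invented here for illustration:

        -- style 112 (yyyymmdd) keeps the string-to-date conversion deterministic, which PERSISTED requires
        ALTER TABLE dbo.FctCoverageLengthDetail
            ADD CoverageEffectiveDate AS CONVERT(date, CoverageEffectiveDateKey, 112) PERSISTED;

        -- the persisted column can then be indexed (or INCLUDEd) like any ordinary column
        CREATE NONCLUSTERED INDEX IX_FctCoverageLengthDetail_EffectiveDate
            ON dbo.FctCoverageLengthDetail (CoverageEffectiveDate)
            INCLUDE (CoverageLengthMonthNumber);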

    Also I don't know what type of insurance this proration routine is for, but it's doing some rather odd things from what I am used to seeing.  On that side - unless the insurance LOB you're in allows for out-of-sequence processing, a lot of the calculations you're performing would be set in stone for a given record.  Meaning the query seems to be reinventing the wheel each time you run it, for years of history that don't really change.

    Again - in no way does that contradict earlier feedback from everyone else - this ultimately needs to be gutted.

    ----------------------------------------------------------------------------------
    Your lack of planning does not constitute an emergency on my part...unless you're my manager...or a director and above...or a really loud-spoken end-user..All right - what was my emergency again?
