Huge Latency in TempDB

  • Hi All,

     We are facing huge latency in tempdb. What could be the reason?

    Latency
    1174
    1174
    1166
    1173
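
    For anyone checking this on their own system, per-file latency figures like these are commonly pulled from sys.dm_io_virtual_file_stats (average stall per I/O, in milliseconds); a minimal sketch against tempdb:

    -- Average read/write latency (ms) per tempdb file, from cumulative counters
    -- collected since the last SQL Server restart.
    SELECT  mf.name                                              AS logical_file_name,
            mf.physical_name,
            vfs.num_of_reads,
            vfs.io_stall_read_ms  / NULLIF(vfs.num_of_reads, 0)  AS avg_read_latency_ms,
            vfs.num_of_writes,
            vfs.io_stall_write_ms / NULLIF(vfs.num_of_writes, 0) AS avg_write_latency_ms
    FROM    sys.dm_io_virtual_file_stats(2, NULL) AS vfs         -- database_id 2 = tempdb
    JOIN    sys.master_files AS mf
            ON  mf.database_id = vfs.database_id
            AND mf.file_id     = vfs.file_id
    ORDER BY mf.file_id;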

  • Narine M - Tuesday, January 30, 2018 3:34 AM

    Hi All,

     We are facing huge latency in tempdb. What could be the reason?

    Latency
    1174
    1174
    1166
    1173

    With only that small amount of information, it could be anything, including but not limited to: serious garbage code (the most frequent reason, IMHO and from personal observation), a physically challenged disk, a bad network card, poor cabling, wrong settings just about anywhere in the chain between the server and the disk, some bad memory (especially in any of the caches), some server settings, or poor growth settings that create way too many VLFs in the tempdb log file, etc.

    On the subject of garbage code, remember that poor disk latency is a symptom, not a cause.  If you have garbage code that beats the hell out of TempDB, it can pin disk latency to the ceiling.
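
    If you want to see who is beating on TempDB right now, here is a minimal sketch: per-task tempdb allocations rolled up for the active requests (nothing below is specific to any particular system):

    -- Active requests allocating the most tempdb space right now.
    -- Counts are 8 KB pages; "internal" objects are sorts, hashes, spools, worktables.
    SELECT   r.session_id,
             u.user_object_pages,
             u.internal_object_pages,
             st.text AS running_sql
    FROM    (SELECT session_id,
                    request_id,
                    SUM(user_objects_alloc_page_count)     AS user_object_pages,
                    SUM(internal_objects_alloc_page_count) AS internal_object_pages
             FROM   sys.dm_db_task_space_usage
             WHERE  database_id = 2                       -- tempdb
             GROUP BY session_id, request_id) AS u
    JOIN     sys.dm_exec_requests AS r
             ON  r.session_id = u.session_id
             AND r.request_id = u.request_id
    CROSS APPLY sys.dm_exec_sql_text(r.sql_handle) AS st
    ORDER BY u.internal_object_pages + u.user_object_pages DESC;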

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • As Jeff noted, it could be because of hardware or poor code. If you benchmark the disk, is it slower than others?

    Are there certain days, times, or functions that cause you to notice this? Likely it's bad code that either spills work into tempdb because it reads far more data than it should, or someone really loves temp tables and is overusing them.
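
    A quick way to tell those two apart is the session-level tempdb allocation DMV: high user-object counts point at temp table overuse, high internal-object counts point at sort/hash spills and worktables. A minimal sketch:

    -- Cumulative tempdb allocation per session since each session connected.
    SELECT   s.session_id,
             s.login_name,
             s.program_name,
             su.user_objects_alloc_page_count,       -- temp tables, table variables
             su.internal_objects_alloc_page_count    -- spills, worktables, spools
    FROM     sys.dm_db_session_space_usage AS su
    JOIN     sys.dm_exec_sessions          AS s
             ON s.session_id = su.session_id
    WHERE    su.user_objects_alloc_page_count
           + su.internal_objects_alloc_page_count > 0
    ORDER BY su.internal_objects_alloc_page_count DESC;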

    My experience (~25 years as a SQL Server-only consultant) is that suboptimal IO is usually the cause of this. These days you can also add suboptimal virtual machine configuration to the list.

    Best,
    Kevin G. Boles
    SQL Server Consultant
    SQL MVP 2007-2012
    TheSQLGuru on googles mail service

    Have you looked at your wait stats?  Although you haven't provided any detail, I can guess that such high latency could be caused by too much read IO: not enough spindles and/or data files for the number of requests that need to be serviced.

    How is your TempDB set up?
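
    For the wait stats, a minimal sketch (the exclusion list of benign waits is abbreviated; extend it with whatever idle waits your monitoring normally filters out):

    -- Top waits since the last restart (or since the stats were last cleared).
    -- PAGEIOLATCH_* = waiting on reads from disk; PAGELATCH_* (no "IO") on tempdb
    -- pages usually means in-memory allocation contention, not slow disks.
    SELECT TOP (15)
            wait_type,
            waiting_tasks_count,
            wait_time_ms,
            wait_time_ms - signal_wait_time_ms AS resource_wait_ms
    FROM    sys.dm_os_wait_stats
    WHERE   wait_type NOT IN (N'SLEEP_TASK', N'LAZYWRITER_SLEEP', N'WAITFOR',
                              N'XE_TIMER_EVENT', N'BROKER_TO_FLUSH',
                              N'SQLTRACE_BUFFER_FLUSH')
    ORDER BY wait_time_ms DESC;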

  • kevaburg - Friday, February 2, 2018 5:06 AM

    Have you looked at your wait stats?  Although you haven't provided any detail, I can guess that such high latency could be caused by too much read IO: not enough spindles and/or data files for the number of requests that need to be serviced.

    How is your TempDB set up?

    Heh... or some seriously performance-challenged code with non-SARGable predicates and insufficient join criteria that cause accidental many-to-many joins.
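
    For anyone who hasn't bumped into the term, here is a minimal before/after sketch of a non-SARGable predicate, using a made-up table and columns (dbo.Orders, OrderID, OrderDate) purely for illustration:

    -- Non-SARGable: the function around the column blocks an index seek, so the
    -- whole index (or table) gets scanned and large intermediate results can end
    -- up spilling into tempdb.
    SELECT OrderID
    FROM   dbo.Orders
    WHERE  CONVERT(char(8), OrderDate, 112) = '20180130';

    -- SARGable rewrite: compare the bare column to a range so an index seek
    -- on OrderDate is possible.
    SELECT OrderID
    FROM   dbo.Orders
    WHERE  OrderDate >= '20180130'
      AND  OrderDate <  '20180131';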

    --Jeff Moden



  • Jeff Moden - Friday, February 2, 2018 6:36 AM

    kevaburg - Friday, February 2, 2018 5:06 AM

    Have you looked at your wait stats?  Although you haven't provided any detail I can guess that such high latency could be caused by too much IO read:  not enough spindles and/or datafiles in the database for the amount of requests that need ot be serviced.

    How is your TempDB set up?

    Heh... or some seriously performance challenged code with non-SARGable predicates and insufficient criteria that cause accidental many-to-many joins.

    We had a weekly issue with a tempdb bottleneck on an SSD drive holding 4 data files and the log file. We split the SSD mirror into two drives, placed two tempdb data files on each, and moved the tempdb log file to a third drive. Problem no more.
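
    For anyone wanting to do something similar, a minimal sketch of the kind of file moves described above; the logical file names and paths are examples only, and the new locations only take effect after the instance is restarted:

    -- Check your actual logical names first:
    --   SELECT name, physical_name FROM sys.master_files WHERE database_id = 2;
    ALTER DATABASE tempdb MODIFY FILE (NAME = tempdev, FILENAME = 'D:\TempDB1\tempdb.mdf');
    ALTER DATABASE tempdb MODIFY FILE (NAME = temp2,   FILENAME = 'E:\TempDB2\tempdb2.ndf');
    ALTER DATABASE tempdb MODIFY FILE (NAME = templog, FILENAME = 'F:\TempDBLog\templog.ldf');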

  • RandomStream - Friday, February 2, 2018 5:37 PM

    We had a weekly issue with a tempdb bottleneck on an SSD drive holding 4 data files and the log file. We split the SSD mirror into two drives, placed two tempdb data files on each, and moved the tempdb log file to a third drive. Problem no more.

    IMHO, you didn't actually fix the problem.  You just split the problem into two.  Stop and think about it... your TempDB is on SSDs and you still had contention?  While your patch on the problem certainly got you out of the woods in the short term, it doesn't solve the future problems that will occur with eventual large increases in scale, nor does it solve the eventual problem of what you'll need to do when similar contentious code makes its appearance.  That's some nasty technical debt that has been built up and temporarily solved with hardware, and someone, someday, is going to need to bite the bullet and fix the code.
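
    One way to tell whether that weekly problem was allocation contention rather than raw I/O speed is to catch the page-latch waits while they're happening. A minimal sketch:

    -- Tasks currently waiting on page latches in tempdb (database_id 2).
    -- resource_description is "dbid:fileid:pageid"; the PFS/GAM/SGAM allocation pages
    -- are pages 1, 2 and 3 (repeating every 8,088 / 511,232 pages).
    SELECT  wt.session_id,
            wt.wait_type,
            wt.wait_duration_ms,
            wt.resource_description
    FROM    sys.dm_os_waiting_tasks AS wt
    WHERE   wt.wait_type LIKE N'PAGELATCH%'
      AND   wt.resource_description LIKE N'2:%';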

    --Jeff Moden



  • Jeff Moden - Sunday, February 4, 2018 9:53 AM

    IMHO, you didn't actually fix the problem.  You just split the problem into two.  Stop and think about it... your TempDB is on SSDs and you still had contention?  While your patch on the problem certainly got you out of the woods in the short term, it doesn't solve the future problems that will occur with eventual large increases in scale, nor does it solve the eventual problem of what you'll need to do when similar contentious code makes its appearance.  That's some nasty technical debt that has been built up and temporarily solved with hardware, and someone, someday, is going to need to bite the bullet and fix the code.

    Thanks, Jeff. You do have a valid point. In this particular DW database, no one is allowed to create tables, so all stored procedures have to use temp tables to do their data crunching. On top of that, we have to rely on vendor-supplied stored procedures to meet business needs. There is very little room for the in-house dev team to improve the processes. I will certainly keep your advice in mind.
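
    Since the vendor code can't be touched, one of the few things still under our control is the tempdb file layout itself. A minimal sketch for reviewing the current file count, sizes, and growth settings:

    -- tempdb file count, sizes, and growth settings. Unevenly sized data files or
    -- tiny percentage growth increments are common culprits for a temp-table-heavy workload.
    SELECT  name,
            type_desc,                                -- ROWS or LOG
            physical_name,
            size * 8 / 1024 AS size_mb,               -- size is in 8 KB pages
            CASE WHEN is_percent_growth = 1
                 THEN CAST(growth AS varchar(10)) + '%'
                 ELSE CAST(growth * 8 / 1024 AS varchar(10)) + ' MB'
            END             AS growth_setting
    FROM    sys.master_files
    WHERE   database_id = 2                           -- tempdb
    ORDER BY type_desc, file_id;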

  • RandomStream - Sunday, February 4, 2018 11:36 AM

    Thanks, Jeff. You do have a valid point. In this particular DW database, no one is allowed to create tables, so all stored procedures have to use temp tables to do their data crunching. On top of that, we have to rely on vendor-supplied stored procedures to meet business needs. There is very little room for the in-house dev team to improve the processes. I will certainly keep your advice in mind.

    Thanks for the feedback.  Yeah... I agree.  3rd party stuff can be a real pain.  Going through that now.  And, your patch for the problem their code was causing wasn't inappropriate at all... it got things working.  It's still worth throwing a hammer at the vendor to get them to fix their junk.  Either that or write new code to replace the vendor and their code if they refuse to help out.

    And, yeah, I know I'm definitely preaching to the choir on both of those suggestions.  I just had to say it out loud for the benefit of those that are stuck in similar situations.

    --Jeff Moden



  • Jeff Moden - Sunday, February 4, 2018 12:22 PM

    Thanks for the feedback.  Yeah... I agree.  3rd party stuff can be a real pain.  Going through that now.  And, your patch for the problem their code was causing wasn't inappropriate at all... it got things working.  It's still worth throwing a hammer at the vendor to get them to fix their junk.  Either that or write new code to replace the vendor and their code if they refuse to help out.

    And, yeah, I know I'm definitely preaching to the choir on both of those suggestions.  I just had to say it out loud for the benefit of those that are stuck in similar situations.

    Unless you are the 800-pound-gorilla client for a given ISV, there is little hope of getting them to listen to even the most irrefutable advice and recommendations. I have had some MAJOR wins, though, when consulting for said 800-lb clients. It is amazing to have a client on the phone who, after you explain that there is a magic bullet to fix the ISV's code, says "hold on a sec"... and less than two minutes later you have the CEO and the VP of Development of the ISV ON THE PHONE being told to 'listen to what this guy says and then do it'!!! And then to see a patch released to them 7 days later that reduces waits on the server by 80%. 😎

    If you are a regular-schmo client you still have options, although most break "supportability": server configuration and proper sizing (which usually doesn't break support), adding or removing indexes, plan guides, or actually rewriting code.

    I have done ALL of those things at MANY clients over the years, to AMAZING effect, and to my knowledge never broke an app or caused bad data.
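
    On the plan-guide option, here is a minimal sketch with a completely made-up statement and hint; the @stmt text has to match what the application actually sends, character for character, and the usual supportability caveats apply:

    -- Attach a hint to a vendor query without changing the vendor's code.
    -- Statement, parameter list, and hint below are purely illustrative.
    EXEC sys.sp_create_plan_guide
         @name            = N'pg_vendor_report_maxdop',
         @stmt            = N'SELECT * FROM dbo.VendorReport WHERE RegionID = @p1',
         @type            = N'SQL',
         @module_or_batch = NULL,
         @params          = N'@p1 int',
         @hints           = N'OPTION (MAXDOP 1, RECOMPILE)';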

    Best,
    Kevin G. Boles

  • TheSQLGuru - Monday, February 5, 2018 8:09 AM

    Unless you are the 800-pound-gorilla client for a given ISV, there is little hope of getting them to listen to even the most irrefutable advice and recommendations. I have had some MAJOR wins, though, when consulting for said 800-lb clients. It is amazing to have a client on the phone who, after you explain that there is a magic bullet to fix the ISV's code, says "hold on a sec"... and less than two minutes later you have the CEO and the VP of Development of the ISV ON THE PHONE being told to 'listen to what this guy says and then do it'!!! And then to see a patch released to them 7 days later that reduces waits on the server by 80%. 😎

    If you are a regular-schmo client you still have options, although most break "supportability": server configuration and proper sizing (which usually doesn't break support), adding or removing indexes, plan guides, or actually rewriting code.

    I have done ALL of those things at MANY clients over the years, to AMAZING effect, and to my knowledge never broke an app or caused bad data.

    Yep.  I've also seen folks add a single index and get crushed because it cost the company a whole lot of dollars to re-establish the maintenance contract, even as the 800-pound client.  It also cost the consultants dearly in both money and reputation.  Be careful, ol' friend.  Especially nowadays.

    --Jeff Moden



  • Jeff Moden - Monday, February 5, 2018 2:36 PM

    Yep.  I've also seen folks add a single index and get crushed because it cost the company a whole lot of dollars to re-establish the maintenance contract, even as the 800-pound client.  It also cost the consultants dearly in both money and reputation.  Be careful, ol' friend.  Especially nowadays.

    Every CREATE INDEX script comes along with a DROP INDEX script. If you need support, need to apply a patch, etc., you run the DROP first, do what you need to do, then recreate the index when done.
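
    Something like this, with hypothetical table and index names (DROP INDEX IF EXISTS needs SQL Server 2016 or later):

    -- CREATE script (deployed to help the vendor workload):
    CREATE NONCLUSTERED INDEX IX_VendorOrders_CustomerID_OrderDate
        ON dbo.VendorOrders (CustomerID, OrderDate)
        INCLUDE (OrderTotal);

    -- Matching DROP script (run before calling support or applying a vendor patch,
    -- then re-run the CREATE afterwards):
    DROP INDEX IF EXISTS IX_VendorOrders_CustomerID_OrderDate
        ON dbo.VendorOrders;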

    I'll note that I usually only get called in when sh-t is SOOO BAD that the system is borderline or actually non-functional (or quickly becoming so). A handful of key indexes can make that go away.

    I do make sure to give a FULL disclaimer before putting this out to clients though, and let them make the decision as to whether or not to pursue it. And obviously if the ISV has any form of direct access to the database (which I have seen more times than I would like to think about!!) then it is a no-go for sure.

    Thanks for reiterating the risks though Jeff!

    Best,
    Kevin G. Boles
