TEMPDB LOG explosive growth

twice in the last week the tempdb LOG has filled its disk. i've never had an issue with the LOG file for tempdb; it's always been sized quite small on all of our servers. we recently had it grow from 10GB to 30GB (filled the disk). we extended the disk to 60GB, and today it filled it again. we have identified the offending application (it's on a shared server), killed the job, and the job will not be run again until the developers can debug what is going on. at the moment, the tempdb log file (82GB) is larger than the tempdb data file (72GB).

i'm just wondering - is there any bug or special circumstance that would cause the tempdb LOG to be used so heavily? it's always been small and lightly used, even on our largest and most heavily used servers.

    thanks!

  • Quick question, any MSDTC transactions?

    😎

  • nope, msdtc not running on this server.

  • JarJar (7/18/2016)



In most such cases, you'll find that you have one or more queries with "accidental cross joins" in the form of many-to-many joins, normally caused by a misunderstanding of the data leading to insufficient join criteria. You'll need to find those and fix them. Once found, and if the design of the database doesn't support something more logical, you'll need to use "Divide'n'Conquer" methods to break the query up, possibly storing one or more interim results in a Temp Table which, ironically, will reduce the overall tempdb usage.
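To illustrate the "Divide'n'Conquer" idea, here's a minimal sketch. All table and column names (dbo.Orders, dbo.OrderDetail, etc.) are made up for illustration; the point is pre-aggregating one side of a many-to-many join into a temp table before the final join:

```sql
-- Step 1: collapse the "many" side to one row per join key and
-- stage it in a temp table (small, indexed-by-need interim result).
SELECT OrderID,
       SUM(Quantity * UnitPrice) AS OrderTotal
INTO   #OrderTotals
FROM   dbo.OrderDetail
GROUP  BY OrderID;

-- Step 2: join the small interim result instead of the wide detail
-- table, so the join is one-to-one and can't explode row counts.
SELECT o.OrderID, o.CustomerID, t.OrderTotal
FROM   dbo.Orders AS o
JOIN   #OrderTotals AS t
  ON   t.OrderID = o.OrderID;

DROP TABLE #OrderTotals;
```

Because each intermediate result is materialized and small, the optimizer never has to spool the full many-to-many intermediate row set through tempdb.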

    Sometimes such queries are born of the idea of a "load everything now" mentality (I think they're called "Aggressive Loads"). They usually come in the form of large queries with lots of joins. I've seen up to 40 joins and I know others that have seen 80.

    Some of these types of queries are pretty easy to find... just look for the ones that have DISTINCT in them. 😉
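One hedged way to hunt for those DISTINCT queries is to scan the plan cache. This requires VIEW SERVER STATE permission, only finds statements still cached, and will also match this query itself:

```sql
-- Sketch: find cached statements containing DISTINCT, worst CPU first.
SELECT TOP (20)
       qs.total_worker_time / qs.execution_count AS avg_cpu,
       qs.execution_count,
       st.text
FROM   sys.dm_exec_query_stats AS qs
CROSS  APPLY sys.dm_exec_sql_text(qs.sql_handle) AS st
WHERE  st.text LIKE '%DISTINCT%'
ORDER  BY avg_cpu DESC;
```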

The most effective thing that you could do is help the developers achieve their goal without crushing the server. Help them rewrite the code and do it without arrogance. They'll love you for it. As David Poole once said about being an exceptional DBA, "If you're the first person people seek out for database problems rather than the last, you might be an exceptional DBA". I sit smack dab in the middle of the Developers to make that an easy thing for them to do, and it has worked wonders for them and the code. In the long run, it has also worked wonders for me because there are far fewer (like none) system slowdowns that affect the end users, and the SLA screen return times haven't just been met but have been seriously beaten.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.
    "Change is inevitable... change for the better is not".

    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)
    Intro to Tally Tables and Functions

  • Jeff Moden (7/18/2016)



    that's great information. thanks, jeff.

exactly that. there is apparently a cartesian join buried in the bowels of some SAP ETL loading process, and even the developers aren't sure where it is. working on that issue now.

  • Sounds like they're on the right track. Thank you for the feedback.

    --Jeff Moden



  • Hash matched joins between very large tables will also do it. I once had to troubleshoot an ETL process that would run for days until eventually filling a 1 TB tempdb drive.
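When that's happening, one way to see which sessions are eating tempdb in real time is the task space-usage DMV (a hedged sketch — run it in the tempdb context; counts are in 8 KB pages):

```sql
-- Sketch: current tempdb page allocations per session.
-- internal_obj_pages is where hash/sort spills show up.
USE tempdb;

SELECT   su.session_id,
         SUM(su.user_objects_alloc_page_count
           - su.user_objects_dealloc_page_count)     AS user_obj_pages,
         SUM(su.internal_objects_alloc_page_count
           - su.internal_objects_dealloc_page_count) AS internal_obj_pages
FROM     sys.dm_db_task_space_usage AS su
GROUP BY su.session_id
ORDER BY internal_obj_pages DESC;
```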

    "Do not seek to follow in the footsteps of the wise. Instead, seek what they sought." - Matsuo Basho

  • thanks again guys. the developers have found a workaround for the cross join. we're going to give it another try.

but there is something strange going on right now - tempdb data (72GB) is 99.96% free, but the log (100GB) is 30% used. i was waiting for the system to wind this down, but it seems like it's something else. sys queries return nothing using tempdb. the log size is slowly increasing, just a tiny bit every 10 minutes or so... 29.6, 29.8, 30.0.

    what could be going on here?

  • What do you get if you run this query?

SELECT log_reuse_wait_desc
FROM sys.databases
WHERE name = 'tempdb';

    John

i looked at that... it returns NOTHING.

The log in tempdb behaves like the log of a database in the simple recovery model. A checkpoint fires when the log is 70% full and then clears it. You can also execute a manual CHECKPOINT statement in tempdb and it will clear the log.

http://www.sqlskills.com/blogs/paul/what-does-checkpoint-do-for-tempdb/
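Putting that together, a minimal sketch of forcing the checkpoint and then confirming the log actually cleared:

```sql
-- Manually fire a checkpoint in tempdb to truncate the inactive
-- portion of its log (tempdb uses simple-recovery-style log reuse).
USE tempdb;
CHECKPOINT;

-- Then verify: log space used should drop for the tempdb row.
DBCC SQLPERF (LOGSPACE);
```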
