SQL Server TempDB ballooning to more than 75 GB

  • Hi,

    In one of our databases, a procedure serving important business logic executes a query with 11 joins, then a filter on a specific column and a sort (ORDER BY) on another column. Looking at the join conditions separately, millions of rows are fetched. While it runs, TempDB balloons to 75 GB, which is all the available disk space, and the query fails.

    What could be the possible reason:

    - Is TempDB behaving incorrectly, or can I really expect it to reach that kind of size (75 GB)?

    - If the TempDB behavior is correct, is there something I can do to mitigate the situation? Will making the query execute faster (better statistics, relevant indexes, more seeking than scanning of indexes/tables) help solve it?

    I know one option would be to relocate TempDB to a disk with much more space, but I want to explore the tuning options first, since I do not know how large TempDB can bloat.
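
    For what it is worth, here is how I have been checking the current tempdb file sizes and growth caps (a minimal check; file layout will differ per install):

    SELECT name,
           size / 128 AS current_size_mb,   -- size is stored in 8 KB pages
           CASE max_size WHEN -1 THEN NULL  -- -1 means grow until the disk is full
                ELSE max_size / 128 END AS max_size_mb
    FROM tempdb.sys.database_files;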

    Any relevant suggestion would be great.

    thanks,

    Mrinal

  • mrinal.technology (7/30/2013)


    In one of our databases, a procedure serving important business logic executes a query with 11 joins, a filter, and an ORDER BY; while it runs, TempDB balloons to 75 GB, which is all the available disk space, and the query fails. [...]

    The usual cause of this is an inadvertent Cartesian product, where you don't have one-to-one joins but many-to-many joins that blow the row counts through the roof (billions of total rows are easy to hit with relatively few rows in each table).

    LOTS of things use tempdb, and you probably have quite a few in this query. Sorting, HASH joins, aggregates, table spools, interim work tables, etc.
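
    If it helps, you can watch which sessions are allocating tempdb pages while the query runs with something along these lines (a rough sketch; "internal objects" covers the sorts, hashes and spools I mentioned):

    SELECT session_id,
           internal_objects_alloc_page_count / 128 AS internal_mb,  -- sorts, hash tables, spools
           user_objects_alloc_page_count / 128 AS user_mb           -- temp tables, table variables
    FROM sys.dm_db_task_space_usage
    WHERE internal_objects_alloc_page_count > 0
    ORDER BY internal_objects_alloc_page_count DESC;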

    I would also look for bad cardinality estimates, ensure stats are up to date, etc. SO many things could be in play here.
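
    A quick way to eyeball statistics freshness is something like this (sketch only; dbo.YourBigTable is a placeholder for each table in the query):

    SELECT s.name AS stats_name,
           STATS_DATE(s.object_id, s.stats_id) AS last_updated
    FROM sys.stats AS s
    WHERE s.object_id = OBJECT_ID(N'dbo.YourBigTable');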

    Oh, I would also STRONGLY recommend that you hire a performance tuning professional for a bit of help (and mentoring). WIN(company gets better performance)-WIN(you learn about performance tuning/refactoring)-WIN(consultant gets some money).

    Best,
    Kevin G. Boles
    SQL Server Consultant
    SQL MVP 2007-2012
    TheSQLGuru on googles mail service

    Thanks Kevin, we will look into performance tuning sometime in the future, once we have all the database components/schema in place; right now I am focusing on this one query.

    Following is the query, for reference:

    SELECT
        S.SHIPMENTID,
        LMapped.LANEID AS ServID,
        LOrig.LANEID AS OrigServID,
        SCOrig.DOWID AS DepDay,
        DATEPART(HOUR, S.DEPCUTOFFTIME) * 60 + DATEPART(MI, S.DEPCUTOFFTIME) AS DepTime,
        SCOrig.DAYSINROUTE AS DIR,
        (((SCOrig.DOWID + SCOrig.DAYSINROUTE - 1) % 7) + 1) AS ArrDay,
        DATEPART(HOUR, S.ARRCUTOFFTIME) * 60 + DATEPART(MI, S.ARRCUTOFFTIME) AS ArrTime,
        S.TOTALWEIGHT AS ComSWeight,
        CEILING(DATEDIFF(MI, S.DEPCUTOFFTIME, DATEADD(MI, (OT.TIMEOFFSET - DT.TIMEOFFSET) * 60 + SCOrig.DAYSINROUTE * 1440, S.ARRCUTOFFTIME))) AS ComSMinute
    FROM dbo.PLANSHIPMENT AS S
    JOIN dbo.SERVICECOMMITMENT AS SCOrig
        ON SCOrig.COMMITMENTID = S.COMMITMENTID AND SCOrig.SCENARIOID = S.SCENARIOID
    JOIN dbo.LANE AS LOrig
        ON LOrig.LANEID = SCOrig.LANEID AND LOrig.SCENARIOID = SCOrig.SCENARIOID
    JOIN dbo.PARTNERGATEWAY AS GW
        ON GW.ORIGTERMINALID = LOrig.ORIGTERMINALID AND
           GW.DESTTERMINALID = LOrig.DESTTERMINALID AND
           GW.SCENARIOID = LOrig.SCENARIOID
    JOIN dbo.LANE AS LMapped
        ON LMapped.ORIGTERMINALID = GW.ORIGGWTERMINALID AND
           LMapped.DESTTERMINALID = GW.DESTGWTERMINALID AND
           LMapped.SCENARIOID = GW.SCENARIOID
    JOIN dbo.SERVICECOMMITMENT AS SCMapped
        ON SCMapped.LANEID = LMapped.LANEID AND
           SCMapped.DOWID = SCOrig.DOWID AND
           SCMapped.SCENARIOID = LMapped.SCENARIOID
    JOIN dbo.TERMINAL AS MappedOrig
        ON MappedOrig.NODEID = GW.ORIGGWTERMINALID AND MappedOrig.SCENARIOID = GW.SCENARIOID
    JOIN dbo.TERMINAL AS MappedDest
        ON MappedDest.NODEID = GW.DESTGWTERMINALID AND MappedDest.SCENARIOID = GW.SCENARIOID
    JOIN dbo.NODE AS Orig
        ON Orig.NODEID = MappedOrig.NODEID AND Orig.SCENARIOID = MappedOrig.SCENARIOID
    JOIN dbo.NODE AS Dest
        ON Dest.NODEID = MappedDest.NODEID AND Dest.SCENARIOID = MappedDest.SCENARIOID
    JOIN dbo.TIMEZONE AS OT
        ON Orig.TIMEZONEID = OT.TIMEZONEID
    JOIN dbo.TIMEZONE AS DT
        ON Dest.TIMEZONEID = DT.TIMEZONEID
    WHERE S.SCENARIOID = @p_scenarioid  -- integer parameter
    ORDER BY S.SHIPMENTID

    Also, per my analysis, the tables are providing data via a Cartesian product: the tables are independent of each other, which I assume could be why such a huge amount of data is processed. Is my analysis correct? Also, looking at the estimated execution plan, a clustered index scan takes 45% of the cost, a nonclustered index seek 35%, and a nested loops join 16%, but the clustered index scan doesn't specify the exact table. Could it be on an intermediate table that was generated? That is my guess.

  • Please provide the actual execution plan with this query so we can offer better advice.

    Jason...AKA CirqueDeSQLeil
    _______________________________________________
    I have given a name to my pain...MCM SQL Server, MVP
    SQL RNNR
    Posting Performance Based Questions - Gail Shaw
    Learn Extended Events

    It doesn't look like you have Cartesian joins; you have join criteria defined. A Cartesian product happens when no join criteria are defined and every row is joined to every row.

    Instead, guessing since I can't see the execution plan: you're returning, as you stated, millions of rows, and you have an ORDER BY. For millions of rows, especially with no index to help with the ordering, it's pretty likely that the sort is what's causing your ballooning. Why are you returning millions of rows? Is this a report of some sort? I'd suggest getting a better filtering mechanism in place to ensure fewer rows are returned; that will help all by itself. If this is a data load process, you may want to look at other mechanisms.
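
    For example, if most of the work against the main table is the scenario filter plus the ordering, an index along these lines might let the engine skip the giant sort (a sketch based on the query you posted; the index name and INCLUDE list are my guesses):

    CREATE NONCLUSTERED INDEX IX_PLANSHIPMENT_Scenario_Shipment
        ON dbo.PLANSHIPMENT (SCENARIOID, SHIPMENTID)
        INCLUDE (COMMITMENTID, DEPCUTOFFTIME, ARRCUTOFFTIME, TOTALWEIGHT);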

    All guesses though without seeing the execution plan.

    Oh, and one other thing: the costs in an execution plan are just estimated values, not necessarily an accurate reflection of real execution metrics.
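
    If you want real numbers instead of estimates, run the query with runtime statistics switched on and compare (simple sketch):

    SET STATISTICS IO, TIME ON;
    -- run the query here, then read the logical reads and CPU time from the Messages tab
    SET STATISTICS IO, TIME OFF;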

    "The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood"
    - Theodore Roosevelt

    Author of:
    SQL Server Execution Plans
    SQL Server Query Performance Tuning

    Following is the estimated execution plan:

    https://www.dropbox.com/s/f69nsh46eij3ht0/LoadComExecutionPlan.sqlplan

    I think we have now reached the crux of the issue. This db was ported from Oracle, and we noticed a lot of number fields that had been converted from Oracle's NUMBER type to numeric(38) in SQL Server when they could have been int or even tinyint. After making those changes, the query executes much faster and no longer bloats tempdb.
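
    For anyone hitting the same migration issue, we located the columns with something like the query below; the ALTER is a hypothetical example for a single column (it will fail if the column is part of an index or key, so those have to be dropped and recreated first):

    SELECT TABLE_SCHEMA, TABLE_NAME, COLUMN_NAME
    FROM INFORMATION_SCHEMA.COLUMNS
    WHERE DATA_TYPE = 'numeric' AND NUMERIC_PRECISION = 38;

    ALTER TABLE dbo.PLANSHIPMENT ALTER COLUMN SCENARIOID int NOT NULL;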

  • So much for my stupid Cartesian product statement. You do have that problem. First warning right at the top of the plan indicates you don't have JOIN criteria. Apologies Kevin.

    I'd fix that right away.

    And you have another warning: one of your tables has no statistics on a column. Do you have auto create/auto update stats turned off? Unless you have a very well tested and documented reason, I'd turn them back on. If you do have such a reason, then you need to create a set of stats on the column indicated, per that other warning.
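
    You can check those settings, and create the missing stats by hand if you really must keep auto-create off (the table and column names below are placeholders for whatever the warning points at):

    SELECT name, is_auto_create_stats_on, is_auto_update_stats_on
    FROM sys.databases
    WHERE name = DB_NAME();

    CREATE STATISTICS st_manual ON dbo.SomeTable (SomeColumn);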

    Also, this query did not go through a full optimization pass; it timed out in the optimizer instead. That means the plan you have could be unstable. You might want to look for ways to simplify the query, although fixing the missing statistics and the missing join criteria could help on their own.

    You say this is returning millions of rows? The optimizer thinks, based on statistics, that it's only returning 2. Again, check whether auto update stats is turned off, or whether you need to run sp_updatestats or even UPDATE STATISTICS ... WITH FULLSCAN on some of the tables. If your statistics are off, you will get very poorly performing queries.
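
    Concretely, something along these lines (FULLSCAN on the big tables can take a while, so off-hours is safer):

    EXEC sp_updatestats;  -- whole database, sampled

    UPDATE STATISTICS dbo.PLANSHIPMENT WITH FULLSCAN;
    UPDATE STATISTICS dbo.SERVICECOMMITMENT WITH FULLSCAN;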

    That's where I'd start. Later, you might want to look to possibly address the key lookup and other potential issues.

    "The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood"
    - Theodore Roosevelt

    Author of:
    SQL Server Execution Plans
    SQL Server Query Performance Tuning

    Thanks Kevin, we will look into performance tuning sometime in the future, once we have all the database components/schema in place; right now I am focusing on this one query.

    That is honestly a REALLY, REALLY bad position to take!!! The earlier in the process you "start doing things right" the:

    1) easier it is to fix things

    2) less risky it is to fix things

    3) less time it takes to fix things (i.e. COST to pay the consultant)

    4) easier it is to FIND actual root causes of issues in the first place

    5) etc.

    SOOOO many times I have gone to new clients that were TOTALLY FUBAR'D and it took WAY more effort to find/fix problems than it should have. And honestly it often happens that broken stuff simply CANNOT be fixed because it is either too risky or the required downtime is unacceptable.

    Best,
    Kevin G. Boles
    SQL Server Consultant
    SQL MVP 2007-2012
    TheSQLGuru on googles mail service

    One more thing. If statistics staleness isn't in play here, as Grant suggests, the other possibilities are:

    a) data value skew, which could be helped by OPTION (RECOMPILE) on the statement

    b) you simply have too much joined together. Beasts like this very often benefit from one or sometimes several temporary tables (NOT table variables!!) holding interim result sets, which lets the optimizer get better overall estimates for what is REALLY going to happen and thus avoid DISASTROUSLY bad query plans.
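
    To make (b) concrete, a rough, untested sketch using the names from the query you posted:

    SELECT S.SHIPMENTID, S.COMMITMENTID, S.SCENARIOID,
           S.DEPCUTOFFTIME, S.ARRCUTOFFTIME, S.TOTALWEIGHT
    INTO #Shipments
    FROM dbo.PLANSHIPMENT AS S
    WHERE S.SCENARIOID = @p_scenarioid;

    CREATE CLUSTERED INDEX cx_Shipments ON #Shipments (SHIPMENTID);

    -- the big SELECT then joins #Shipments instead of dbo.PLANSHIPMENT, so the
    -- optimizer sees real row counts; add OPTION (RECOMPILE) to that statement
    -- if data value skew (item a) is suspected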

    Best,
    Kevin G. Boles
    SQL Server Consultant
    SQL MVP 2007-2012
    TheSQLGuru on googles mail service

  • Grant Fritchey (7/31/2013)


    So much for my stupid Cartesian product statement. You do have that problem. First warning right at the top of the plan indicates you don't have JOIN criteria.

    I am not so sure. I think the optimizer produces that cross join on its own initiative.

    Note that the scan is on an indexed view named _dta_mv_20, which looks like something the DTA suggested.
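
    You can inventory what DTA left behind with something like this (a sketch; DTA prefixes its creations with _dta_):

    SELECT o.name, o.type_desc
    FROM sys.objects AS o
    WHERE o.name LIKE N'[_]dta[_]%'
    UNION ALL
    SELECT i.name, N'INDEX'
    FROM sys.indexes AS i
    WHERE i.name LIKE N'[_]dta[_]%';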

    It would help to see the actual execution plan, as that can reveal bad estimates.

    Then again, things improved when they replaced all the numeric(38) columns with better data types, although it is not clear to me whether the result was good enough.

    Erland Sommarskog, SQL Server MVP, www.sommarskog.se

  • Erland Sommarskog (7/31/2013)


    I am not so sure. I think the optimizer produces that cross join on its own initiative. Note that the scan is on an indexed view named _dta_mv_20, which looks like something the DTA suggested. [...]

    Yeah, you could be right. There's no indication of that object within the query. There are a bunch of other DTA indexes; it looks like the DTA was run and all its suggestions were implemented. Is it possible the view being referenced just has a cross join inside it?
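
    Easy enough to check, assuming the view lives in dbo:

    SELECT OBJECT_DEFINITION(OBJECT_ID(N'dbo._dta_mv_20')) AS view_definition;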

    As with so many other things, I think we're back to statistics. And possibly bad DTA recommendations.

    "The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood"
    - Theodore Roosevelt

    Author of:
    SQL Server Execution Plans
    SQL Server Query Performance Tuning
