Query spills to tempdb with correct cardinality estimates

  • Hi,

    I was putting together some demo scripts to show bad query estimation however it appears that I came up with a situtaion where a spill happens despite correct estimates amd I am having a little difficulty in identifing why, could somebody please help me out please - the full script is below.

    basically the steps are

    create a database.

    create a table in that database.

    insert 1500 rows that vary only by the autonumber of the row.

    use a query to select all coumns - using a where clause that gets every one.

    estimated row count is 1500 - actual row count is 1500 yet the query spills to temp db.

    I've tried this on several servers with the same results - you can see the spill clearly in 2012 SSMS but I've include the output of sys.dm_io_virtual_file_stats both before and after the main query.

    Run step 1 in it's entirity and then step 2 - I'd appreciate any pointers.

    Cheers

    Martin.

    USE [tempdb]; /*Use a different database from the one we are about to drop*/

    GO

    --STEP 1 - sets up the database / table / data for our test

    --Drop the database if it exists

    IF EXISTS (SELECT * FROM [sys].[databases] [dbs] WHERE [dbs].[name] = 'StaleStatistics')

    BEGIN

    PRINT 'Dropping database [StaleStatistics] on server [' + @@SERVERNAME + ']';

    ALTER DATABASE [StaleStatistics] SET SINGLE_USER WITH ROLLBACK IMMEDIATE;

    DROP DATABASE [StaleStatistics];

    END;

    GO

    -- Create a new database

    CREATE DATABASE [StaleStatistics];

    GO

    USE [StaleStatistics];

    GO

    -- Create a test table

    CREATE TABLE Table1

    (

    FirstColumn INT IDENTITY(1,1) NOT NULL,

    SecondColumn INT NOT NULL,

    ThirdColumn NVARCHAR(2000) NOT NULL

    CONSTRAINT PK_TABLE1 PRIMARY KEY (FirstColumn)

    )

    GO

    -- Create a Non-Clustered Index on column Col2

    CREATE NONCLUSTERED INDEX IX_Table1_SecondColumn ON Table1(SecondColumn);

    GO

    /*Insert some data into the table*/

    INSERT INTO Table1 (SecondColumn, ThirdColumn)

    SELECT TOP 1500 222, REPLICATE('x', 2000)

    FROM master.dbo.syscolumns sc1;

    GO

    --STEP 2 - TURN ON ACTUAL EXECUTION PLAN FOR STEP 2

    -- Check TempDb activity before we execute the sort operation.

    -- We're the only one's using the system so nobody else is affecting our results.

    SELECT num_of_writes, num_of_bytes_written FROM

    sys.dm_io_virtual_file_stats(DB_ID('tempdb'), 1)

    GO

    /*

    SQL SERVER gets the correct cardinality estimate - 1500

    yet we still get a spill to tempdb -- why???

    */

    SELECT * FROM Table1

    WHERE SecondColumn = 222

    ORDER BY ThirdColumn

    GO

    -- Check TempDb activity after we execute the sort operation.

    -- We're the only one's using the system so nobody else is affecting our results.

    SELECT num_of_writes, num_of_bytes_written FROM

    sys.dm_io_virtual_file_stats(DB_ID('tempdb'), 1)

    GO

  • Your tempdb usage has nothing to do with cardinality estimates. It is because you are ordering by a NVARCHAR(2000) column and the sort operation is spilling to tempdb. Try the query without the order by and you will see that it doesn't spill.

    Edit to add:

    Even changing the column to VARCHAR dropped the memory grant by 1/2 and my laptop was able to do the sort in memory without a spill.

    The SQL Guy @ blogspot[/url]

    @SeanPearceSQL

    About Me[/url]

  • Hi Sean,

    yep, your entirely correct, the sort is responsible for the spill to disk.

    And thanks for pointing out that changing the column to a VARCHAR eliminates the spill.

    I also found that leaving the column as an NVARCHAR and only placing 970 records in that table does not result in a spill eithet.

    Maybe I should re-phase my question

    Why can't SQL SERVER put aside enough memory for the sort and avoid the spill to disk, there seems to be an upper limit the memory that can be granted to a query.

    Cheers

    Martin.

  • martin catherall (10/30/2013)


    Why can't SQL SERVER put aside enough memory for the sort and avoid the spill to disk, there seems to be an upper limit the memory that can be granted to a query.

    Hold on a minute... using TempDB is not automatically a disk function. Queries that spill to TempDB will first use memory and only resort to actual disk usage if they get too big or the optimizer thinks they'll get to big to fit into the memory allocated for the task according to optimizer estimates.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • fair point Jeff, I used the term "spill to disk" to mean "spill to tempdb" - that was an error on my part.

    Let me re-phrase again 🙂

    Why can't SQL SERVER put aside enough memory for the sort and avoid the spill to tempdb,

    there seems to be an upper limit the memory that can be granted to a query.

    Thanks for correcting me.

    Cheers

    Martin.

  • martin catherall (10/30/2013)


    fair point Jeff, I used the term "spill to disk" to mean "spill to tempdb" - that was an error on my part.

    Let me re-phrase again 🙂

    Why can't SQL SERVER put aside enough memory for the sort and avoid the spill to tempdb,

    there seems to be an upper limit the memory that can be granted to a query.

    Thanks for correcting me.

    Cheers

    Martin.

    Someone even a little bit smarter than me on internals could do a much better job of explaining. I guess it's "by definition" that the sorts are designed to use TempDB so that SQL Server doesn't have to pre-allocate memory to that part of the query.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Jeff Moden (10/30/2013)


    martin catherall (10/30/2013)


    fair point Jeff, I used the term "spill to disk" to mean "spill to tempdb" - that was an error on my part.

    Let me re-phrase again 🙂

    Why can't SQL SERVER put aside enough memory for the sort and avoid the spill to tempdb,

    there seems to be an upper limit the memory that can be granted to a query.

    Thanks for correcting me.

    Cheers

    Martin.

    Someone even a little bit smarter than me on internals could do a much better job of explaining. I guess it's "by definition" that the sorts are designed to use TempDB so that SQL Server doesn't have to pre-allocate memory to that part of the query.

    I like the posts from Paul White and Joe Sack

    Paul White - Advanced TSQL Tuning: Why Internals Knowledge Matters

    Joe Sack - Memory Grant Execution Plan Statistics[/url]

    The SQL Guy @ blogspot[/url]

    @SeanPearceSQL

    About Me[/url]

Viewing 7 posts - 1 through 6 (of 6 total)

You must be logged in to reply to this topic. Login to reply