delete top(8000) takes 1 second, delete top(9000) takes 42 seconds !!

  • Hi,

    Delete top(8000) takes 1 second, but delete top(9000) takes 42 seconds. Any ideas why?

    In Management Studio, if I do the following:

    --delete any rows older than 48hrs in batches of 8000 rows

    DECLARE @MaxBuffer DateTime

    SELECT @MaxBuffer = MAX(DateTime) FROM dbo.mytable

    DELETE TOP (8000)
    FROM dbo.mytable
    WHERE DATEDIFF(hour, [DateTime], @MaxBuffer) > 48

    it takes 1 second.

    If I do the following, making the batch 9000:

    --delete any rows older than 48hrs in batches of 9000 rows

    DECLARE @MaxBuffer DateTime

    SELECT @MaxBuffer = MAX(DateTime) FROM dbo.mytable

    DELETE TOP (9000)
    FROM dbo.mytable
    WHERE DATEDIFF(hour, [DateTime], @MaxBuffer) > 48

    it takes 42 seconds.

    In this case I notice the execution plans (estimated and actual) contain a parallelism node - it is only there if top(9000) is used, not when top(8000) is used.

    If I then do the following, adding "option (maxdop 1)" to the 9000 case:

    --delete any rows older than 48hrs in batches of 9000 rows

    DECLARE @MaxBuffer DateTime

    SELECT @MaxBuffer = MAX(DateTime) FROM dbo.mytable

    DELETE TOP (9000)
    FROM dbo.mytable
    WHERE DATEDIFF(hour, [DateTime], @MaxBuffer) > 48
    OPTION (MAXDOP 1)

    it takes 1 second.

    Any idea what causes the parallelism node to be added? Its effect seems to be to read all 13 million rows in the table in the non-clustered index scan using all 8 processors, rather than just reading 9000 rows with 1 processor in the maxdop 1 case...

    Any hints or tips appreciated.

    Regards,

    Dave

    The estimated query cost must have breached the "cost threshold for parallelism" value you have it set at. This is 5 by default, so the top(8000) plan must have an estimated cost below 5 and the top(9000) plan above 5, which is why it creates a parallel plan.

    If you can't change the max degree of parallelism setting using sp_configure, then the only option is to do as you have done and use the OPTION (MAXDOP 1) hint in your query.
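
    If you can change them, both server-wide settings live in sp_configure. A minimal sketch, assuming you have ALTER SETTINGS permission (the values shown are illustrative, not recommendations):

    EXEC sp_configure 'show advanced options', 1;
    RECONFIGURE;

    -- Raise the threshold so that cheap queries stay serial (the default is 5).
    EXEC sp_configure 'cost threshold for parallelism', 25;
    RECONFIGURE;

    -- Or cap parallelism for the whole instance (0 = use all available processors).
    EXEC sp_configure 'max degree of parallelism', 1;
    RECONFIGURE;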

    I wonder if the plan is EXACTLY the same between the parallel and non-parallel executions. I would imagine that one is doing an index seek and bookmark lookup for the delete and the other is doing a scan to get the rows to delete. A simple costing algorithm that in this case gets you a plan that "costs" less but runs slower.

    I will add that you should NEVER wrap a function around a column in a WHERE clause if you can help it. And in this case you can easily rewrite your WHERE clause to avoid that function. That can allow the optimizer to get MUCH better statistics out of the WHERE clause and makes it SARGable, which can lead to DRAMATICALLY better plans in many cases.

    Best,
    Kevin G. Boles
    SQL Server Consultant
    SQL MVP 2007-2012
    TheSQLGuru on googles mail service

  • This is not a precise answer to your question but it might be worth considering.

    I do a lot of reading and research on SQL Server, et al. I remember reading that there is a threshold at which row locks will escalate to a table lock, at about 5000 row locks. I don't know if that is correct, only that I remember reading it.
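
    For what it's worth, on SQL Server 2008 and later the escalation behaviour can also be controlled per table. A sketch (whether you want this depends entirely on the workload):

    -- Stop this table escalating row/page locks to a table lock.
    ALTER TABLE dbo.mytable SET (LOCK_ESCALATION = DISABLE);

    -- TABLE is the default; AUTO escalates per partition on partitioned tables.
    ALTER TABLE dbo.mytable SET (LOCK_ESCALATION = TABLE);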

    When I perform a massive DELETE operation, I always perform it in a loop that deletes the TOP 4000 rows at a time. This technique has proved to be low impact in a high volume, production, OLTP environment. I usually accompany each iteration of the loop with a small delay (WAITFOR).

    Jeff Moden once posted a different technique. He wrote his SQL code to obtain a table lock, then deleted a large number of rows at a time. I asked him about it and he said he thought it was more efficient than obtaining several thousand row locks.

    You might want to experiment with both techniques.
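
    A minimal sketch of the looping technique, using the table and predicate from the question (batch size and delay are illustrative; tune them to your system):

    DECLARE @MaxBuffer DATETIME;
    SELECT @MaxBuffer = MAX([DateTime]) FROM dbo.mytable;

    -- Delete in small batches until nothing qualifies, pausing between batches.
    WHILE 1 = 1
    BEGIN
        DELETE TOP (4000)
        FROM dbo.mytable
        WHERE DATEDIFF(hour, [DateTime], @MaxBuffer) > 48;

        IF @@ROWCOUNT = 0 BREAK;

        WAITFOR DELAY '00:00:01'; -- small pause to reduce impact on other sessions
    END

    -- Jeff's variant instead takes a table lock up front and deletes bigger
    -- batches, e.g. DELETE TOP (100000) FROM dbo.mytable WITH (TABLOCKX) ...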

  • Many thanks guys for all the replies and suggestions.

    I have forced the serial plan for this one query using maxdop 1.

    Out of interest I also temporarily tried upping the cost threshold from 5 to 6, and that also gave a non-parallel plan.

    Now that I have solved it by forcing it to serial, for this particular case I guess I was also interested in why a parallel operation should take longer than a serial one.

    To Gail, thanks for the loop of 4000. Yes, I use that one all the time. This question came out of my experiments upping the top(4000) to top(100,000) and seeing the various results of that...

    To Kevin Boles, thanks for the WHERE clause re-write tip. Can you give me a simple re-write example so I can make sure I understand you properly? Do you mean something like this:

    WHERE [DateTime] < @MaxBuffer_plus48
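
    with the variable precomputed up front, something like (a sketch; the variable name is my own invention):

    DECLARE @MaxBuffer_plus48 DATETIME;
    SET @MaxBuffer_plus48 = DATEADD(hour, 48, @MaxBuffer);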

    (The code is running in production - which I inherited. But I will be able to change it if it makes it run better.)

    Many thanks again.

    Dave.

  • WHERE [DateTime] < DATEADD(hour, 48, @MaxBuffer), or something similar to that. Could be >, didn't dig too deeply into it.
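
    Spelled out against your batch, a sketch (verify the comparison direction and the sign of the offset against the intent; "older than 48 hours before @MaxBuffer" would be DATEADD(hour, -48, @MaxBuffer)):

    DECLARE @MaxBuffer DATETIME;
    SELECT @MaxBuffer = MAX([DateTime]) FROM dbo.mytable;

    DELETE TOP (9000)
    FROM dbo.mytable
    -- The bare column on the left lets the optimizer seek an index on [DateTime].
    WHERE [DateTime] < DATEADD(hour, 48, @MaxBuffer);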

    Best,
    Kevin G. Boles
    SQL Server Consultant
    SQL MVP 2007-2012
    TheSQLGuru on googles mail service

  • david.moule (12/12/2012)


    The code is running in production - which I inherited. But I will be able to change it if it makes it run better.

    Three things you need to keep in mind whenever you have this kind of situation:

    1) DON'T use any function on columns involved in a filter such as WHERE, GROUP BY or ORDER BY (e.g. WHERE SUBSTRING(CurrentDate, 1, LEN(CurrentDate)) = 23).

    2) Use a batched approach when working on high-volume DML operations, like:

    CREATE TABLE tab1
    ( col1 INT
    , col2 CHAR(1)
    )

    INSERT INTO tab1 VALUES (1, 'A')
    INSERT INTO tab1 VALUES (1, 'B')
    INSERT INTO tab1 VALUES (1, 'C')
    INSERT INTO tab1 VALUES (1, 'D')
    INSERT INTO tab1 VALUES (1, 'E')
    INSERT INTO tab1 VALUES (8, 'F')
    INSERT INTO tab1 VALUES (9, 'G')

    ----------------------------

    BEGIN
        DECLARE @N INT             -- Number of rows to delete per batch
        DECLARE @cnt DECIMAL(18,2) -- Total count of rows matching the specified criterion
        DECLARE @loops INT         -- Number of times the DELETE statement should loop to delete all relevant records

        SET @N = 2
        SELECT @cnt = COUNT(*) FROM tab1 WHERE col1 = 1
        SET @loops = CEILING(@cnt / @N)

        WHILE @loops > 0
        BEGIN
            DELETE TOP (@N) FROM tab1 WHERE col1 = 1
            SET @loops = @loops - 1
        END
    END

    -----------------

    SELECT * FROM tab1

    DROP TABLE tab1

    3) Lastly, always try to do these kinds of tasks during off-peak hours.

    -------Bhuvnesh----------
    I work only to learn Sql Server...though my company pays me for getting their stuff done;-)

  • Great, thanks, job done.

    SARGing the WHERE clause to:

    WHERE [DateTime] < DATEADD(hour, 48, @MaxBuffer)

    did the trick and gave an index seek rather than a scan, right up to "top(90,000)" (and no parallelism node appearing).

    Only when I try "top(900,000)" does it go back to an index scan. But it still does not give a parallelism node 🙂

    (I wonder how many more cases are lurking in my inherited system.....)

    Regards,

    Dave

  • david.moule (12/13/2012)


    Only when I try "top(900,000)" does it go back to index scan. But still does not give a parallelism node 🙂

    Parallelism is not always bad; sometimes you choose it intentionally (via MAXDOP) so that multiple CPUs/threads execute the query.

    see the link http://www.sqlservercentral.com/blogs/microsoft-business-intelligence-and-data-warehousing/2012/04/15/massively-parallel-processing-and-the-parallel-data-warehouse/

    and

    http://www.sqlservercentral.com/blogs/the-smiling-dba/2012/06/18/maxdop-and-cost-threshold-for-parallelism-an-example-for-a-parallel-query/

    -------Bhuvnesh----------
    I work only to learn Sql Server...though my company pays me for getting their stuff done;-)
