Query optimization

  • Hi all,

    I have one "heavy" query on my DB. It executes 2400 times per day and I don't know how to optimize it.

    Do you have any idea?

    Thanks.

DECLARE @time datetime,
    @rowcount int = 2400

    select @time = dateadd(-HH, 24, GETUTCDATE())

    WHILE (@rowcount=2400)
    BEGIN
    ;WITH CTE_1 AS
    (SELECT TOP 2400 *
    FROM dbo.Table1
    WHERE MyDate < @time)
    DELETE FROM CTE_1

    SELECT @rowcount = @@rowcount
WAITFOR DELAY '00:00:01';
    END
• I would run the query with SET STATISTICS IO ON and with the actual execution plan included. Chances are the execution plan will give you a missing-index recommendation on the MyDate column. I would also look at the Messages tab for how many logical reads (pages) you are getting from Table1. The key to speed is reducing logical reads.
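For example, something like this in a test window (the table and predicate come from the original post; the DATEADD here is corrected to what the intent appears to be):

SET STATISTICS IO ON;

-- Same row selection the DELETE makes; check the Messages tab for logical reads
SELECT COUNT(*)
FROM dbo.Table1
WHERE MyDate < DATEADD(HOUR, -24, GETUTCDATE());

SET STATISTICS IO OFF;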

Another tip I learned this week: get the sp_BlitzFirst script from http://www.firstresponderkit.org or brentozar.com and run:

EXEC sp_BlitzFirst @Seconds = 60, @ExpertMode = 1;

The above command captures performance data, waits 60 seconds, and captures it again. That let me run another query during the interval, and it showed that in my case I had an async network I/O wait (ASYNC_NETWORK_IO), so my 5-second query spent 4 seconds feeding the result sets from the SQL Server VM through the cocktail straw of my workstation.

  • The problem is the loop and the fact that this runs 2400 times per day.  It appears the goal is to delete any rows older than 24 hours (your DATEADD is incorrect here - which I am assuming is just a bad copy/paste).

    I would not set this up to delete based on a specific hour - I would round to the previous day instead.

Declare @rowsAffected int = 1
      , @batchSize int = 2400
      , @time datetime = dateadd(day, datediff(day, 0, getutcdate()) - 1, 0);

While @rowsAffected > 0
Begin

    Delete Top (@batchSize)
    From dbo.Table1
    Where MyDate < @time;

    Set @rowsAffected = @@rowcount;

    Waitfor Delay '00:00:01';

End;

    This will affect any rows prior to yesterday at midnight (ex: '2021-01-28 00:00:00.000') UTC.

To optimize this, you need an index on MyDate (a minimal sketch follows) - but it may not be used, depending on how much data exists in the table vs how many rows meet the predicate.  Another option is to wrap the above in an outer loop over days, deleting in batches for each day until you reach the last day to be deleted.
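A minimal sketch of that index (the name is illustrative):

CREATE NONCLUSTERED INDEX IX_Table1_MyDate
ON dbo.Table1 (MyDate);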

    Something like this...definitely test and validate prior to using:

Declare @rowsAffected int = 1
      , @batchSize int = 2400
      , @latestDate datetime = dateadd(day, datediff(day, 0, getutcdate()) - 1, 0)
      , @oldestDate datetime = coalesce((Select min(MyDate) From dbo.Table1), '2020-01-01');

Declare @deleteDate datetime = dateadd(day, datediff(day, 0, @oldestDate) + 1, 0);

While @deleteDate <= @latestDate   -- <= so the final day is processed too
Begin

    Set @rowsAffected = 1;   -- reset, or the inner loop only runs for the first day

    While @rowsAffected > 0
    Begin

        Delete Top (@batchSize)
        From dbo.Table1
        Where MyDate < @deleteDate;

        Set @rowsAffected = @@rowcount;

        Waitfor Delay '00:00:01';
    End;

    Set @deleteDate = dateadd(day, 1, @deleteDate);

End;

    Jeffrey Williams
    “We are all faced with a series of great opportunities brilliantly disguised as impossible situations.”

    ― Charles R. Swindoll

    How to post questions to get better answers faster
    Managing Transaction Logs

• Be sure to cluster the table on ( MyDate, $IDENTITY ). The $IDENTITY just ensures that the key is unique. That will make the deletes very efficient.
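A sketch of that recommendation, assuming the identity column is named Id and that any existing clustered index has already been dropped or rebuilt (both assumptions, for illustration only):

CREATE UNIQUE CLUSTERED INDEX CIX_Table1_MyDate_Id
ON dbo.Table1 (MyDate, Id);   -- the identity column makes the key unique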

    SQL DBA,SQL Server MVP(07, 08, 09) A socialist is someone who will give you the shirt off *someone else's* back.

  • Ah... be careful on the Clustered Index recommendation.  While such a recommendation can help a lot of things, especially this particular problem, changing the Clustered Index without understanding the impact on other queries could lead to devastation in many other areas.

And, no... I'm not against this change.  In fact, it's usually a great recommendation.  But you can't just assume its usefulness or its impact based on a single query/delete.  "Must look eye".

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • christi1711 wrote:

    Hi all, I have one "heavy" query on my DB. It executes 2400 times per day and I don't know how to optimize it. Do you have any idea? Thanks.

DECLARE @time datetime, @rowcount int = 2400

    select @time = dateadd(-HH, 24, GETUTCDATE())

    WHILE (@rowcount=2400)
    BEGIN
    ;WITH CTE_1 AS
    (SELECT TOP 2400 *
    FROM dbo.Table1
    WHERE MyDate < @time)
    DELETE FROM CTE_1

    SELECT @rowcount = @@rowcount
WAITFOR DELAY '00:00:01';
    END

I don't want to assume because, a lot of times, code doesn't actually match the requirement to begin with.  What is the actual requirement here and what is causing this code to execute?  I ask the latter because code that runs 2400 times each day averages out to 1 run every 36 seconds, which is a bit of an oddity to begin with.

Also, how many rows generally accumulate in this table per hour?  It's important to know that because it may actually be a bit silly to loop and delete in such small batches.

    To summarize my questions...

    1. Why are you deleting rows from this table to begin with?
    2. What is causing/controlling the code to execute 2400 times each day?
    3. How many rows per hour is this table accumulating?
    4. What is the structure of this table (Post the CREATE TABLE for this table)?
    5. What are the indexes that are currently available on this table (Post the CREATE INDEX for each index including the index for the PK)?

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

• Regarding the clustered index recommendation:

You do not necessarily have to cluster your table by MyTime. When MyTime is ascending too (e.g. a creation timestamp), you could simply select MAX(id) WHERE MyTime < @time and then DELETE WHERE id < @max_id.
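A rough sketch of that idea (assuming an ascending identity column named id; <= is used so the MAX(id) row itself is included):

DECLARE @max_id int;

-- Highest id among the rows that are old enough to delete
SELECT @max_id = MAX(id)
FROM dbo.Table1
WHERE MyTime < @time;

-- Delete by the clustered key instead of by MyTime
DELETE TOP (2400)
FROM dbo.Table1
WHERE id <= @max_id;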

Or you could first select the clustered index columns into a #temp table WHERE MyTime < @time, then run a DELETE FROM t FROM table_1 AS t INNER JOIN (SELECT TOP 2400 id FROM #temp ORDER BY id) AS tmp ON tmp.id = t.id (followed by a DELETE TOP (2400) FROM #temp); ideally #temp has a clustered index over id in this case.
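Roughly, for the #temp variant (the column name id is illustrative):

SELECT id
INTO #temp
FROM dbo.Table1
WHERE MyTime < @time;

CREATE CLUSTERED INDEX cix_temp ON #temp (id);

-- Delete one batch from the base table...
DELETE t
FROM dbo.Table1 AS t
INNER JOIN (SELECT TOP (2400) id FROM #temp ORDER BY id) AS tmp
    ON tmp.id = t.id;

-- ...then remove the same ids from #temp (a CTE keeps the two batches aligned)
;WITH cte AS (SELECT TOP (2400) id FROM #temp ORDER BY id)
DELETE FROM cte;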

But as always: the best solution depends on your data - what you want to delete, how many records are deleted, what your indexes look like, how many rows the table has in total...


Edit: of course you could also implement table partitioning (on the MyTime column, on a daily basis). In that case you would only need to run your job once per day and could remove the previous day's data by simply running a

TRUNCATE TABLE xxx WITH (PARTITIONS ($partition.pf_my_time(DATEADD(DAY, -1, GETUTCDATE()))));

which would take only a few milliseconds to execute, regardless of how much data / how many rows you are deleting.
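The setup behind that one-liner would look roughly like this (the function and scheme names match the snippet above; the boundary values are illustrative, and a daily job would have to split in new boundaries). Note that TRUNCATE ... WITH (PARTITIONS ...) requires SQL Server 2016 or later.

CREATE PARTITION FUNCTION pf_my_time (datetime)
AS RANGE RIGHT FOR VALUES ('2021-01-27', '2021-01-28', '2021-01-29');

CREATE PARTITION SCHEME ps_my_time
AS PARTITION pf_my_time ALL TO ([PRIMARY]);

-- dbo.Table1 (or at least its clustered index) is then created ON ps_my_time (MyTime)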


    God is real, unless declared integer.

  • Apparently, the OP has left the building.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

• They have a query that runs 2400+ times a day using that column to DELETE from the table -- I stick with my recommendation: cluster the table first by that column, especially since it's a naturally ascending value as well.

    SQL DBA,SQL Server MVP(07, 08, 09) A socialist is someone who will give you the shirt off *someone else's* back.

• @scott: Whether they should cluster for it or not depends on how else this table is used. Yes, 2,400 deletes per day may be a lot, but when they are running a million SELECTs and/or UPDATEs per day that use the current clustered index and return many columns, it would be stupid to cluster by MyTime just to speed up the DELETEs.

    God is real, unless declared integer.

• Well, obviously, I don't consider it "stupid".  Remember, the DELETEs run almost twice a minute.  If the table has to be scanned in order to do the deletes, that would cause massive locking of the table and all kinds of deadlocks.  Or, to prevent the deadlocks, it would force (almost) all reads to be done WITH (NOLOCK), which has potential issues as well.  The only way to be really sure that you don't get a table scan for the deletes is to cluster the table to match the deletes.

Reads, however, can be done with nonclustered indexes, covering ones if necessary.
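For instance, a hypothetical read that filters on some other column could be covered like this (the column names here are made up purely for illustration):

CREATE NONCLUSTERED INDEX IX_Table1_CustomerId
ON dbo.Table1 (CustomerId)
INCLUDE (MyDate, Amount);   -- INCLUDE makes the index covering for this read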

    SQL DBA,SQL Server MVP(07, 08, 09) A socialist is someone who will give you the shirt off *someone else's* back.

  • Like I said above, changing the clustered index to support these DELETEs would certainly help these deletes A LOT.  But, changing the clustered index without knowing "the rest of the story" may have a devastating impact on other much more important code.

    Unfortunately, it looks like the OP has abandoned this thread and so we simply don't know what "the rest of the story" actually is.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Jeff Moden wrote:

    But, changing the clustered index without knowing "the rest of the story" may have a devastating impact on other much more important code.

    If you build nonclustered indexes as necessary, how so?  What typical processing couldn't you handle that way?

    SQL DBA,SQL Server MVP(07, 08, 09) A socialist is someone who will give you the shirt off *someone else's* back.
