Does regularly deleting and rebuilding a table influence performance?

  • Ahoi,

    Situation:
    I have a table with around 40 million rows and a clustered primary key set by IDENTITY(1,1).
    There are regular deletes from and refills of the table, with the following result:
    Number of rows: 40 million
    Current MAX key: 98 million

    So currently around 58 million rows have been deleted and re-inserted.
    For index maintenance we use Ola Hallengren's scripts.

    Question/Problem:
    - Is this causing the performance issues I'm seeing when executing SQL commands against the table?
    - Is there something I can/should do?

    Edit: Is there something like a "short question thread"? I feel bad opening a new thread every time I have a question, and it kind of "prevents" me from asking questions on a more regular basis.
    Edit 2: Just realised I'm in the wrong section :/

  • When you say 58 million rows have been deleted/rebuilt, what do you mean? What rows are affected - just those at the end of the table, or rows in random places? What performance issues are you having, and what SQL commands are you running when this happens? Please post the full DDL for the table, including constraints and indexes.

    John

  • John Mitchell-245523 - Monday, June 4, 2018 3:55 AM

    When you say 58 million rows have been deleted/rebuilt, what do you mean? What rows are affected - just those at the end of the table, or rows in random places? What performance issues are you having, and what SQL commands are you running when this happens? Please post the full DDL for the table, including constraints and indexes.

    John

    Sorry.
    Delete/rebuild: certain rows are deleted from the table and new ones are added, which means the identity counter rises and more gaps appear in the identity sequence.
    What rows are affected: not those at the end, but "random" ones in between.
    Performance issues: I've tried different SELECTs and DELETEs, and they take quite a lot longer now (deleting WHERE YEAR(auftragdatum) = 2018, for example).

    Sorry, I'm not sure about the DDL part; here are the things I ran against the table:
    SELECT CONSTRAINT_NAME
    FROM INFORMATION_SCHEMA.CONSTRAINT_TABLE_USAGE

    CONSTRAINT_NAME
    PK_Umsatz_Pivot

    SELECT name, type_desc, is_unique, is_primary_key
    FROM sys.indexes


    sp_help 'Table'

  • DDL = the CREATE TABLE statement, including CREATE INDEX statements. Queries = the full query you're having a problem with.

    If you're filtering on auftragdatum, that's likely to be painful, since you don't have any indexes where that is the leading column. Therefore such queries are likely to require a full clustered index scan. Also, if you wrap the column name in a function, that's likely to make the predicate non-sargable, probably also leading to a clustered index scan even if you do have the right index in place. Try WHERE auftragdatum >= '20180101' AND auftragdatum < '20190101' instead, as in the sketch below.
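    For illustration, a sketch of the two forms (the table and index names here are illustrative - the table name is taken from later in the thread - and I'm assuming auftragdatum is a date or datetime column):

    -- Non-sargable: the function wrapped around the column hides it from the
    -- optimizer, so this typically forces a full clustered index scan.
    DELETE FROM dwh.fact_umsatz_pivot
    WHERE YEAR(auftragdatum) = 2018;

    -- Sargable: a half-open date range can seek on an index that leads
    -- with auftragdatum (hypothetical index, for illustration only).
    CREATE INDEX IX_fact_umsatz_pivot_auftragdatum
        ON dwh.fact_umsatz_pivot (auftragdatum);

    DELETE FROM dwh.fact_umsatz_pivot
    WHERE auftragdatum >= '20180101'
      AND auftragdatum <  '20190101';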

    It doesn't sound like your issues are to do with those deletes.  Values in identity columns should not be exposed to users and so having gaps in the sequence doesn't matter.

    John

  • Gaps in an IDENTITY column will have no effect on performance.

    Gaps on pages due to free space created by deletes could have an effect (though it's usually not a serious problem unless it creates a super-low natural fill factor) as the data grows, because the smallest "read" that SQL Server can do is a full page (you probably already know that, but I had to say it out loud).
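    If you want to see how full the pages actually are, something like this is a quick check (a sketch; the table name is taken from later in the thread, and the SAMPLED mode still reads a fair amount on a 40-million-row table):

    -- Page density and fragmentation per index; avg_page_space_used_in_percent
    -- shows how much of each page is actually holding data.
    SELECT  i.name,
            ips.index_type_desc,
            ips.avg_fragmentation_in_percent,
            ips.avg_page_space_used_in_percent,
            ips.page_count
    FROM    sys.dm_db_index_physical_stats(
                DB_ID(), OBJECT_ID('dwh.fact_umsatz_pivot'), NULL, NULL, 'SAMPLED') AS ips
    JOIN    sys.indexes AS i
            ON  i.object_id = ips.object_id
            AND i.index_id  = ips.index_id;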

    The slowness of your deletes is likely due to things like FKs on the table. This is why people ask you to "post the DDL", which means they'd like you to post the CREATE TABLE statement with all the constraints, indexes, and triggers that may be on the table, so that they can figure some of that stuff out for you.
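    If you want to check for FKs yourself while you're putting the DDL together, a quick sketch (again with an illustrative table name):

    -- Foreign keys that point AT the table; every deleted row has to be
    -- checked against each referencing table, which slows deletes down.
    SELECT  fk.name                          AS fk_name,
            OBJECT_NAME(fk.parent_object_id) AS referencing_table
    FROM    sys.foreign_keys AS fk
    WHERE   fk.referenced_object_id = OBJECT_ID('dwh.fact_umsatz_pivot');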

    What would also be very helpful is if you posted your delete code and the ACTUAL execution plan, so folks can help you science out the slowness you're experiencing. Please see the article at the second link under "Helpful Links" in my signature line below for more information on how to post the information needed to troubleshoot performance problems.
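    One way to capture the actual plan as XML for posting (a sketch; in SSMS you can also just enable "Include Actual Execution Plan" before running the statement):

    -- Returns the actual execution plan as XML alongside the results.
    SET STATISTICS XML ON;

    DELETE FROM dwh.fact_umsatz_pivot
    WHERE auftragdatum >= '20180101'
      AND auftragdatum <  '20190101';

    SET STATISTICS XML OFF;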

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • I have attached the full CREATE statement for the table, the execution plan, and the results of sp_help on the table in an Excel file.
    There are no FKs referring to that table; sp_help says so as well.

  • ktflash - Tuesday, June 5, 2018 12:53 AM

    I have attached the full CREATE statement for the table, the execution plan, and the results of sp_help on the table in an Excel file.
    There are no FKs referring to that table; sp_help says so as well.

    Excellent.

    Those are some pretty wide covering indexes you have on a pretty wide table. I'm not at all surprised that deleting a million and a half rows is taking so long. I was also under the mistaken impression that you were deleting the whole table and not just a part of it.

    I have an idea that will make this nearly instantaneous. The idea I have will also greatly reduce your index maintenance time and, possibly, any backup you may be doing.

    With that, would you execute the following code and post the results, please? I also need to know whether you're using the STANDARD Edition or the ENTERPRISE Edition, and whether or not you're really using SQL Server 2016.

    Here's the code I need you to execute and post (or attach) the results for.

     SELECT  YR     = YEAR(auftragsdatum)
            ,ID     = Ergebnisbereich_ID
            ,RowCnt = COUNT(*)
       FROM dwh.fact_umsatz_pivot
      GROUP BY YEAR(auftragsdatum),Ergebnisbereich_ID
    ;

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Not sure if you just wanted the results themselves or an execution plan again, so I've attached both.
    --> SQL Server 2016 Express

    For me, more important than solving the issue is understanding performance issues in general.
    As with most topics, there are dozens of books on each subject.
    As I stated in my first/previous topic, I'm pretty much on my own and responsible for everything myself. Is there something like a book where I can gain the basic knowledge to analyse performance?

  • ktflash - Wednesday, June 6, 2018 2:27 AM

    Not sure if you just wanted the results themselves or an execution plan again, so I've attached both.
    --> SQL Server 2016 Express

    For me, more important than solving the issue is understanding performance issues in general.
    As with most topics, there are dozens of books on each subject.
    As I stated in my first/previous topic, I'm pretty much on my own and responsible for everything myself. Is there something like a book where I can gain the basic knowledge to analyse performance?

    I was after just the results, but thanks for both, because I wasn't totally clear in my post.

    So, the bottom line is that on a regular basis, you delete the entire year for a given Ergebnisbereich_ID and totally reload it, correct?  How often do you get such a file to load for each Ergebnisbereich_ID?

    The biggest question that I have is, other than the Umsatz_Pivot_Key column, which column or combination of columns uniquely defines a row in the table?

    The reason for my questions is because I'm trying to figure out the best way to avoid the deletes to begin with.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Not on a regular basis; I had to recreate it for different testing reasons. There was an update on the data done by my predecessor, and after months they realised there was a mistake in it.
    For testing purposes and on user requests, I have to reload the data.
    1. Now I'm confused: if deleting and re-adding isn't the issue, why try to avoid deleting?
    2. Not sure about the column combination. The table is used as a base for a fact view with the keys for the dimensions and not much else, so there's no other table that has an umsatz_pivot_key referring to dwh.fact_umsatz_pivot.
    I could try running DISTINCT queries to see when the number of results matches the table's row count, but I doubt that's what I'm supposed to do.

  • ktflash - Thursday, June 7, 2018 5:56 AM

    Not on a regular basis; I had to recreate it for different testing reasons. There was an update on the data done by my predecessor, and after months they realised there was a mistake in it.
    For testing purposes and on user requests, I have to reload the data.
    1. Now I'm confused: if deleting and re-adding isn't the issue, why try to avoid deleting?
    2. Not sure about the column combination. The table is used as a base for a fact view with the keys for the dimensions and not much else, so there's no other table that has an umsatz_pivot_key referring to dwh.fact_umsatz_pivot.
    I could try running DISTINCT queries to see when the number of results matches the table's row count, but I doubt that's what I'm supposed to do.

    Deleting IS an issue. It will always be an issue, because it's always fully logged, and it affects every index, which is also fully logged; that's all part of the reason why it takes so long. It also leaves holes in your indexes, which waste memory when the pages are loaded and need to be defragged, etc., etc. One mitigation is sketched below.
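    When a big delete is unavoidable, one common mitigation is to do it in batches so each transaction stays small (a sketch; the 50,000 batch size is arbitrary, and the table/column names are taken from earlier in the thread):

    -- Delete in chunks so each transaction, and therefore each chunk of
    -- log, stays small; the log can be reused or backed up between batches.
    DECLARE @BatchSize INT = 50000;

    WHILE 1 = 1
    BEGIN
        DELETE TOP (@BatchSize)
        FROM dwh.fact_umsatz_pivot
        WHERE auftragdatum >= '20180101'
          AND auftragdatum <  '20190101';

        IF @@ROWCOUNT < @BatchSize BREAK;
    END;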

    Shifting gears back to your problem, I too am a bit confused, because the title of this thread is "Does REGULARLY deleting and rebuilding a table influence performance?" Now you're saying that it's not done "regularly"?

    Either way, regularly or not, although the row count isn't that much, this is a wide table that contains a lot of data, and it's only going to continue to get larger. Right now, you're backing up several years' worth of data that will never change. You're also doing index maintenance on all of that, and because of all the data that isn't going to change and the non-100% fill factors that are necessarily being used, you're also wasting a fair bit of memory and disk space. Also, tasks like the one you've just been handed will happen again.

    It won't help your immediate problem with deleting rows, but I suggest that you partition the old years by year, and the current and previous year by ID. My personal preference is to use Partitioned Views, because they have the advantage (when designed with such a thought in mind) of true "partial restores", whereas Partitioned Tables cannot truly do such a thing. The Partitioned View would allow you to simply truncate the table for a given YEAR/ID combination and then reload it (a Partitioned Table would allow something similar using SPLIT and MERGE, but it's a bit more work). Both methods would allow you (if designed correctly) to set old static data to READ_ONLY and never have to back it up again after an initial backup. It would also (again, if designed correctly) allow you to defrag indexes at the partition level, meaning that none of the static partitions would ever need to be defragged again once you packed out any free space and made them READ_ONLY. A minimal sketch follows below.
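    To make the idea concrete, here's a minimal sketch of a partitioned view over yearly tables (partitioned by year only for brevity; all names are illustrative, and the real tables would carry the full set of columns and indexes):

    -- One table per year, each with a CHECK constraint the optimizer can
    -- use to eliminate the other tables when a query filters on the date.
    CREATE TABLE dwh.fact_umsatz_pivot_2017
    (
        Umsatz_Pivot_Key INT  NOT NULL,
        auftragdatum     DATE NOT NULL,
        CONSTRAINT PK_ump_2017 PRIMARY KEY CLUSTERED (Umsatz_Pivot_Key),
        CONSTRAINT CK_ump_2017 CHECK (auftragdatum >= '20170101' AND auftragdatum < '20180101')
    );

    CREATE TABLE dwh.fact_umsatz_pivot_2018
    (
        Umsatz_Pivot_Key INT  NOT NULL,
        auftragdatum     DATE NOT NULL,
        CONSTRAINT PK_ump_2018 PRIMARY KEY CLUSTERED (Umsatz_Pivot_Key),
        CONSTRAINT CK_ump_2018 CHECK (auftragdatum >= '20180101' AND auftragdatum < '20190101')
    );
    GO

    -- The view unions the yearly tables; thanks to the CHECK constraints,
    -- queries filtered on auftragdatum only touch the matching table(s).
    CREATE VIEW dwh.fact_umsatz_pivot_v
    AS
    SELECT Umsatz_Pivot_Key, auftragdatum FROM dwh.fact_umsatz_pivot_2017
    UNION ALL
    SELECT Umsatz_Pivot_Key, auftragdatum FROM dwh.fact_umsatz_pivot_2018;
    GO

    -- Reloading a year then becomes a truncate-and-reload of one small
    -- table instead of one huge, fully logged DELETE.
    TRUNCATE TABLE dwh.fact_umsatz_pivot_2018;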

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)
