increasingly slow performance on insert

  • Hi,

    We have a logging table which is defined by

    CREATE TABLE LOG (
        LogID uniqueidentifier NOT NULL,
        MessageID uniqueidentifier NOT NULL,
        DatumTijd datetime NOT NULL,
        LogType char(3) NOT NULL,
        Component varchar(256) NOT NULL,
        Class varchar(256) NOT NULL,
        Method varchar(256) NOT NULL,
        MachineName varchar(50) NOT NULL,
        ShortMessage varchar(1000) NOT NULL,
        MessageDetails text NULL,
        InvoerDatumTijd datetime NULL,
        CONSTRAINT PK_LOG PRIMARY KEY NONCLUSTERED
        (
            LogID
        ) ON [PRIMARY],
        CONSTRAINT FK_LOG_BERICHTEN FOREIGN KEY
        (
            MessageID
        ) REFERENCES BERICHTEN (
            MessageID
        )
    ) ON [PRIMARY] TEXTIMAGE_ON [PRIMARY]
    GO

    ALTER TABLE dbo.LOG NOCHECK CONSTRAINT FK_LOG_BERICHTEN
    GO

    CREATE CLUSTERED INDEX IX_Invoerdatumtijd ON dbo.LOG(InvoerDatumTijd) ON [PRIMARY]
    GO

    I deleted the collation attributes for readability; they're all SQL_Latin1_General_CP1_CI_AS.

    We notice that the duration of a straightforward insert - one record at a time - increases linearly with the number of records. When the table contains over 4,000,000 records, an insert takes more than 30 seconds, at which point the .NET SqlCommand object times out (the default timeout; changing it in code is not an option). The transaction isolation level is set to "read committed".

    Profiler tells us that the insert is a "Clustered index insert".

    This is not my design, but I've noticed:

    • the clustered index is on a nullable column (which, by the way, does not contain any null values)
    • the table contains a text column
    • there was a foreign key, which is disabled (I suspect for performance reasons)

    Now my question: is it inevitable that inserts on a large table with a clustered index gradually take more time as the table grows, or is there a flaw in the design which, once fixed, will prevent the increase in time or at least make it non-linear?

    Any help will be greatly appreciated.

    Cheers,

    Henk

    First I'd just try a non-clustered index, since it's a log table. But my guess is that the same issue will occur. More likely I'd change the uniqueidentifier datatype to something like "bigint identity" or "binary identity", which are always ascending, thus keeping the 'hot spot' of your insert-only table at the end. That 'uniqueidentifier' is just that: unique, not guaranteed to be ascending. Maybe your clustered index is splitting? Maybe you are scanning the index on every insert looking for an open slot?
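
    A minimal sketch of that idea (LOG2 and LogSeq are hypothetical names; the remaining LOG columns are elided for brevity): key the table on an ever-ascending identity value so new rows always land at the end of the clustered index.

    -- Hypothetical variant of the LOG table, clustered on an always-ascending key.
    -- LogID stays as an ordinary column if the application still needs it.
    CREATE TABLE LOG2 (
        LogSeq bigint IDENTITY(1,1) NOT NULL,  -- always ascending: inserts append at the end
        LogID uniqueidentifier NOT NULL,       -- kept for the application, no longer the key
        -- ... remaining LOG columns ...
        CONSTRAINT PK_LOG2 PRIMARY KEY CLUSTERED (LogSeq)
    )
    GO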

    Assuming your application is using ADO to insert data, make sure inserts are done with disconnected recordsets. I've had cases where a developer would just open a table and insert a record in one connection. This doesn't work in real life with large tables like yours.

  • Would "InvoerDatumTijd" have the same value for every row in the batch ? I have seen high-ratio keys cause problems in other databases, but I thought SQLServer handled it, maybe not. Make it nonclustered, and insert another column to make it more unique, eg

    (InvoerDatumTijd, LogID)

    If InvoerDatumTijd has few distinct values, it might not be worth having an index at all. Each distinct value must select <5% of the table to be useful.
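
    A sketch of that change, assuming the index name from the DDL above (untested against this schema):

    -- Replace the clustered index on the low-selectivity datetime column with a
    -- nonclustered composite index; appending LogID makes every key value unique.
    DROP INDEX [LOG].IX_Invoerdatumtijd
    GO
    CREATE NONCLUSTERED INDEX IX_Invoerdatumtijd
        ON dbo.LOG (InvoerDatumTijd, LogID)
    GO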

  • Henk,

    The problem with CLUSTERED indexes is that they live in the data and should just not be used for transactional tables.  Inserting into the "middle" of a 4 million record table will cause all of the data to "shift down" so the new data can be inserted, with the clustered index, in the proper spot in the data.

    I think you'd be much better off using a non-clustered index.  If the table is highly transactional, you may want to set the FILLFACTOR for the index to something low, like 40 or 50%.  Then you'll need to schedule some maintenance time for DBCC DBREINDEX.

    And do use indexes on tables of that size... don't go nuts, though, because too many indexes can bog your code down, as well.  Let the Index Tuning Wizard do some work for some of your queries... I've found that it's one of the few things that Microsoft actually did right (dunno, they may have bought it from somebody)
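
    A sketch of the maintenance side of that advice (SQL Server 2000 syntax; the 50% fill factor is illustrative):

    -- Rebuild every index on the table, leaving half of each page free for new rows
    -- (an empty string for the index name means "all indexes on this table")
    DBCC DBREINDEX ('dbo.LOG', '', 50)
    GO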

    --Jeff Moden



  • Hi all,

    First: thank you all for your help; you've given me some useful information. Reading the previous posts, I also realized that I left out some important information, which I will provide now:

    • The insert is done through a stored procedure (see code at the end of this post), which in turn is called using a .NET SqlCommand object (not sure which Execute method is used; I assume it is ExecuteNonQuery).
    • InvoerDatumTijd means "date and time of insert" (some of your responses made me realize that most visitors are not native Dutch speakers ;-)), and is therefore continuously increasing (although I don't think the optimizer is aware of this, as it is with an identity column). Non-unique values will be very exceptional, if any (see the insert frequency in my original post).
    • As a quick fix we moved all the records to a shadow table and started with an empty LOG table. Monitoring the performance in the days since my original post, I've found that the duration of the inserts is NOT increasing linearly with the number of records in the table. At this moment there are 1,500,000 records in the table and the duration varies between 30 and 200 ms.

    I'm starting to believe that there was a (dead?)locking issue, which we solved in the process of creating the shadow table and renaming the original and shadow tables and indexes.

    Is this possible? I cannot see what could be causing deadlocks in the stored procedure code. I'm also pretty sure that this is the only code that's executed against the table. (A bit weird: using the "Display dependencies..." function in Enterprise Manager on either the stored procedure or the table does not show the dependency between the table and the stored procedure; is this related to using a reserved word as a table name?)
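
    A quick way to cross-check that dependency outside Enterprise Manager (sp_depends ships with SQL Server 2000, but it reads the same sysdepends data the GUI uses, so it may show the same gap):

    -- List the objects the stored procedure depends on
    EXEC sp_depends 'SAVE_LOG'
    GO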

    I'm also wondering if heavy fragmentation in the table/PK index/text storage was causing the problem. I haven't tried to analyze this as I don't know how.

    Any thoughts or suggestions remain welcome.

    Cheers,

    Henk

    Stored procedure code:

    CREATE PROCEDURE SAVE_LOG
    (
        @LogID_pr uniqueidentifier,
        @MessageID_pr uniqueidentifier,
        @DatumTijd_pr datetime,
        @LogType_pr char(3),
        @Component_pr varchar(256),
        @Class_pr varchar(256),
        @Method_pr varchar(256),
        @MachineName_pr varchar(50),
        @ShortMessage_pr varchar(1000),
        @MessageDetails_pr text
    )
    AS
    INSERT INTO [LOG]
        (LogID, MessageID, DatumTijd, LogType, Component, Class, Method,
         MachineName, ShortMessage, MessageDetails, InvoerDatumTijd)
    VALUES
        (@LogID_pr, @MessageID_pr, @DatumTijd_pr, @LogType_pr, @Component_pr,
         @Class_pr, @Method_pr, @MachineName_pr, @ShortMessage_pr,
         @MessageDetails_pr, GETDATE())
    GO
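
    For reference, a smoke-test call of the procedure as the application presumably issues it (made-up values; the FK to BERICHTEN is disabled, so an arbitrary MessageID is accepted):

    -- Hypothetical test call; NEWID()/GETDATE() stand in for application values
    DECLARE @log uniqueidentifier, @msg uniqueidentifier, @now datetime
    SELECT @log = NEWID(), @msg = NEWID(), @now = GETDATE()
    EXEC SAVE_LOG @log, @msg, @now, 'INF', 'TestComponent', 'TestClass',
        'TestMethod', 'TESTSERVER', 'Test message', 'Test details'
    GO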

    DBCC SHOWCONTIG will provide fragmentation information. Look at the other DBCC commands as well: one does an online index defrag (DBCC INDEXDEFRAG) and another does a more thorough offline index rebuild (DBCC DBREINDEX, already mentioned by Jeff Moden). Frankly, I don't think an INSERT into an append-only log table should even register time (even 30 ms seems very high to me).
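
    A sketch of those checks (SQL Server 2000 syntax; names taken from the DDL earlier in the thread):

    -- Fragmentation report for the table and its clustered index
    DBCC SHOWCONTIG ('dbo.LOG')
    GO
    -- Online defragmentation of one index (0 = the current database)
    DBCC INDEXDEFRAG (0, 'dbo.LOG', 'IX_Invoerdatumtijd')
    GO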

    Anyhow, I believe the real problem is that you are probably performing a complete index scan with every insert. I'm guessing that, despite the column being ordered, SQL Server doesn't actually know that the value you are inserting is larger than every other value. Therefore, I think it will run a top-down index scan until it reaches the very bottom and says, "oh, I'm appending to the end". The more rows in the table, the longer it takes to scan the index, hence the linear degradation. However, the more rows are put into the table, the larger the index gets; when you made the table smaller, more of it could be cached. The times likely vary based on how much, if any, of that clustered index was sitting cached in RAM at the time of the insert.

    As for a solution, turn InvoerDatumTijd into a non-indexed data column (since I expect it's actually useful to keep around) and add a UID with or without a clustered index. SQL Server should know (I'm assuming here) that a new UID will sort last and therefore jump to the end of the index immediately. If not, make the UID index DESC. I'm not sure if reverse-ordering the UID will cause fragmentation; I'm guessing not, since it would be faster to push the records down in the current page until it is full and then start a new page than to create a page split.

    Show the execution plan of an insert within Query Analyzer. Although I don't recall looking at the plan of an INSERT before, I'm assuming it will show the access path, including index usage. If memory serves (I don't have SQL access at the moment), one of the index stats reported is rows examined, either explicitly or as a percentage. If my beliefs are true, it should report that you are scanning the entire index on every insert right now, and that converting to a UID will avoid that.
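
    One way to capture that from Query Analyzer (standard SQL Server 2000 SET option; the literal values below are placeholders):

    -- Show the estimated plan for the insert without actually executing it
    SET SHOWPLAN_TEXT ON
    GO
    INSERT INTO [LOG] (LogID, MessageID, DatumTijd, LogType, Component, Class,
        Method, MachineName, ShortMessage, MessageDetails, InvoerDatumTijd)
    VALUES (NEWID(), NEWID(), GETDATE(), 'INF', 'comp', 'class', 'method',
        'server', 'message', 'details', GETDATE())
    GO
    SET SHOWPLAN_TEXT OFF
    GO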

    Good luck.

    Lots of good ideas from lots of good folks... still, I think the clustered index is eating your insert time alive.  Try removing all clustered indexes and adding non-clustered indexes in their place (trust me).  If you are going to add lots of records per week, try using a low fill factor of 60 to 80% and don't forget to reindex the table once a week.

    --Jeff Moden



  • Jeff is absolutely correct.

    Test of 20,000 inserts on a table using uniqueidentifier as the key:

    Clustered index = 6 min 20 sec
    Heap + unique index on uniqueidentifier (fill factor 80%) = 3 min 38 sec
    Heap + unique index on uniqueidentifier (fill factor 50%) = 0 min 41 sec
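
    For reference, a sketch of the setup behind the fastest variant (table and index names are made up; the remaining columns are elided):

    -- Heap: no clustered index; uniqueness enforced by a nonclustered unique
    -- index built half empty to absorb the randomly ordered uniqueidentifiers
    CREATE TABLE LOG_TEST (
        LogID uniqueidentifier NOT NULL
        -- ... remaining columns ...
    )
    GO
    CREATE UNIQUE NONCLUSTERED INDEX UX_LOG_TEST_LogID
        ON LOG_TEST (LogID) WITH FILLFACTOR = 50
    GO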

  • Hi all,

    Thanks again for all your help and suggestions. We will surely try to replace the clustered index with a non-clustered one. As we know that the datetime value of the "InvoerDatumtijd" column in an inserted record will never be smaller than any existing value, we will keep the fill factor on that index high, but we will set the fill factor on the non-clustered primary key index (LogID uniqueidentifier) to something like 50%. We must wait for a suitable moment to make these changes, as this is a heavily used production environment.
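
    A sketch of that plan, using the names from the original DDL (the 90% fill factor is illustrative; only the 50% figure comes from the plan above):

    -- Replace the clustered index with a nonclustered one, built nearly full:
    -- new InvoerDatumTijd values only ever append, so little free space is needed
    DROP INDEX [LOG].IX_Invoerdatumtijd
    GO
    CREATE NONCLUSTERED INDEX IX_Invoerdatumtijd
        ON dbo.LOG (InvoerDatumTijd) WITH FILLFACTOR = 90
    GO
    -- Rebuild the nonclustered primary key half empty, since uniqueidentifier
    -- values arrive in random order
    DBCC DBREINDEX ('dbo.LOG', PK_LOG, 50)
    GO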

    At this moment the table contains approximately 1,700,000 records and insert times have increased to up to several seconds.

    I'll keep you posted.

    Cheers,

    Henk
