Performance Issues with High Volume Inserts and Selects on same table

I have an issue wherein a set of 6 tables on the same database receives a high volume of inserts throughout the day.

The load (inserts) happens via an SSIS job that runs throughout the day; the volume is about 20,000 rows into each of the tables every 10 minutes (20,000 rows × 6 runs an hour × 23 hours ≈ 2.76 million rows per table per day).

A set of clustered and non-clustered indexes is enabled on these tables. The extracts (an SSIS package selects the data, applies some business logic, and outputs flat files) begin after 11 PM at night and are supposed to complete within the hour, by 12 AM midnight.

From the business perspective, the extracts are nothing but a consolidation of all the data loaded during the day, and the output flat files must be completed and sent to the different consumers.

    Now there are 3 SSIS packages involved.

    i. Load

    ii. Extract

    iii. delete old data

All three packages are simple and have been optimized as far as possible.

Environment: SQL Server 2005, 8 CPUs, 16 GB RAM, Windows Server 2003

Both the load and the extract always have performance issues, and the extracts often fail to complete before 12 AM midnight.

    Issues with load :

    High fragmentation

Slow loading; the SQL Agent job for the load SSIS package hangs

Inserts are slow because of the indexes

    Issues with the Extract:

Extracts are fast when we run them against a clean table

Many times they fail to generate before midnight due to slow processing or slow SELECTs

Interestingly, I saw a Microsoft article which says heaps are slower to insert into: http://support.microsoft.com/kb/297861

Can somebody suggest what would be the best thing to do in this situation?

Experts, please pitch in.

Please see the table/index structure attached.
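For what it's worth, "high fragmentation" can be measured on SQL 2005 with the sys.dm_db_index_physical_stats DMV. A sketch, assuming a placeholder table name (dbo.LoadTable1 is made up, not the OP's actual object):

```sql
-- Fragmentation per index on one of the load tables.
-- 'LIMITED' scans only the b-tree upper levels, so it is cheap to run.
SELECT  i.name AS index_name,
        ps.index_type_desc,
        ps.avg_fragmentation_in_percent,
        ps.page_count
FROM    sys.dm_db_index_physical_stats(
            DB_ID(), OBJECT_ID('dbo.LoadTable1'), NULL, NULL, 'LIMITED') ps
JOIN    sys.indexes i
    ON  i.object_id = ps.object_id
    AND i.index_id  = ps.index_id
ORDER BY ps.avg_fragmentation_in_percent DESC;
```

Running this right after a load window versus right after an index rebuild would show how fast the fragmentation is accumulating.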

  • Can you drop the Indexes prior to the load and then recreate them?

    That would speed up the process and help with the fragmentation.
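As a sketch of what that might look like (table and index names here are placeholders, not the OP's actual objects):

```sql
-- Before the load: drop the non-clustered index so inserts
-- don't have to maintain it.
DROP INDEX IX_LoadTable1_Keys ON dbo.LoadTable1;

-- ... run the SSIS load ...

-- Before the extract window: recreate it so the SELECTs can seek.
CREATE UNIQUE NONCLUSTERED INDEX IX_LoadTable1_Keys
    ON dbo.LoadTable1 (col1, col2, col3, col4);
```

The CREATE at the end builds a fresh, unfragmented index, which also helps the extract.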

    For better, quicker answers on T-SQL questions, click on the following...
    http://www.sqlservercentral.com/articles/Best+Practices/61537/

    For better answers on performance questions, click on the following...
    http://www.sqlservercentral.com/articles/SQLServerCentral/66909/

  • You will have to provide a lot more detail before anyone will be able to provide a solid recommendation on how to improve your situation.

    Here are some initial questions that come to mind:

    - How many tables?

    - How are the inserts being done? OLE DB destination, Calling a stored procedure for each row, calling an INSERT statement for each row, other?

    - What are the clustered indexes on the tables and are they unique? Feel free to provide DDL for the tables including the index creation statements.

    - How many non-clustered indexes are on the table?

    - How many rows are in each of the tables?

    - Is there a primary key on every table?

    There are no special teachers of virtue, because virtue is taught by the whole community.
    --Plato

  • Hi OPC,

Thanks for responding. As I said earlier, the SSIS package is fairly simple and has been optimized as far as possible.

    - How many tables?

6. Already mentioned.

    - How are the inserts being done? OLE DB destination, Calling a stored procedure for each row, calling an INSERT statement for each row, other?

OLE DB Destination. It's within an optimized SSIS package.

    - What are the clustered indexes on the tables and are they unique? Feel free to provide DDL for the tables including the index creation statements.

Yes, they are unique: a combination of 3 columns which is used frequently in the WHERE clause of the extracts.

    - How many non-clustered indexes are on the table?

A unique non-clustered index on 4 columns, in that order.

    - How many rows are in each of the tables?

Already mentioned: 20,000 rows loaded every 10 minutes into each of the tables.

    - Is there a primary key on every table?

    No

  • Welsh Corgi (5/9/2011)


    Can you drop the Indexes prior to the load and then recreate them?

    That would speed up the process and help with the fragmentation.

    I'm not sure you understood the problem. The load happens 23 hours in a day and the extract must complete for the entire day in the last 1 hour of the day.

    Load begins: 00:00:01 of the day

    Load ends : 22:59:59 of the day

    Extract begins: 23:00:01

    Extract must end : 23:59:59 of the day

Throughout the day about 3 million rows are inserted. Dropping and recreating the indexes for 3 million rows takes more than 2 hours, and that's not a feasible solution. Already tried.

  • Something doesn't sound right here

You said your environment was

    Environment: SQL2k5 , 8 cpus, 16 gb ram, Win 2003 server

    Yet you also said

    Dropping and recreating indexes for 3 million rows takes more than 2 hours and thats not a feasible solution.

    I think we need to see table and index structures, because in your environment, I wouldn't expect dropping and creating indexes on 3 million rows to take anywhere near 2 hours... unless you have dozens of indexes and extremely wide rows or serious blocking issues or serious IO throughput problems.

Just as an example, which is very non-scientific, which you can pick all sorts of holes in, and which may be completely different from your situation, blah blah blah...

    Using the script below, I could drop and recreate the indexes on 3 million rows in under 15 minutes...

    on a low spec laptop, in a VM with 1gb ram, using a USB connected 2.5 inch portable drive.

create table A
(id   int identity not null,
 cola char(25)     not null,
 colb int          null,
 colc varchar(10)  not null,
 cold varchar(30)  not null,
 cole char(10)     null,
 colf varchar(20),
 colg datetime     not null,
 colh char(300)    null,
 coli datetime     null)
go

insert A (cola, colb, colc, cold, cole, colf, colg, colh, coli)
select top 3000000
       'abcdefghijklmnopqrstuv',
       a.object_id,
       'abghdrgyfd',
       'sdfgk;sdfgljsdfglkdfgdfgsdfgsd',
       'ghklghklg',
       'dflkdfglkjdfgkl',
       a.create_date,
       a.name,
       a.modify_date
FROM   master.sys.objects a
cross join master.sys.objects b
cross join master.sys.objects c
cross join master.sys.objects d
go

alter table a add constraint pk primary key clustered (id)
create index inda on a (colb, colc)
create index indb on a (colc, colb)
create index indc on a (colh, colg)
create index indd on a (cole, coli)
create index inde on a (colh, cola)
go

drop index a.inda
drop index a.indb
drop index a.indc
drop index a.indd
drop index a.inde
alter table a drop constraint pk

  • Ian Scarlett (5/10/2011)


    I think we need to see table and index structures, because in your environment, I wouldn't expect dropping and creating indexes on 3 million rows to take anywhere near 2 hours... unless you have dozens of indexes and extremely wide rows or serious blocking issues or serious IO throughput problems.

    Hi All,

Thanks Ian. Interestingly enough... I'm thinking of wearing a detective hat to find out the root cause.

Please see the attached table structure and index structure. There are 6 tables (3 pairs, odd/even logic), and each table receives roughly 20k inserts every 10 minutes. Dropping the indexes on all 6 tables at the same time and recreating them at the same time on the same database, with the load running in the background against the same set of tables, took 2 hours, measured from the start and end times of the SQL job in which the drop and recreate statements were written.

  • IMHO your table structures and size/number of indexes are nowhere near sufficient to cause a 2 hour drop/create time for 3 million rows.

    Break out that detective hat 🙂

  • facticatech (5/10/2011)


    - How many tables?

    6 . Already mentioned.

    You did, sorry I missed that.

    facticatech (5/10/2011)


    - How are the inserts being done? OLE DB destination, Calling a stored procedure for each row, calling an INSERT statement for each row, other?

OLE DB Destination. It's within an optimized SSIS package.

    Are you using FastLoad? What is your batch size?
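For reference, the OLE DB Destination's fast-load behaviour is driven by a handful of component properties. A sketch of typical settings for a batch load of this shape (the values are illustrative assumptions, not a tested recommendation for this workload):

```
AccessMode:                  OpenRowset Using FastLoad
FastLoadKeepIdentity:        False
FastLoadKeepNulls:           False
FastLoadMaxInsertCommitSize: 20000   -- commit in chunks, not one giant transaction
FastLoadOptions:             ROWS_PER_BATCH = 20000
```

With the plain "Table or view" access mode the component issues one INSERT per row, which by itself would explain a slow load.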

    facticatech (5/10/2011)


    - What are the clustered indexes on the tables and are they unique? Feel free to provide DDL for the tables including the index creation statements.

Yes, they are unique: a combination of 3 columns which is used frequently in the WHERE clause of the extracts.

    It looks like you are using a composite clustered index. For tables that are loaded in bulk like you are describing it is almost always better to use an ever increasing, narrow clustering key, i.e. an identity column with type INT (or BIGINT if needed). Unless your flat files are pre-sorted in the order of the clustering key and the data is always guaranteed to be appended to the end of the clustering key sequence then an identity will usually be the better choice. Can you clarify?
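A sketch of the shape being described here, with made-up table and column names (the current 3-column business key moves to a unique non-clustered index, and a narrow identity becomes the clustering key):

```sql
CREATE TABLE dbo.LoadTable1
(
    row_id  INT IDENTITY(1,1) NOT NULL,  -- ever-increasing, narrow clustering key
    key1    INT          NOT NULL,       -- the existing 3-column business key...
    key2    VARCHAR(10)  NOT NULL,
    key3    DATETIME     NOT NULL,
    payload VARCHAR(100) NULL,           -- ... plus the remaining columns
    CONSTRAINT PK_LoadTable1 PRIMARY KEY CLUSTERED (row_id)
);

-- ...which becomes a unique non-clustered index
-- to serve the extract's WHERE clause.
CREATE UNIQUE NONCLUSTERED INDEX IX_LoadTable1_BusinessKey
    ON dbo.LoadTable1 (key1, key2, key3);
```

With this layout every insert lands at the end of the clustered index, so the base table itself stops fragmenting during the load.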

    facticatech (5/10/2011)


    - How many rows are in each of the tables?

    Already mentioned. 20,000 rows loaded every 10 mins in each of the tables

    I meant in total, not per batch load. I see you mentioned later that there were 3MM rows in one of your tables.

    facticatech (5/10/2011)


    - Is there a primary key on every table?

    No

Just to double-check, because it was not mentioned in your xls file: the clustered indexes are declared as UNIQUE, right?

    There are no special teachers of virtue, because virtue is taught by the whole community.
    --Plato

  • To piggy back a little on what opc.three was saying...

    You have relatively wide records with clustered indexes. If the order of the data in the clustering index fields is not sequential (consistently increasing or decreasing) you'll see a LOT of fragmentation in your clustered indexes, which means a lot of page splitting and reordering. Since ALL of your data is on those clustered index pages, there's a lot of thrashing going on.

    IF you have the option to add an additional IDENTITY column and can make THAT the clustering key, AND can make the current clustered indexes NONclustered, I'm wagering you'll see much improvement in load speed. Yes, you'll be maintaining an extra index, but they'll be non clustered, with more values per page, so fewer splits and rewrites. When your SSIS package clears out the tables at midnight, you might want to reset the SEED value on the IDENTITY column.
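A sketch of that one-off schema change and the nightly reseed (names are placeholders; note that adding an IDENTITY column rewrites the table, so this is a one-time migration, not something to run nightly):

```sql
-- One-off: add the surrogate key and recluster on it.
ALTER TABLE dbo.LoadTable1 ADD row_id INT IDENTITY(1,1) NOT NULL;

-- (drop the existing clustered index first, then:)
CREATE UNIQUE CLUSTERED INDEX CIX_LoadTable1 ON dbo.LoadTable1 (row_id);

-- Nightly, after the delete/cleanup package empties the table,
-- reset the seed so the values don't grow without bound:
DBCC CHECKIDENT ('dbo.LoadTable1', RESEED, 0);
```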

    Also, are the indexes used during the 23 hours of the day when you're just loading? I couldn't tell from your posts; it sounded like they're only used for the extracts from 11 to midnight. If so, you might find you CAN leave them off during the load period and that they will re/build quite quickly since they're not clustered. With the beefy hardware you've got, I'd think it wouldn't take very long.... Depending on the types of selects you're using to create those extracts, you might also see the extract speed be much higher, since they won't be as fragmented either (having been just built).
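On SQL 2005 that "leave them off during the load" idea can be done without dropping the index definitions, by disabling the non-clustered indexes for the load window and rebuilding them just before the extract. A sketch with placeholder names (only disable non-clustered indexes; disabling a clustered index takes the whole table offline):

```sql
-- 00:00, before the day's loads begin: inserts no longer maintain this index.
ALTER INDEX IX_LoadTable1_BusinessKey ON dbo.LoadTable1 DISABLE;

-- 23:00, just before the extract window: REBUILD re-enables the index
-- and produces a freshly built, unfragmented copy for the SELECTs.
ALTER INDEX IX_LoadTable1_BusinessKey ON dbo.LoadTable1 REBUILD;
```

DISABLE keeps the index metadata in place, so there are no CREATE INDEX scripts to maintain in the job.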

    Of course, you've got the data and the environment, so none of this may be valid for your situation. Just my two bits worth.

    Rob Schripsema
    Propack, Inc.

  • Rob Schripsema (5/10/2011)


    To piggy back a little on what opc.three was saying...

    You have relatively wide records with clustered indexes. If the order of the data in the clustering index fields is not sequential (consistently increasing or decreasing) you'll see a LOT of fragmentation in your clustered indexes, which means a lot of page splitting and reordering. Since ALL of your data is on those clustered index pages, there's a lot of thrashing going on.

😎 Right on Rob, that's exactly what I was driving at with my questioning about the clustered indexes. It could be exacerbated further, I think, if they were not declared as UNIQUE and there were duplicates in the data. Organizing the data on disk sequentially using an identity column as the clustering key could translate into big gains in bulk load performance.

    The narrower clustering key would also reduce the overall size of the nonclustered indexes potentially improving performance there as well when selecting and when maintaining the indexes. Fragmentation in the nonclustered indexes will be unavoidable given the nature of them but that would seemingly be a wash since it is occurring in the current setup.

    There are no special teachers of virtue, because virtue is taught by the whole community.
    --Plato

It could be exacerbated further, I think, if they were not declared as UNIQUE and there were duplicates in the data. Organizing the data on disk sequentially using an identity column as the clustering key could translate into big gains in bulk load performance.

    The narrower clustering key would also reduce the overall size of the nonclustered indexes potentially improving performance there as well when selecting and when maintaining the indexes. Fragmentation in the nonclustered indexes will be unavoidable given the nature of them but that would seemingly be a wash since it is occurring in the current setup.

    Exactly. And I'd not considered the fact that the non-clustered indexes get smaller when the clustering key is narrower. Good point.

    Rob Schripsema
    Propack, Inc.

1) http://support.microsoft.com/kb/297861: that article is for SQL 7 and SQL 2000!!! WAY out of date.

    2) your request for help is very deep and complex. That isn't typically what forums are useful for. You could hunt-and-peck on the forum for days and still not get good results. I suggest you get an expert on board for a day or two to help you analyze and tune your processes.

    Best,
    Kevin G. Boles
    SQL Server Consultant
    SQL MVP 2007-2012
    TheSQLGuru on googles mail service
