SQL Clone
SQLServerCentral is supported by Redgate
 
Log in  ::  Register  ::  Not logged in
 
 
 


Data loading best practices for 5-10 millions of rows.


Data loading best practices for 5-10 millions of rows.

Author
Message
Minaz Amin
Minaz Amin
SSCrazy
SSCrazy (2.7K reputation)SSCrazy (2.7K reputation)SSCrazy (2.7K reputation)SSCrazy (2.7K reputation)SSCrazy (2.7K reputation)SSCrazy (2.7K reputation)SSCrazy (2.7K reputation)SSCrazy (2.7K reputation)

Group: General Forum Members
Points: 2688 Visits: 1750
Hi Expert,

As this a huge environment wanted to impement best practice for loading 5 - 10 millions of rows, please advice.

How do I take care of costraint and indexes?

Is drop / recreate indexes is good or dissable and enable indexes?

in both scenario if my cluster index is dropped or dissable, no one will be able to access the table hence data loading will break.

I was reading one article which suggested dissbling cluster index will dissable all the associated non cluster indexes and then enable the cluster index , do the data loading and enable all the non cluster index and rebuild all.

I need your input.

"More Green More Oxygen !! Plant a tree today"
Orlando Colamatteo
Orlando Colamatteo
SSC-Dedicated
SSC-Dedicated (39K reputation)SSC-Dedicated (39K reputation)SSC-Dedicated (39K reputation)SSC-Dedicated (39K reputation)SSC-Dedicated (39K reputation)SSC-Dedicated (39K reputation)SSC-Dedicated (39K reputation)SSC-Dedicated (39K reputation)

Group: General Forum Members
Points: 39200 Visits: 14411
There are no hard and fast rules for this. You must test. It will depend on many variables. How many indexes do you have? How wide are they? What order is the data in when it arrives? The work to drop all indexes before loading, then building them after loading can be less than keeping all of them organized while loading. Try both ways. Try a mix of just dropping all nonclustered but keeping the clustered, especially if your data set can be sorted in the same order as the clustered index before loading.

__________________________________________________________________________________________________
There are no special teachers of virtue, because virtue is taught by the whole community. --Plato
Koen Verbeeck
Koen Verbeeck
SSC Guru
SSC Guru (61K reputation)SSC Guru (61K reputation)SSC Guru (61K reputation)SSC Guru (61K reputation)SSC Guru (61K reputation)SSC Guru (61K reputation)SSC Guru (61K reputation)SSC Guru (61K reputation)

Group: General Forum Members
Points: 61935 Visits: 13297
Also set the recovery model to simple. (maybe only during the load if you want point-in-time recovery during the rest of the day)


How to post forum questions.
Need an answer? No, you need a question.
What’s the deal with Excel & SSIS?
My blog at SQLKover.

MCSE Business Intelligence - Microsoft Data Platform MVP
Jeff Moden
Jeff Moden
SSC Guru
SSC Guru (213K reputation)SSC Guru (213K reputation)SSC Guru (213K reputation)SSC Guru (213K reputation)SSC Guru (213K reputation)SSC Guru (213K reputation)SSC Guru (213K reputation)SSC Guru (213K reputation)

Group: General Forum Members
Points: 213241 Visits: 41977
Koen Verbeeck (10/25/2013)
Also set the recovery model to simple. (maybe only during the load if you want point-in-time recovery during the rest of the day)


Ooooohhhh, be careful, now. BULK LOGGED, maybe, but not SIMPLE. SIMPLE breaks the entire log chain. Yeah, you can do a DIFF backup to quickly reestablish the chain but why break it in the first place?

--Jeff Moden

RBAR is pronounced ree-bar and is a Modenism for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
Stop thinking about what you want to do to a row... think, instead, of what you want to do to a column.
If you think its expensive to hire a professional to do the job, wait until you hire an amateur. -- Red Adair

Helpful Links:
How to post code problems
How to post performance problems
Forum FAQs
Jeff Moden
Jeff Moden
SSC Guru
SSC Guru (213K reputation)SSC Guru (213K reputation)SSC Guru (213K reputation)SSC Guru (213K reputation)SSC Guru (213K reputation)SSC Guru (213K reputation)SSC Guru (213K reputation)SSC Guru (213K reputation)

Group: General Forum Members
Points: 213241 Visits: 41977
Minaz (10/24/2013)
Hi Expert,

As this a huge environment wanted to impement best practice for loading 5 - 10 millions of rows, please advice.

How do I take care of costraint and indexes?

Is drop / recreate indexes is good or dissable and enable indexes?

in both scenario if my cluster index is dropped or dissable, no one will be able to access the table hence data loading will break.

I was reading one article which suggested dissbling cluster index will dissable all the associated non cluster indexes and then enable the cluster index , do the data loading and enable all the non cluster index and rebuild all.

I need your input.


This would be, IMHO, a very good candidate for using a staging table (you should always use a staging table for imported data for many reasons) and partitioning.

--Jeff Moden

RBAR is pronounced ree-bar and is a Modenism for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
Stop thinking about what you want to do to a row... think, instead, of what you want to do to a column.
If you think its expensive to hire a professional to do the job, wait until you hire an amateur. -- Red Adair

Helpful Links:
How to post code problems
How to post performance problems
Forum FAQs
Minaz Amin
Minaz Amin
SSCrazy
SSCrazy (2.7K reputation)SSCrazy (2.7K reputation)SSCrazy (2.7K reputation)SSCrazy (2.7K reputation)SSCrazy (2.7K reputation)SSCrazy (2.7K reputation)SSCrazy (2.7K reputation)SSCrazy (2.7K reputation)

Group: General Forum Members
Points: 2688 Visits: 1750
Thank you all, yes am also planing to follw the advice by loading through staging table. My main table is paritioned ( did it now ) on yearlywise so to maintain it easily.

while inserting do my indexes are fragemented? Just by updating the index stat will not update the table and during maintenanace window I can ran the rebuild/reorg index job depending on fragmentation percentage.

what do you advice?

"More Green More Oxygen !! Plant a tree today"
Orlando Colamatteo
Orlando Colamatteo
SSC-Dedicated
SSC-Dedicated (39K reputation)SSC-Dedicated (39K reputation)SSC-Dedicated (39K reputation)SSC-Dedicated (39K reputation)SSC-Dedicated (39K reputation)SSC-Dedicated (39K reputation)SSC-Dedicated (39K reputation)SSC-Dedicated (39K reputation)

Group: General Forum Members
Points: 39200 Visits: 14411
Minaz (10/28/2013)
while inserting do my indexes are fragemented? Just by updating the index stat will not update the table and during maintenanace window I can ran the rebuild/reorg index job depending on fragmentation percentage.

Chances are indexes will become fragmented to some degree but it will depend on things like the index fill factor, which columns you are updating, the columns and order of the indexes, lots of things, as to whether they will or how much they will become fragmented.

For example if you insert a ton of data into a table that only has a clustered index where the leading edge of the index is an identity column then you won't see much fragmentation in that clustered index, however nonclustered indexes could be affected. Or let's say you update many rows in the table, but you only update an integer column that was present when all rows in the table were first inserted, then you will also not see much fragmentation because integer is a fixed-width data type.

But let's say you insert many rows into a table where the clustered index is on last name, if data being imported and data in the table are both distributed in a similar way then chances of encountering page splits and fragmentation ensuing is high. Or say you will be updating many rows, updating a varchar column that was previously null but now will have a value, and the fill factor is 100, chances of fragmentation are high.

Even deleting can cause fragmentation, so be aware of that possibility as well.

The good news is that index fragmentation is quite easy to solve. Check the scripts at http://ola.hallengren.com for a solution that will work for many systems with no special effort other than compiling some stored procedures and functions and setting up a scheduled job, and will be mostly hands off if even only a small nightly maintenance windows exists for the database.

__________________________________________________________________________________________________
There are no special teachers of virtue, because virtue is taught by the whole community. --Plato
cdickens 87202
cdickens 87202
Grasshopper
Grasshopper (15 reputation)Grasshopper (15 reputation)Grasshopper (15 reputation)Grasshopper (15 reputation)Grasshopper (15 reputation)Grasshopper (15 reputation)Grasshopper (15 reputation)Grasshopper (15 reputation)

Group: General Forum Members
Points: 15 Visits: 12
I also like the idea of utilizing a staging table to get your latest data into the database without interrupting data access.

Furthermore, you might consider using the MERGE INTO syntax to allow SQL Server to reconcile the differences and commit DML changes to the primary data table to reduce the amount of changes occurring only to what is actually different. This should reduce the amount of activity amongst all related indexes, triggers, etc and again allow the table to stay online for queries. Unfortunately, I've not attempted MERGEs on such a large data set, so by all means test and report back to us. :-)

--Chris
Orlando Colamatteo
Orlando Colamatteo
SSC-Dedicated
SSC-Dedicated (39K reputation)SSC-Dedicated (39K reputation)SSC-Dedicated (39K reputation)SSC-Dedicated (39K reputation)SSC-Dedicated (39K reputation)SSC-Dedicated (39K reputation)SSC-Dedicated (39K reputation)SSC-Dedicated (39K reputation)

Group: General Forum Members
Points: 39200 Visits: 14411
Staging tables are fine but they don't buy you much if you're eventually having to load into the final table anyway. If your dataset is clean reliable or you have proper error handling in your loader (e.g. SSIS) then loading into the final table directly can far and away outperform loading into a staging table first and then using T-SQL to load into your final table.

MERGE might seem like a good choice conceptually but there is a ton of evidence out there that shows that INSERT followed by UPDATE inside a transaction is usually to be a better performer when handling large datasets. Test for yourself, of course. Also worth mentioning, there were more than its fair share of bugs opened against MERGE so make sure you have the latest patches and watch out for the ones that are still active as well.

__________________________________________________________________________________________________
There are no special teachers of virtue, because virtue is taught by the whole community. --Plato
Minaz Amin
Minaz Amin
SSCrazy
SSCrazy (2.7K reputation)SSCrazy (2.7K reputation)SSCrazy (2.7K reputation)SSCrazy (2.7K reputation)SSCrazy (2.7K reputation)SSCrazy (2.7K reputation)SSCrazy (2.7K reputation)SSCrazy (2.7K reputation)

Group: General Forum Members
Points: 2688 Visits: 1750
Thank you for sharing your thoughts.


I will try both the options to merge and to use stage table.

Let me first try stage table and then switch off the data to paritioned table. Let me check the performance and report back to you all.

"More Green More Oxygen !! Plant a tree today"
Go


Permissions

You can't post new topics.
You can't post topic replies.
You can't post new polls.
You can't post replies to polls.
You can't edit your own topics.
You can't delete your own topics.
You can't edit other topics.
You can't delete other topics.
You can't edit your own posts.
You can't edit other posts.
You can't delete your own posts.
You can't delete other posts.
You can't post events.
You can't edit your own events.
You can't edit other events.
You can't delete your own events.
You can't delete other events.
You can't send private messages.
You can't send emails.
You can read topics.
You can't vote in polls.
You can't upload attachments.
You can download attachments.
You can't post HTML code.
You can't edit HTML code.
You can't post IFCode.
You can't post JavaScript.
You can post emoticons.
You can't post or upload images.

Select a forum

































































































































































SQLServerCentral


Search