Data loading best practices for 5-10 million rows
Posted Thursday, October 24, 2013 10:48 PM


Mr or Mrs. 500


Group: General Forum Members
Last Login: Tuesday, March 11, 2014 3:34 AM
Points: 549, Visits: 1,554
Hi Expert,

As this is a huge environment, I want to implement best practices for loading 5-10 million rows. Please advise.

How do I take care of constraints and indexes?

Is it better to drop and recreate the indexes, or to disable and re-enable them?

In both scenarios, if my clustered index is dropped or disabled, no one will be able to access the table, so the data load will break.

I was reading an article which suggested that disabling the clustered index will disable all the associated nonclustered indexes; you then enable the clustered index, do the data load, and enable and rebuild all the nonclustered indexes.

I need your input.


"More Green More Oxygen !! Plant a tree today"
Post #1508324
Posted Friday, October 25, 2013 1:27 AM


SSCertifiable


Group: General Forum Members
Last Login: Today @ 6:48 AM
Points: 7,094, Visits: 12,581
There are no hard and fast rules for this. You must test; it will depend on many variables. How many indexes do you have? How wide are they? What order is the data in when it arrives? The work of dropping all indexes before loading and rebuilding them afterwards can be less than the work of keeping them all organized while loading. Try both ways. Also try a mix: drop all the nonclustered indexes but keep the clustered one, especially if your data set can be sorted into the same order as the clustered index before loading.
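
If it helps, here is a minimal sketch of the disable-then-rebuild variant (keeping the clustered index in place); the table and index names are hypothetical:

-- Hypothetical table/index names, for illustration only.
-- Disable the nonclustered indexes before the load (the clustered index stays up):
ALTER INDEX IX_Sales_CustomerID ON dbo.Sales DISABLE;
ALTER INDEX IX_Sales_OrderDate  ON dbo.Sales DISABLE;

-- ... perform the bulk load here (BULK INSERT, SSIS, INSERT ... SELECT, etc.) ...

-- Rebuilding a disabled index also re-enables it:
ALTER INDEX IX_Sales_CustomerID ON dbo.Sales REBUILD;
ALTER INDEX IX_Sales_OrderDate  ON dbo.Sales REBUILD;
-- Or rebuild every index on the table in one go:
-- ALTER INDEX ALL ON dbo.Sales REBUILD;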

__________________________________________________________________________________________________
There are no special teachers of virtue, because virtue is taught by the whole community. --Plato
Post #1508349
Posted Friday, October 25, 2013 1:33 AM


SSChampion


Group: General Forum Members
Last Login: Today @ 9:09 AM
Points: 13,356, Visits: 10,219
Also set the recovery model to SIMPLE (maybe only during the load, if you want point-in-time recovery during the rest of the day).



How to post forum questions.
Need an answer? No, you need a question.
What’s the deal with Excel & SSIS?

Member of LinkedIn. My blog at LessThanDot.

MCSA SQL Server 2012 - MCSE Business Intelligence
Post #1508353
Posted Friday, October 25, 2013 10:11 AM


SSC-Dedicated


Group: General Forum Members
Last Login: Yesterday @ 8:06 PM
Points: 36,786, Visits: 31,243
Koen Verbeeck (10/25/2013)
Also set the recovery model to SIMPLE (maybe only during the load, if you want point-in-time recovery during the rest of the day).


Ooooohhhh, be careful, now. BULK LOGGED, maybe, but not SIMPLE. SIMPLE breaks the entire log chain. Yeah, you can do a DIFF backup to quickly reestablish the chain, but why break it in the first place?
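
For what it's worth, a minimal sketch of using BULK_LOGGED only for the load window (database name and backup paths are hypothetical):

-- Log backup before the switch, so the bulk-logged interval is bounded:
BACKUP LOG MyDatabase TO DISK = N'D:\Backups\MyDatabase_before_load.trn';

ALTER DATABASE MyDatabase SET RECOVERY BULK_LOGGED;

-- ... perform the minimally logged bulk load here ...

ALTER DATABASE MyDatabase SET RECOVERY FULL;

-- Log backup afterwards so point-in-time recovery resumes from here:
BACKUP LOG MyDatabase TO DISK = N'D:\Backups\MyDatabase_after_load.trn';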


--Jeff Moden
"RBAR is pronounced "ree-bar" and is a "Modenism" for "Row-By-Agonizing-Row".

First step towards the paradigm shift of writing Set Based code:
Stop thinking about what you want to do to a row... think, instead, of what you want to do to a column."

(play on words) "Just because you CAN do something in T-SQL, doesn't mean you SHOULDN'T." --22 Aug 2013

Helpful Links:
How to post code problems
How to post performance problems
Post #1508518
Posted Friday, October 25, 2013 10:13 AM


SSC-Dedicated


Group: General Forum Members
Last Login: Yesterday @ 8:06 PM
Points: 36,786, Visits: 31,243
Minaz (10/24/2013)
Hi Expert,

As this is a huge environment, I want to implement best practices for loading 5-10 million rows. Please advise.

How do I take care of constraints and indexes?

Is it better to drop and recreate the indexes, or to disable and re-enable them?

In both scenarios, if my clustered index is dropped or disabled, no one will be able to access the table, so the data load will break.

I was reading an article which suggested that disabling the clustered index will disable all the associated nonclustered indexes; you then enable the clustered index, do the data load, and enable and rebuild all the nonclustered indexes.

I need your input.


This would be, IMHO, a very good candidate for using a staging table (you should always use a staging table for imported data, for many reasons) and partitioning.
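
As a rough illustration of the staging-plus-partitioning idea, assuming a hypothetical dbo.Sales table partitioned by year and a dbo.Sales_Staging table with an identical structure, the same indexes, a check constraint limiting it to the target year, and sitting on the same filegroup as the target partition:

-- Load and validate the incoming rows in the staging table first;
-- the main table stays fully available while this runs:
TRUNCATE TABLE dbo.Sales_Staging;

INSERT INTO dbo.Sales_Staging (SaleID, SaleDate, Amount)
SELECT SaleID, SaleDate, Amount
FROM dbo.SalesImport;   -- hypothetical source of the 5-10 million rows

-- Then switch the staging table into the empty target partition.
-- This is a metadata-only operation, so it is nearly instant:
ALTER TABLE dbo.Sales_Staging
    SWITCH TO dbo.Sales PARTITION 5;   -- partition number is illustrative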


--Jeff Moden
"RBAR is pronounced "ree-bar" and is a "Modenism" for "Row-By-Agonizing-Row".

First step towards the paradigm shift of writing Set Based code:
Stop thinking about what you want to do to a row... think, instead, of what you want to do to a column."

(play on words) "Just because you CAN do something in T-SQL, doesn't mean you SHOULDN'T." --22 Aug 2013

Helpful Links:
How to post code problems
How to post performance problems
Post #1508520
Posted Monday, October 28, 2013 1:02 AM


Mr or Mrs. 500


Group: General Forum Members
Last Login: Tuesday, March 11, 2014 3:34 AM
Points: 549, Visits: 1,554
Thank you all. Yes, I am also planning to follow the advice and load through a staging table. My main table is partitioned (I did that just now) by year, so it is easy to maintain.

While inserting, will my indexes get fragmented? Just updating the index statistics will not defragment the table, and during the maintenance window I can run the rebuild/reorg index job depending on the fragmentation percentage.

What do you advise?


"More Green More Oxygen !! Plant a tree today"
Post #1508757
Posted Monday, October 28, 2013 9:49 PM


SSCertifiable


Group: General Forum Members
Last Login: Today @ 6:48 AM
Points: 7,094, Visits: 12,581
Minaz (10/28/2013)
While inserting, will my indexes get fragmented? Just updating the index statistics will not defragment the table, and during the maintenance window I can run the rebuild/reorg index job depending on the fragmentation percentage.

Chances are the indexes will become fragmented to some degree, but how much will depend on things like the index fill factor, which columns you are updating, and the columns and key order of the indexes, among other factors.

For example, if you insert a ton of data into a table that has only a clustered index whose leading edge is an identity column, you won't see much fragmentation in that clustered index; nonclustered indexes could still be affected, though. Or say you update many rows in the table but only touch an integer column that was present when all the rows were first inserted; you will also not see much fragmentation, because integer is a fixed-width data type.

But say you insert many rows into a table where the clustered index is on last name; if the data being imported and the data already in the table are distributed in a similar way, the chance of page splits, and of the fragmentation that follows, is high. Or say you will be updating many rows, setting a varchar column that was previously NULL to a value, and the fill factor is 100; the chance of fragmentation is again high.

Even deleting can cause fragmentation, so be aware of that possibility as well.

The good news is that index fragmentation is quite easy to deal with. Check the scripts at http://ola.hallengren.com for a solution that works for many systems with no special effort beyond compiling some stored procedures and functions and setting up a scheduled job, and that is mostly hands-off even if only a small nightly maintenance window exists for the database.
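
If you want to eyeball fragmentation yourself after a load, something along these lines works (dbo.Sales is a hypothetical table name; common guidance is to reorganize above roughly 5-10% fragmentation and rebuild above roughly 30%):

SELECT i.name AS index_name,
       ps.index_type_desc,
       ps.avg_fragmentation_in_percent,
       ps.page_count
FROM sys.dm_db_index_physical_stats(
         DB_ID(), OBJECT_ID(N'dbo.Sales'), NULL, NULL, 'LIMITED') AS ps
JOIN sys.indexes AS i
  ON i.object_id = ps.object_id
 AND i.index_id  = ps.index_id
ORDER BY ps.avg_fragmentation_in_percent DESC;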


__________________________________________________________________________________________________
There are no special teachers of virtue, because virtue is taught by the whole community. --Plato
Post #1509176
Posted Tuesday, October 29, 2013 1:54 PM
Forum Newbie


Group: General Forum Members
Last Login: Thursday, November 7, 2013 7:01 AM
Points: 1, Visits: 7
I also like the idea of utilizing a staging table to get your latest data into the database without interrupting data access.

Furthermore, you might consider using the MERGE syntax to let SQL Server reconcile the differences and commit the DML changes to the primary data table, limiting the changes to only what is actually different. This should reduce the amount of activity across all related indexes, triggers, etc., and again allow the table to stay online for queries. Unfortunately, I've not attempted MERGEs on such a large data set, so by all means test and report back to us.
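
For example, a sketch of that kind of upsert with hypothetical dbo.Sales and dbo.Sales_Staging tables (test on your own data, as noted):

MERGE dbo.Sales AS tgt
USING dbo.Sales_Staging AS src
    ON tgt.SaleID = src.SaleID
-- Only touch rows whose values actually changed:
WHEN MATCHED AND (tgt.SaleDate <> src.SaleDate OR tgt.Amount <> src.Amount)
    THEN UPDATE SET tgt.SaleDate = src.SaleDate,
                    tgt.Amount   = src.Amount
WHEN NOT MATCHED BY TARGET
    THEN INSERT (SaleID, SaleDate, Amount)
         VALUES (src.SaleID, src.SaleDate, src.Amount);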

--Chris
Post #1509558
Posted Tuesday, October 29, 2013 4:54 PM


SSCertifiable


Group: General Forum Members
Last Login: Today @ 6:48 AM
Points: 7,094, Visits: 12,581
Staging tables are fine, but they don't buy you much if you eventually have to load into the final table anyway. If your dataset is clean and reliable, or you have proper error handling in your loader (e.g. SSIS), then loading into the final table directly can far and away outperform loading into a staging table first and then using T-SQL to load into the final table.

MERGE might seem like a good choice conceptually, but there is a ton of evidence out there showing that an INSERT followed by an UPDATE inside a transaction is usually the better performer when handling large datasets. Test for yourself, of course. Also worth mentioning: MERGE has had more than its fair share of bugs opened against it, so make sure you have the latest patches and watch out for the ones that are still active as well.
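
For reference, the INSERT-plus-UPDATE pattern mentioned above might look roughly like this, reusing the same hypothetical dbo.Sales and dbo.Sales_Staging names:

BEGIN TRANSACTION;

    -- Update rows that already exist in the target:
    UPDATE tgt
    SET    tgt.SaleDate = src.SaleDate,
           tgt.Amount   = src.Amount
    FROM   dbo.Sales AS tgt
    JOIN   dbo.Sales_Staging AS src
        ON src.SaleID = tgt.SaleID;

    -- Insert rows that are new:
    INSERT INTO dbo.Sales (SaleID, SaleDate, Amount)
    SELECT src.SaleID, src.SaleDate, src.Amount
    FROM   dbo.Sales_Staging AS src
    WHERE  NOT EXISTS (SELECT 1
                       FROM dbo.Sales AS tgt
                       WHERE tgt.SaleID = src.SaleID);

COMMIT TRANSACTION;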


__________________________________________________________________________________________________
There are no special teachers of virtue, because virtue is taught by the whole community. --Plato
Post #1509608
Posted Sunday, November 3, 2013 7:52 PM


Mr or Mrs. 500


Group: General Forum Members
Last Login: Tuesday, March 11, 2014 3:34 AM
Points: 549, Visits: 1,554
Thank you for sharing your thoughts.


I will try both options: MERGE and a staging table.

Let me first try the staging table and then switch the data into the partitioned table. I will check the performance and report back to you all.





"More Green More Oxygen !! Plant a tree today"
Post #1510962