Click here to monitor SSC
SQLServerCentral is supported by Redgate
 
Log in  ::  Register  ::  Not logged in
 
 
 


Performance hit on Indexing (I Think)


Performance hit on Indexing (I Think)

Author
Message
aaron.reese
aaron.reese
Mr or Mrs. 500
Mr or Mrs. 500 (589 reputation)Mr or Mrs. 500 (589 reputation)Mr or Mrs. 500 (589 reputation)Mr or Mrs. 500 (589 reputation)Mr or Mrs. 500 (589 reputation)Mr or Mrs. 500 (589 reputation)Mr or Mrs. 500 (589 reputation)Mr or Mrs. 500 (589 reputation)

Group: General Forum Members
Points: 589 Visits: 898
Hi guys,

I am looking after / building a solution that loads data from flat files using bulk insert.
The file contains a unique reference (PolicyID) which is in each file but the system architect has requested that each file load is uniquely identified. To do this we have added Validfrom and ValidTo columns to the table. The Bulk Insert now uses a format file to skip these columns as they are not in the source file.

The whole process is

Update the current records which have a dummy end date (31st Dec 2099) with today's date.

Bulk insert the file with the ValidFrom and ValidTo dates being NULL.

Update the new records with ValidFrom = todays date and ValidTo = dummy date.

I have an index built (Unique but not a PK and not clustered) PolicyID (ASC),ValidFrom(ASC),ValidTo(ASC)

The data load seems to be reasonably quick; But I think the index update is taking a huge performance hit. The file is about 750K records and the first update will be updating records at the end of the index pages, then the load will load with NULL values which I am assuming are going into the front of the index, then being rewritten to go to the end of the index. It this a reasonable assumption and if so,

how do I measure it whilst the script is running, and is it worth turning off the index whist doing the load and then rebuilding it. The data will be loaded on a weekly basis and I need to replicate the process across about 20 files of differing sizes.

I will normally only be reporting on records from the most current insert, looking for a specific policyID, so should I change the order or direction of the index fields: PolicyID and either ValidFrom or ValidTo should be enough to give me a unique key. Is it worth dropping either of the fields from the index.
GilaMonster
GilaMonster
SSC-Forever
SSC-Forever (47K reputation)SSC-Forever (47K reputation)SSC-Forever (47K reputation)SSC-Forever (47K reputation)SSC-Forever (47K reputation)SSC-Forever (47K reputation)SSC-Forever (47K reputation)SSC-Forever (47K reputation)

Group: General Forum Members
Points: 47185 Visits: 44356
It's quite common to drop or disable nonclustered indexes before a data load and rebuild or recreate them afterwards.


Gail Shaw
Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

We walk in the dark places no others will enter
We stand on the bridge and no one may pass


Jason-299789
Jason-299789
Ten Centuries
Ten Centuries (1.1K reputation)Ten Centuries (1.1K reputation)Ten Centuries (1.1K reputation)Ten Centuries (1.1K reputation)Ten Centuries (1.1K reputation)Ten Centuries (1.1K reputation)Ten Centuries (1.1K reputation)Ten Centuries (1.1K reputation)

Group: General Forum Members
Points: 1132 Visits: 3229
Are the ValidFrom and ValidTo Date or DateTime, as you couldput default constraints on them.

That way you dont have to run the update after the load and only have to rebuild the index after the load is complete.

_________________________________________________________________________
SSC Guide to Posting and Best Practices
aaron.reese
aaron.reese
Mr or Mrs. 500
Mr or Mrs. 500 (589 reputation)Mr or Mrs. 500 (589 reputation)Mr or Mrs. 500 (589 reputation)Mr or Mrs. 500 (589 reputation)Mr or Mrs. 500 (589 reputation)Mr or Mrs. 500 (589 reputation)Mr or Mrs. 500 (589 reputation)Mr or Mrs. 500 (589 reputation)

Group: General Forum Members
Points: 589 Visits: 898
Thanks Jason, I am going to investigate that option a bit further. It seems to make a lot of sense as it involved only hitting the record once (when it is created)

My concern is that all 20 files need to load with the same FROM date even if the load runs over midnight (I have no idea when it will be run or how long it will take on the production server) and although I can use standard CASE WHEN THEN END logic to roll the date back if we go over midnight, I really want to be able to set the default from a variable value (which you obviously can't do because then it wouldnt be a default :w00tSmile

The Bulk Insert script is called 20 times from an SSIS package and the From Date would be the package start date
Jason-299789
Jason-299789
Ten Centuries
Ten Centuries (1.1K reputation)Ten Centuries (1.1K reputation)Ten Centuries (1.1K reputation)Ten Centuries (1.1K reputation)Ten Centuries (1.1K reputation)Ten Centuries (1.1K reputation)Ten Centuries (1.1K reputation)Ten Centuries (1.1K reputation)

Group: General Forum Members
Points: 1132 Visits: 3229
If it can load over midnight then you might have a problem.

Just a thought but cant you alter the default constraint at the start of each load process so that rather than doing a GetDate() it uses a fixed date for that batch.

_________________________________________________________________________
SSC Guide to Posting and Best Practices
aaron.reese
aaron.reese
Mr or Mrs. 500
Mr or Mrs. 500 (589 reputation)Mr or Mrs. 500 (589 reputation)Mr or Mrs. 500 (589 reputation)Mr or Mrs. 500 (589 reputation)Mr or Mrs. 500 (589 reputation)Mr or Mrs. 500 (589 reputation)Mr or Mrs. 500 (589 reputation)Mr or Mrs. 500 (589 reputation)

Group: General Forum Members
Points: 589 Visits: 898
@Jason

I don't think so, the default has to be set as part of the DDL for the table and you can't use variables in that situation.

The SSIS package calls a stored procedure which builds a dynamic Bulk Insert script which is then executed with exec sp_executesql() I do this because the date formats in the source file are DMY which you can set in t-SQL but not with the Bulk Insert Task in SSIS. I opted not to go down the Data Transfomation task route as I have already built the format files and the only transform I need to do is on the date; it didn't seem worth it to confuse future maintainers (I am a contrator) with having to remap the SSIS tasks each time the data changes - much easier to edit the BCP format files.
Jason-299789
Jason-299789
Ten Centuries
Ten Centuries (1.1K reputation)Ten Centuries (1.1K reputation)Ten Centuries (1.1K reputation)Ten Centuries (1.1K reputation)Ten Centuries (1.1K reputation)Ten Centuries (1.1K reputation)Ten Centuries (1.1K reputation)Ten Centuries (1.1K reputation)

Group: General Forum Members
Points: 1132 Visits: 3229
Fair point Aaron, it would mean dipping into Dynamic SQL to inject the value into the Alter table script.

_________________________________________________________________________
SSC Guide to Posting and Best Practices
Go


Permissions

You can't post new topics.
You can't post topic replies.
You can't post new polls.
You can't post replies to polls.
You can't edit your own topics.
You can't delete your own topics.
You can't edit other topics.
You can't delete other topics.
You can't edit your own posts.
You can't edit other posts.
You can't delete your own posts.
You can't delete other posts.
You can't post events.
You can't edit your own events.
You can't edit other events.
You can't delete your own events.
You can't delete other events.
You can't send private messages.
You can't send emails.
You can read topics.
You can't vote in polls.
You can't upload attachments.
You can download attachments.
You can't post HTML code.
You can't edit HTML code.
You can't post IFCode.
You can't post JavaScript.
You can post emoticons.
You can't post or upload images.

Select a forum

































































































































































SQLServerCentral


Search