Click here to monitor SSC
SQLServerCentral is supported by Redgate
 
Log in  ::  Register  ::  Not logged in
 
 
 


Is derived column better for performance of a package ?


Is derived column better for performance of a package ?

Author
Message
rhythm.varshney
rhythm.varshney
SSC Journeyman
SSC Journeyman (88 reputation)SSC Journeyman (88 reputation)SSC Journeyman (88 reputation)SSC Journeyman (88 reputation)SSC Journeyman (88 reputation)SSC Journeyman (88 reputation)SSC Journeyman (88 reputation)SSC Journeyman (88 reputation)

Group: General Forum Members
Points: 88 Visits: 195
I have a scenario where an excel have thousands of records with say 20 columns. In which their is a column which I need to split while inserting it into main table and also need to perform other operations as well.
I am using a staging table to import the data from excel and then performing rest of operations on the staging table before inserting the records to main table.
So what should be better approach from performance point of view - derived columns transformation or should I split it in a stored proc which is dealing with other operations as well.

Please advise.
rhythm.varshney
rhythm.varshney
SSC Journeyman
SSC Journeyman (88 reputation)SSC Journeyman (88 reputation)SSC Journeyman (88 reputation)SSC Journeyman (88 reputation)SSC Journeyman (88 reputation)SSC Journeyman (88 reputation)SSC Journeyman (88 reputation)SSC Journeyman (88 reputation)

Group: General Forum Members
Points: 88 Visits: 195
rhythm.varshney (11/10/2012)
I have a scenario where an excel have thousands of records with say 20 columns. In which their is a column which I need to split while inserting it into main table and also need to perform other operations as well.
I am using a staging table to import the data from excel and then performing rest of operations on the staging table before inserting the records to main table.
So what should be better approach from performance point of view - derived columns transformation or should I split it in a stored proc which is dealing with other operations as well.

Please advise.


No advise Sad
Need more information or ???
Randy Knight
Randy Knight
Grasshopper
Grasshopper (24 reputation)Grasshopper (24 reputation)Grasshopper (24 reputation)Grasshopper (24 reputation)Grasshopper (24 reputation)Grasshopper (24 reputation)Grasshopper (24 reputation)Grasshopper (24 reputation)

Group: General Forum Members
Points: 24 Visits: 155
So you are importing the column to your staging table in it's native format and then splitting it from there? On what basis are you splitting it (i.e. at a certain column #, cerain character, etc.)? We need some more info to answer but in general I would think SSIS will be better at this kind of operation than T-SQL.
Koen Verbeeck
Koen Verbeeck
SSCoach
SSCoach (16K reputation)SSCoach (16K reputation)SSCoach (16K reputation)SSCoach (16K reputation)SSCoach (16K reputation)SSCoach (16K reputation)SSCoach (16K reputation)SSCoach (16K reputation)

Group: General Forum Members
Points: 16447 Visits: 13202
It depends on the operations as Randy mentioned - it needs to be possible in the SSIS expression language - but if it can, SSIS will be better, as you will do everything in one step. With the staging table, you do everything in two steps so that will probably be more time consuming.



How to post forum questions.
Need an answer? No, you need a question.
What’s the deal with Excel & SSIS?

Member of LinkedIn. My blog at SQLKover.

MCSA SQL Server 2012 - MCSE Business Intelligence
rhythm.varshney
rhythm.varshney
SSC Journeyman
SSC Journeyman (88 reputation)SSC Journeyman (88 reputation)SSC Journeyman (88 reputation)SSC Journeyman (88 reputation)SSC Journeyman (88 reputation)SSC Journeyman (88 reputation)SSC Journeyman (88 reputation)SSC Journeyman (88 reputation)

Group: General Forum Members
Points: 88 Visits: 195
Randy Knight (11/11/2012)
So you are importing the column to your staging table in it's native format and then splitting it from there? On what basis are you splitting it (i.e. at a certain column #, cerain character, etc.)? We need some more info to answer but in general I would think SSIS will be better at this kind of operation than T-SQL.


yes.I am importing the data to staging table as it is and there I am applying split on a column say name into first name and last name on the basis of space between two.
so you meant too say derived column would be better approach ??
Koen Verbeeck
Koen Verbeeck
SSCoach
SSCoach (16K reputation)SSCoach (16K reputation)SSCoach (16K reputation)SSCoach (16K reputation)SSCoach (16K reputation)SSCoach (16K reputation)SSCoach (16K reputation)SSCoach (16K reputation)

Group: General Forum Members
Points: 16447 Visits: 13202
Yes. Such a split is easy to do in the derived column and it saves you the staging step.



How to post forum questions.
Need an answer? No, you need a question.
What’s the deal with Excel & SSIS?

Member of LinkedIn. My blog at SQLKover.

MCSA SQL Server 2012 - MCSE Business Intelligence
Randy Knight
Randy Knight
Grasshopper
Grasshopper (24 reputation)Grasshopper (24 reputation)Grasshopper (24 reputation)Grasshopper (24 reputation)Grasshopper (24 reputation)Grasshopper (24 reputation)Grasshopper (24 reputation)Grasshopper (24 reputation)

Group: General Forum Members
Points: 24 Visits: 155
rhythm.varshney (11/12/2012)

yes.I am importing the data to staging table as it is and there I am applying split on a column say name into first name and last name on the basis of space between two.
so you meant too say derived column would be better approach ??


Yes. The SSIS expression language is going to be much easier to do this in. And while it is a bit of a paradigm shift, get away from the staging table idea. We want to be doing ETL (Extract, Tranform, Load) rather than ELT (Extract, Load, Transform). SSIS excels at doing these things in memory, in the pipeline. Writing to staging table(s) in the interim slows things down considerably. When things get overly complex for a single dataflow, I will often write to raw files for the interim step rather than a db table ... still much faster.
Evil Kraig F
Evil Kraig F
SSCertifiable
SSCertifiable (5.7K reputation)SSCertifiable (5.7K reputation)SSCertifiable (5.7K reputation)SSCertifiable (5.7K reputation)SSCertifiable (5.7K reputation)SSCertifiable (5.7K reputation)SSCertifiable (5.7K reputation)SSCertifiable (5.7K reputation)

Group: General Forum Members
Points: 5697 Visits: 7660
Randy Knight (11/12/2012)
rhythm.varshney (11/12/2012)

yes.I am importing the data to staging table as it is and there I am applying split on a column say name into first name and last name on the basis of space between two.
so you meant too say derived column would be better approach ??


Yes. The SSIS expression language is going to be much easier to do this in. And while it is a bit of a paradigm shift, get away from the staging table idea. We want to be doing ETL (Extract, Tranform, Load) rather than ELT (Extract, Load, Transform). SSIS excels at doing these things in memory, in the pipeline. Writing to staging table(s) in the interim slows things down considerably. When things get overly complex for a single dataflow, I will often write to raw files for the interim step rather than a db table ... still much faster.



Right up until you need to sort data for aggregations or the like, I agree with you. When you need to do that, T-SQL tends to blow the dataflow stream away. I agree with the general principal though. In a staging table you need to write at least twice (once in, once transformed), where if you transform before you write, it's a single dropoff.

However, don't completely throw away the idea of a staging table. You cannot UPDATE or DELETE from an existing table from SSIS in bulk. If you're doing a MERGE equivalent and not merely delivering new rows to the existing table, deliver to staging and then MERGE it in. That's still part of the LOAD components of ETL though, so I may just be splitting hairs with what Randy was recommending.


- Craig Farrell

Never stop learning, even if it hurts. Ego bruises are practically mandatory as you learn unless you've never risked enough to make a mistake.

For better assistance in answering your questions | Forum Netiquette
For index/tuning help, follow these directions. |Tally Tables

Twitter: @AnyWayDBA
Randy Knight
Randy Knight
Grasshopper
Grasshopper (24 reputation)Grasshopper (24 reputation)Grasshopper (24 reputation)Grasshopper (24 reputation)Grasshopper (24 reputation)Grasshopper (24 reputation)Grasshopper (24 reputation)Grasshopper (24 reputation)

Group: General Forum Members
Points: 24 Visits: 155
Evil Kraig F (11/12/2012)

Right up until you need to sort data for aggregations or the like, I agree with you. When you need to do that, T-SQL tends to blow the dataflow stream away. I agree with the general principal though. In a staging table you need to write at least twice (once in, once transformed), where if you transform before you write, it's a single dropoff.

However, don't completely throw away the idea of a staging table. You cannot UPDATE or DELETE from an existing table from SSIS in bulk. If you're doing a MERGE equivalent and not merely delivering new rows to the existing table, deliver to staging and then MERGE it in. That's still part of the LOAD components of ETL though, so I may just be splitting hairs with what Randy was recommending.


No disagreement here. In fact, one of the things I run into a lot is BI developers who forget about using the strengths of the relational engine (or more likely don't know them). You know what they say about generalizations ...
rhythm.varshney
rhythm.varshney
SSC Journeyman
SSC Journeyman (88 reputation)SSC Journeyman (88 reputation)SSC Journeyman (88 reputation)SSC Journeyman (88 reputation)SSC Journeyman (88 reputation)SSC Journeyman (88 reputation)SSC Journeyman (88 reputation)SSC Journeyman (88 reputation)

Group: General Forum Members
Points: 88 Visits: 195
Evil Kraig F (11/12/2012)
Randy Knight (11/12/2012)
rhythm.varshney (11/12/2012)

yes.I am importing the data to staging table as it is and there I am applying split on a column say name into first name and last name on the basis of space between two.
so you meant too say derived column would be better approach ??


Yes. The SSIS expression language is going to be much easier to do this in. And while it is a bit of a paradigm shift, get away from the staging table idea. We want to be doing ETL (Extract, Tranform, Load) rather than ELT (Extract, Load, Transform). SSIS excels at doing these things in memory, in the pipeline. Writing to staging table(s) in the interim slows things down considerably. When things get overly complex for a single dataflow, I will often write to raw files for the interim step rather than a db table ... still much faster.



Right up until you need to sort data for aggregations or the like, I agree with you. When you need to do that, T-SQL tends to blow the dataflow stream away. I agree with the general principal though. In a staging table you need to write at least twice (once in, once transformed), where if you transform before you write, it's a single dropoff.

However, don't completely throw away the idea of a staging table. You cannot UPDATE or DELETE from an existing table from SSIS in bulk. If you're doing a MERGE equivalent and not merely delivering new rows to the existing table, deliver to staging and then MERGE it in. That's still part of the LOAD components of ETL though, so I may just be splitting hairs with what Randy was recommending.


Yes, actually if you see my comments I mentioned that I am performing UPINSERT logic on the basis of staging table which hold around 50 K records daily basis.So was thinking to handle the split on the same.As was having the staging table and did not want to insert derived columns additionally.But not sure about the performance.
Go


Permissions

You can't post new topics.
You can't post topic replies.
You can't post new polls.
You can't post replies to polls.
You can't edit your own topics.
You can't delete your own topics.
You can't edit other topics.
You can't delete other topics.
You can't edit your own posts.
You can't edit other posts.
You can't delete your own posts.
You can't delete other posts.
You can't post events.
You can't edit your own events.
You can't edit other events.
You can't delete your own events.
You can't delete other events.
You can't send private messages.
You can't send emails.
You can read topics.
You can't vote in polls.
You can't upload attachments.
You can download attachments.
You can't post HTML code.
You can't edit HTML code.
You can't post IFCode.
You can't post JavaScript.
You can post emoticons.
You can't post or upload images.

Select a forum

































































































































































SQLServerCentral


Search