Click here to monitor SSC
SQLServerCentral is supported by Red Gate Software Ltd.
 
Log in  ::  Register  ::  Not logged in
 
 
 
        
Home       Members    Calendar    Who's On


Add to briefcase 12»»

Is derived column better for performance of a package ? Expand / Collapse
Author
Message
Posted Saturday, November 10, 2012 10:32 AM
SSC Journeyman

SSC JourneymanSSC JourneymanSSC JourneymanSSC JourneymanSSC JourneymanSSC JourneymanSSC JourneymanSSC Journeyman

Group: General Forum Members
Last Login: Thursday, September 19, 2013 4:30 AM
Points: 78, Visits: 195
I have a scenario where an excel have thousands of records with say 20 columns. In which their is a column which I need to split while inserting it into main table and also need to perform other operations as well.
I am using a staging table to import the data from excel and then performing rest of operations on the staging table before inserting the records to main table.
So what should be better approach from performance point of view - derived columns transformation or should I split it in a stored proc which is dealing with other operations as well.

Please advise.
Post #1383383
Posted Sunday, November 11, 2012 7:40 PM
SSC Journeyman

SSC JourneymanSSC JourneymanSSC JourneymanSSC JourneymanSSC JourneymanSSC JourneymanSSC JourneymanSSC Journeyman

Group: General Forum Members
Last Login: Thursday, September 19, 2013 4:30 AM
Points: 78, Visits: 195
rhythm.varshney (11/10/2012)
I have a scenario where an excel have thousands of records with say 20 columns. In which their is a column which I need to split while inserting it into main table and also need to perform other operations as well.
I am using a staging table to import the data from excel and then performing rest of operations on the staging table before inserting the records to main table.
So what should be better approach from performance point of view - derived columns transformation or should I split it in a stored proc which is dealing with other operations as well.

Please advise.


No advise
Need more information or ???
Post #1383490
Posted Sunday, November 11, 2012 9:25 PM
Grasshopper

GrasshopperGrasshopperGrasshopperGrasshopperGrasshopperGrasshopperGrasshopperGrasshopper

Group: General Forum Members
Last Login: Wednesday, September 3, 2014 11:45 PM
Points: 24, Visits: 152
So you are importing the column to your staging table in it's native format and then splitting it from there? On what basis are you splitting it (i.e. at a certain column #, cerain character, etc.)? We need some more info to answer but in general I would think SSIS will be better at this kind of operation than T-SQL.
Post #1383491
Posted Monday, November 12, 2012 1:00 AM


SSChampion

SSChampionSSChampionSSChampionSSChampionSSChampionSSChampionSSChampionSSChampionSSChampionSSChampion

Group: General Forum Members
Last Login: Yesterday @ 8:15 AM
Points: 13,017, Visits: 10,800
It depends on the operations as Randy mentioned - it needs to be possible in the SSIS expression language - but if it can, SSIS will be better, as you will do everything in one step. With the staging table, you do everything in two steps so that will probably be more time consuming.



How to post forum questions.
Need an answer? No, you need a question.
What’s the deal with Excel & SSIS?

Member of LinkedIn. My blog at LessThanDot.

MCSA SQL Server 2012 - MCSE Business Intelligence
Post #1383535
Posted Monday, November 12, 2012 10:45 AM
SSC Journeyman

SSC JourneymanSSC JourneymanSSC JourneymanSSC JourneymanSSC JourneymanSSC JourneymanSSC JourneymanSSC Journeyman

Group: General Forum Members
Last Login: Thursday, September 19, 2013 4:30 AM
Points: 78, Visits: 195
Randy Knight (11/11/2012)
So you are importing the column to your staging table in it's native format and then splitting it from there? On what basis are you splitting it (i.e. at a certain column #, cerain character, etc.)? We need some more info to answer but in general I would think SSIS will be better at this kind of operation than T-SQL.


yes.I am importing the data to staging table as it is and there I am applying split on a column say name into first name and last name on the basis of space between two.
so you meant too say derived column would be better approach ??
Post #1383778
Posted Monday, November 12, 2012 1:14 PM


SSChampion

SSChampionSSChampionSSChampionSSChampionSSChampionSSChampionSSChampionSSChampionSSChampionSSChampion

Group: General Forum Members
Last Login: Yesterday @ 8:15 AM
Points: 13,017, Visits: 10,800
Yes. Such a split is easy to do in the derived column and it saves you the staging step.



How to post forum questions.
Need an answer? No, you need a question.
What’s the deal with Excel & SSIS?

Member of LinkedIn. My blog at LessThanDot.

MCSA SQL Server 2012 - MCSE Business Intelligence
Post #1383845
Posted Monday, November 12, 2012 2:27 PM
Grasshopper

GrasshopperGrasshopperGrasshopperGrasshopperGrasshopperGrasshopperGrasshopperGrasshopper

Group: General Forum Members
Last Login: Wednesday, September 3, 2014 11:45 PM
Points: 24, Visits: 152
rhythm.varshney (11/12/2012)

yes.I am importing the data to staging table as it is and there I am applying split on a column say name into first name and last name on the basis of space between two.
so you meant too say derived column would be better approach ??


Yes. The SSIS expression language is going to be much easier to do this in. And while it is a bit of a paradigm shift, get away from the staging table idea. We want to be doing ETL (Extract, Tranform, Load) rather than ELT (Extract, Load, Transform). SSIS excels at doing these things in memory, in the pipeline. Writing to staging table(s) in the interim slows things down considerably. When things get overly complex for a single dataflow, I will often write to raw files for the interim step rather than a db table ... still much faster.

Post #1383878
Posted Monday, November 12, 2012 2:34 PM


SSCertifiable

SSCertifiableSSCertifiableSSCertifiableSSCertifiableSSCertifiableSSCertifiableSSCertifiableSSCertifiableSSCertifiable

Group: General Forum Members
Last Login: Today @ 1:46 PM
Points: 5,384, Visits: 7,458
Randy Knight (11/12/2012)
rhythm.varshney (11/12/2012)

yes.I am importing the data to staging table as it is and there I am applying split on a column say name into first name and last name on the basis of space between two.
so you meant too say derived column would be better approach ??


Yes. The SSIS expression language is going to be much easier to do this in. And while it is a bit of a paradigm shift, get away from the staging table idea. We want to be doing ETL (Extract, Tranform, Load) rather than ELT (Extract, Load, Transform). SSIS excels at doing these things in memory, in the pipeline. Writing to staging table(s) in the interim slows things down considerably. When things get overly complex for a single dataflow, I will often write to raw files for the interim step rather than a db table ... still much faster.



Right up until you need to sort data for aggregations or the like, I agree with you. When you need to do that, T-SQL tends to blow the dataflow stream away. I agree with the general principal though. In a staging table you need to write at least twice (once in, once transformed), where if you transform before you write, it's a single dropoff.

However, don't completely throw away the idea of a staging table. You cannot UPDATE or DELETE from an existing table from SSIS in bulk. If you're doing a MERGE equivalent and not merely delivering new rows to the existing table, deliver to staging and then MERGE it in. That's still part of the LOAD components of ETL though, so I may just be splitting hairs with what Randy was recommending.



- Craig Farrell

Never stop learning, even if it hurts. Ego bruises are practically mandatory as you learn unless you've never risked enough to make a mistake.

For better assistance in answering your questions | Forum Netiquette
For index/tuning help, follow these directions. |Tally Tables

Twitter: @AnyWayDBA
Post #1383882
Posted Monday, November 12, 2012 2:37 PM
Grasshopper

GrasshopperGrasshopperGrasshopperGrasshopperGrasshopperGrasshopperGrasshopperGrasshopper

Group: General Forum Members
Last Login: Wednesday, September 3, 2014 11:45 PM
Points: 24, Visits: 152
Evil Kraig F (11/12/2012)

Right up until you need to sort data for aggregations or the like, I agree with you. When you need to do that, T-SQL tends to blow the dataflow stream away. I agree with the general principal though. In a staging table you need to write at least twice (once in, once transformed), where if you transform before you write, it's a single dropoff.

However, don't completely throw away the idea of a staging table. You cannot UPDATE or DELETE from an existing table from SSIS in bulk. If you're doing a MERGE equivalent and not merely delivering new rows to the existing table, deliver to staging and then MERGE it in. That's still part of the LOAD components of ETL though, so I may just be splitting hairs with what Randy was recommending.


No disagreement here. In fact, one of the things I run into a lot is BI developers who forget about using the strengths of the relational engine (or more likely don't know them). You know what they say about generalizations ...
Post #1383885
Posted Tuesday, November 13, 2012 6:46 PM
SSC Journeyman

SSC JourneymanSSC JourneymanSSC JourneymanSSC JourneymanSSC JourneymanSSC JourneymanSSC JourneymanSSC Journeyman

Group: General Forum Members
Last Login: Thursday, September 19, 2013 4:30 AM
Points: 78, Visits: 195
Evil Kraig F (11/12/2012)
Randy Knight (11/12/2012)
rhythm.varshney (11/12/2012)

yes.I am importing the data to staging table as it is and there I am applying split on a column say name into first name and last name on the basis of space between two.
so you meant too say derived column would be better approach ??


Yes. The SSIS expression language is going to be much easier to do this in. And while it is a bit of a paradigm shift, get away from the staging table idea. We want to be doing ETL (Extract, Tranform, Load) rather than ELT (Extract, Load, Transform). SSIS excels at doing these things in memory, in the pipeline. Writing to staging table(s) in the interim slows things down considerably. When things get overly complex for a single dataflow, I will often write to raw files for the interim step rather than a db table ... still much faster.



Right up until you need to sort data for aggregations or the like, I agree with you. When you need to do that, T-SQL tends to blow the dataflow stream away. I agree with the general principal though. In a staging table you need to write at least twice (once in, once transformed), where if you transform before you write, it's a single dropoff.

However, don't completely throw away the idea of a staging table. You cannot UPDATE or DELETE from an existing table from SSIS in bulk. If you're doing a MERGE equivalent and not merely delivering new rows to the existing table, deliver to staging and then MERGE it in. That's still part of the LOAD components of ETL though, so I may just be splitting hairs with what Randy was recommending.


Yes, actually if you see my comments I mentioned that I am performing UPINSERT logic on the basis of staging table which hold around 50 K records daily basis.So was thinking to handle the split on the same.As was having the staging table and did not want to insert derived columns additionally.But not sure about the performance.
Post #1384364
« Prev Topic | Next Topic »

Add to briefcase 12»»

Permissions Expand / Collapse