|
|
|
SSC Journeyman
      
Group: General Forum Members
Last Login: Thursday, March 21, 2013 2:14 AM
Points: 77,
Visits: 184
|
|
I have a scenario where an excel have thousands of records with say 20 columns. In which their is a column which I need to split while inserting it into main table and also need to perform other operations as well. I am using a staging table to import the data from excel and then performing rest of operations on the staging table before inserting the records to main table. So what should be better approach from performance point of view - derived columns transformation or should I split it in a stored proc which is dealing with other operations as well.
Please advise.
|
|
|
|
|
SSC Journeyman
      
Group: General Forum Members
Last Login: Thursday, March 21, 2013 2:14 AM
Points: 77,
Visits: 184
|
|
rhythm.varshney (11/10/2012) I have a scenario where an excel have thousands of records with say 20 columns. In which their is a column which I need to split while inserting it into main table and also need to perform other operations as well. I am using a staging table to import the data from excel and then performing rest of operations on the staging table before inserting the records to main table. So what should be better approach from performance point of view - derived columns transformation or should I split it in a stored proc which is dealing with other operations as well.
Please advise.
No advise  Need more information or ???
|
|
|
|
|
Grasshopper
      
Group: General Forum Members
Last Login: Wednesday, June 12, 2013 10:53 AM
Points: 24,
Visits: 139
|
|
| So you are importing the column to your staging table in it's native format and then splitting it from there? On what basis are you splitting it (i.e. at a certain column #, cerain character, etc.)? We need some more info to answer but in general I would think SSIS will be better at this kind of operation than T-SQL.
|
|
|
|
|
SSCrazy Eights
        
Group: General Forum Members
Last Login: Yesterday @ 6:30 AM
Points: 9,410,
Visits: 6,495
|
|
|
|
|
|
SSC Journeyman
      
Group: General Forum Members
Last Login: Thursday, March 21, 2013 2:14 AM
Points: 77,
Visits: 184
|
|
Randy Knight (11/11/2012) So you are importing the column to your staging table in it's native format and then splitting it from there? On what basis are you splitting it (i.e. at a certain column #, cerain character, etc.)? We need some more info to answer but in general I would think SSIS will be better at this kind of operation than T-SQL.
yes.I am importing the data to staging table as it is and there I am applying split on a column say name into first name and last name on the basis of space between two. so you meant too say derived column would be better approach ??
|
|
|
|
|
SSCrazy Eights
        
Group: General Forum Members
Last Login: Yesterday @ 6:30 AM
Points: 9,410,
Visits: 6,495
|
|
|
|
|
|
Grasshopper
      
Group: General Forum Members
Last Login: Wednesday, June 12, 2013 10:53 AM
Points: 24,
Visits: 139
|
|
rhythm.varshney (11/12/2012)
yes.I am importing the data to staging table as it is and there I am applying split on a column say name into first name and last name on the basis of space between two. so you meant too say derived column would be better approach ??
Yes. The SSIS expression language is going to be much easier to do this in. And while it is a bit of a paradigm shift, get away from the staging table idea. We want to be doing ETL (Extract, Tranform, Load) rather than ELT (Extract, Load, Transform). SSIS excels at doing these things in memory, in the pipeline. Writing to staging table(s) in the interim slows things down considerably. When things get overly complex for a single dataflow, I will often write to raw files for the interim step rather than a db table ... still much faster.
|
|
|
|
|
SSCertifiable
       
Group: General Forum Members
Last Login: Yesterday @ 5:35 PM
Points: 5,722,
Visits: 6,194
|
|
Randy Knight (11/12/2012)
rhythm.varshney (11/12/2012)
yes.I am importing the data to staging table as it is and there I am applying split on a column say name into first name and last name on the basis of space between two. so you meant too say derived column would be better approach ??Yes. The SSIS expression language is going to be much easier to do this in. And while it is a bit of a paradigm shift, get away from the staging table idea. We want to be doing ETL (Extract, Tranform, Load) rather than ELT (Extract, Load, Transform). SSIS excels at doing these things in memory, in the pipeline. Writing to staging table(s) in the interim slows things down considerably. When things get overly complex for a single dataflow, I will often write to raw files for the interim step rather than a db table ... still much faster.
Right up until you need to sort data for aggregations or the like, I agree with you. When you need to do that, T-SQL tends to blow the dataflow stream away. I agree with the general principal though. In a staging table you need to write at least twice (once in, once transformed), where if you transform before you write, it's a single dropoff.
However, don't completely throw away the idea of a staging table. You cannot UPDATE or DELETE from an existing table from SSIS in bulk. If you're doing a MERGE equivalent and not merely delivering new rows to the existing table, deliver to staging and then MERGE it in. That's still part of the LOAD components of ETL though, so I may just be splitting hairs with what Randy was recommending.
- Craig Farrell
Never stop learning, even if it hurts. Ego bruises are practically mandatory as you learn unless you've never risked enough to make a mistake.
For better assistance in answering your questions | Forum Netiquette For index/tuning help, follow these directions. |Tally Tables Twitter: @AnyWayDBA
|
|
|
|
|
Grasshopper
      
Group: General Forum Members
Last Login: Wednesday, June 12, 2013 10:53 AM
Points: 24,
Visits: 139
|
|
Evil Kraig F (11/12/2012)
Right up until you need to sort data for aggregations or the like, I agree with you. When you need to do that, T-SQL tends to blow the dataflow stream away. I agree with the general principal though. In a staging table you need to write at least twice (once in, once transformed), where if you transform before you write, it's a single dropoff.
However, don't completely throw away the idea of a staging table. You cannot UPDATE or DELETE from an existing table from SSIS in bulk. If you're doing a MERGE equivalent and not merely delivering new rows to the existing table, deliver to staging and then MERGE it in. That's still part of the LOAD components of ETL though, so I may just be splitting hairs with what Randy was recommending.
No disagreement here. In fact, one of the things I run into a lot is BI developers who forget about using the strengths of the relational engine (or more likely don't know them). You know what they say about generalizations ...
|
|
|
|
|
SSC Journeyman
      
Group: General Forum Members
Last Login: Thursday, March 21, 2013 2:14 AM
Points: 77,
Visits: 184
|
|
Evil Kraig F (11/12/2012)
Randy Knight (11/12/2012)
rhythm.varshney (11/12/2012)
yes.I am importing the data to staging table as it is and there I am applying split on a column say name into first name and last name on the basis of space between two. so you meant too say derived column would be better approach ??Yes. The SSIS expression language is going to be much easier to do this in. And while it is a bit of a paradigm shift, get away from the staging table idea. We want to be doing ETL (Extract, Tranform, Load) rather than ELT (Extract, Load, Transform). SSIS excels at doing these things in memory, in the pipeline. Writing to staging table(s) in the interim slows things down considerably. When things get overly complex for a single dataflow, I will often write to raw files for the interim step rather than a db table ... still much faster. Right up until you need to sort data for aggregations or the like, I agree with you. When you need to do that, T-SQL tends to blow the dataflow stream away. I agree with the general principal though. In a staging table you need to write at least twice (once in, once transformed), where if you transform before you write, it's a single dropoff. However, don't completely throw away the idea of a staging table. You cannot UPDATE or DELETE from an existing table from SSIS in bulk. If you're doing a MERGE equivalent and not merely delivering new rows to the existing table, deliver to staging and then MERGE it in. That's still part of the LOAD components of ETL though, so I may just be splitting hairs with what Randy was recommending.
Yes, actually if you see my comments I mentioned that I am performing UPINSERT logic on the basis of staging table which hold around 50 K records daily basis.So was thinking to handle the split on the same.As was having the staging table and did not want to insert derived columns additionally.But not sure about the performance.
|
|
|
|