SQLServerCentral is supported by Red Gate Software Ltd.
 
SSIS Lookup using a Script Component
Posted Monday, December 29, 2008 10:31 PM
SSC-Enthusiastic

Comments posted to this topic are about the item SSIS Lookup using a Script Component




Regards,
Sachin Dedhia
Post #627016
Posted Monday, December 29, 2008 10:47 PM
Grasshopper

Hello Sachin.

Sounds like your solution works very well.

Did you consider creating a transformation where you transform the multiple columns into, say, two or three columns, so you would have just one lookup component? I believe the advantage of this approach would be that you could process lookups against a larger set of values by letting SSIS take care of the caching, or by turning off the caching option in SSIS.

Your original table structure would be something like:

PK, course_1, course_2, course_3, ...
1, a, b, c

Your new table structure would end up being something like:

PK, column_name, column_value
1, course_1, a
1, course_2, b
1, course_3, c

You could easily process the data in this format and provide the desired output into your two tables. I would be curious to know if this approach ends up being any faster or slower in your case.
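To illustrate the reshaping (a minimal sketch in Python rather than an actual SSIS Unpivot transformation; the row and column names here are made up):

```python
# Sketch: turn one wide row (PK, course_1, course_2, ...) into
# narrow (PK, column_name, column_value) rows.
def unpivot(row, key="PK"):
    pk = row[key]
    return [(pk, col, val) for col, val in row.items() if col != key]

wide = {"PK": 1, "course_1": "a", "course_2": "b", "course_3": "c"}
for narrow in unpivot(wide):
    print(narrow)
# (1, 'course_1', 'a')
# (1, 'course_2', 'b')
# (1, 'course_3', 'c')
```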

If you don't know how to perform that kind of transformation, I will be happy to point you in the right direction.

Thanks

Rob


Post #627019
Posted Monday, December 29, 2008 11:57 PM
SSCertifiable

Nice article with a good explanation...


Post #627045
Posted Monday, December 29, 2008 11:59 PM


SSC-Dedicated

I'm curious... how many rows in the Course Reference table and any chance of you attaching the first 10 rows of the 10 million row file you imported? I'd like to play... :)

--Jeff Moden
"RBAR" is pronounced "ree-bar" and is a "Modenism" for "Row-By-Agonizing-Row".

First step towards the paradigm shift of writing Set Based code:
Stop thinking about what you want to do to a row... think, instead, of what you want to do to a column."

(play on words) "Just because you CAN do something in T-SQL, doesn't mean you SHOULDN'T." --22 Aug 2013

Helpful Links:
How to post code problems
How to post performance problems
Post #627047
Posted Tuesday, December 30, 2008 12:40 AM
SSC-Enthusiastic

Hello Rob,

If I understand it right, you mean pivot the CourseID columns and perform the lookup operation. If so, I would have to unpivot it again, as my destination table is not normalized. Also, the input has many more fields like Name, Age, etc., and the CourseID columns may or may not have a value and may be in upper or lower case. Moreover, I needed to build a unique list of the values missing from the reference table, which I could do in the same script component as well. And all of this is coupled with the huge volume of 10+ million rows.

That said, the approach you mentioned is definitely worth a try.
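For what it's worth, the kind of cached, case-insensitive lookup the script component performs, including collecting the distinct missing values, can be sketched like this (in Python for brevity; the real component would be .NET code, and the reference values and names here are made up):

```python
# Sketch: cached, case-insensitive lookup that also collects the
# distinct values missing from the reference table.
reference = {"sql101", "ssis201"}          # stand-in for the reference rows
cache = {v.lower() for v in reference}     # normalize case once, up front

missing = set()                            # distinct values that failed lookup

def lookup(course_id):
    """Return True if course_id is in the reference set (ignoring case);
    otherwise record it as missing."""
    if not course_id:                      # course columns may be empty
        return False
    key = course_id.lower()
    if key in cache:
        return True
    missing.add(key)
    return False

for value in ["SQL101", "ssis201", "Math5", "math5", ""]:
    lookup(value)

print(sorted(missing))
# ['math5']
```

Because `missing` is a set, the list of failed values is deduplicated as rows stream through, instead of needing a separate distinct sort afterwards.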





Regards,
Sachin Dedhia
Post #627076
Posted Tuesday, December 30, 2008 1:20 AM
Forum Newbie

I find this related article very interesting:
http://blogs.msdn.com/mattm/archive/2008/11/25/lookup-pattern-range-lookups.aspx


It describes the performance differences between the Lookup, Merge, and Script Component approaches to solving the same logic. A very cool reference.
Post #627110
Posted Tuesday, December 30, 2008 2:19 AM
Grasshopper

Hi Sachin.

Yes, pivot and unpivot. This might make your script component contain less code in the case where there are 20 columns. Then, you could look up, or merge and split, as in the article KH suggested. Perhaps a novice SSIS user with no coding experience could modify the pivot to handle a few more columns more easily than editing the script component... perhaps not. But since you have 10-15 million rows, the performance gain from the script component will likely outweigh any need for simplicity in maintaining the SSIS package. Alternatively, if you had many different file formats to work with, you could adapt just the first component to the differing schemas instead of adapting code that sits three transformations into the package.

The unique list of values that didn't match could be created with a distinct sort against the rows where the lookup failed... as long as too many rows didn't actually fail. I've experienced performance issues with the distinct sort, as has anyone else who has tried to use it.

Please note that in many cases you can greatly reduce the amount of memory used by the lookup simply by using a SELECT statement that lists only the columns you'll actually use, instead of choosing the whole table from the drop-down. I doubt your production server will have nearly as much free memory as your workstation.

In the end, the number of rows in the course reference table will dictate whether you can continue to load all the rows into your DataTable or not. It is likely this table is very small. How many rows do you have there?

Thanks

Rob


Post #627142
Posted Tuesday, December 30, 2008 4:34 AM


SSC-Dedicated

Like I said, I'm curious... how many rows in the Course Reference table and any chance of you attaching the first 10 rows of the 10 million row file you imported? I'd like to play...


--Jeff Moden
Post #627212
Posted Tuesday, December 30, 2008 5:42 AM
SSC-Enthusiastic

Jeff,
I can't get you the exact sample of what I have, but I'll try and get you something near to it soon.

BTW, the reference dataset used for lookup is filtered in my case and varies from ~15K rows in one scenario to ~50K rows in another. The table can have much more data.








Regards,
Sachin Dedhia
Post #627264
Posted Tuesday, December 30, 2008 5:45 AM


SSC-Dedicated

Thanks... I guess the rowcounts you gave will do just fine. Don't worry too much about the data... I can gin something up.

--Jeff Moden
Post #627267