SQL Clone
SQLServerCentral is supported by Redgate
 
Log in  ::  Register  ::  Not logged in
 
 
 


SSIS Lookup using a Script Component


SSIS Lookup using a Script Component

Author
Message
Sachin Dedhia
Sachin Dedhia
Right there with Babe
Right there with Babe (734 reputation)Right there with Babe (734 reputation)Right there with Babe (734 reputation)Right there with Babe (734 reputation)Right there with Babe (734 reputation)Right there with Babe (734 reputation)Right there with Babe (734 reputation)Right there with Babe (734 reputation)

Group: General Forum Members
Points: 734 Visits: 218
Comments posted to this topic are about the item SSIS Lookup using a Script Component




Regards,
Sachin Dedhia
Robert C. Wafle
Robert C. Wafle
SSC Journeyman
SSC Journeyman (98 reputation)SSC Journeyman (98 reputation)SSC Journeyman (98 reputation)SSC Journeyman (98 reputation)SSC Journeyman (98 reputation)SSC Journeyman (98 reputation)SSC Journeyman (98 reputation)SSC Journeyman (98 reputation)

Group: General Forum Members
Points: 98 Visits: 35
Hello Sachin.

Sounds like your solution works very well.

Did you consider creating a transformation where you transform the multiple columns into say two or three columns, so you would have just one lookup component? I believe the advantage of this approach would be that you would think you would be able to process lookups against a larger set of values by letting SSIS take care of the caching, or by turning off the caching option in SSIS.

Your original table structure would be something like:

PK course_1, course_2, course_3, ...
1, a, b, c

Your new table structure would end up being something like:

PK, column_name, column_value
1, course_1, a
1, course_2, b
1, course_3, c

You could easily process the data in this format and provide the desired output into your two tables. I would be curious to know if this approach ends up being any faster or slower in your case.

If you don't know how to perform that kind of a transformation, I will be happy to point you in the correct direction.

Thanks

Rob
Anipaul
Anipaul
SSChampion
SSChampion (11K reputation)SSChampion (11K reputation)SSChampion (11K reputation)SSChampion (11K reputation)SSChampion (11K reputation)SSChampion (11K reputation)SSChampion (11K reputation)SSChampion (11K reputation)

Group: General Forum Members
Points: 11883 Visits: 1407
Nice article with good explanation...



Jeff Moden
Jeff Moden
SSC Guru
SSC Guru (340K reputation)SSC Guru (340K reputation)SSC Guru (340K reputation)SSC Guru (340K reputation)SSC Guru (340K reputation)SSC Guru (340K reputation)SSC Guru (340K reputation)SSC Guru (340K reputation)

Group: General Forum Members
Points: 340213 Visits: 42634
I'm curious... how many rows in the Course Reference table and any chance of you attaching the first 10 rows of the 10 million row file you imported? I'd like to play... Smile

--Jeff Moden

RBAR is pronounced ree-bar and is a Modenism for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
Stop thinking about what you want to do to a row... think, instead, of what you want to do to a column.
If you think its expensive to hire a professional to do the job, wait until you hire an amateur. -- Red Adair

Helpful Links:
How to post code problems
How to post performance problems
Forum FAQs
Sachin Dedhia
Sachin Dedhia
Right there with Babe
Right there with Babe (734 reputation)Right there with Babe (734 reputation)Right there with Babe (734 reputation)Right there with Babe (734 reputation)Right there with Babe (734 reputation)Right there with Babe (734 reputation)Right there with Babe (734 reputation)Right there with Babe (734 reputation)

Group: General Forum Members
Points: 734 Visits: 218
Hello Rob,

If I understand it right, you mean Pivot the CourseID columns and perform the lookup operation. If so, I would have to Unpivot it again as my destination table is not normalized. Also, the input would have many more fields like Name, Age, etc...., the CourseID columns may or may not have a value, may be in any upper case or lower case. Moreover, I needed to build a unique list of the values missing in the reference table which I could do in the same script component as well. And, all this coupled with the huge volume of 10+ million rows.

That said, the approach you mentioned is definitely a good try.




Regards,
Sachin Dedhia
kh-619447
kh-619447
Valued Member
Valued Member (54 reputation)Valued Member (54 reputation)Valued Member (54 reputation)Valued Member (54 reputation)Valued Member (54 reputation)Valued Member (54 reputation)Valued Member (54 reputation)Valued Member (54 reputation)

Group: General Forum Members
Points: 54 Visits: 119
I find this related article very interesting:
http://blogs.msdn.com/mattm/archive/2008/11/25/lookup-pattern-range-lookups.aspx


It describes the performace differences from Lookup, Merge, and Script Component used to solve the same logic. Very cool reference.
Robert C. Wafle
Robert C. Wafle
SSC Journeyman
SSC Journeyman (98 reputation)SSC Journeyman (98 reputation)SSC Journeyman (98 reputation)SSC Journeyman (98 reputation)SSC Journeyman (98 reputation)SSC Journeyman (98 reputation)SSC Journeyman (98 reputation)SSC Journeyman (98 reputation)

Group: General Forum Members
Points: 98 Visits: 35
Hi Sachin.

Yes, pivot and unpivot.. This would might make your script component contain less code in the case where there are 20 columns. The, you could lookup or merge and split, like in the article KH suggested. Perhaps a novice SSIS user with no coding experience could modify the pivot to pivot a few more columns easier than editing the script component... Perhpaps not.. But, since you have 10-15m rows.. the performance gain given by the script task will likely outweigh any need for simplicity in maintaining the SSIS package. Alternatively, if you had many different file formats to work with, one could adapt just the first component to the differing schemas instead of adapting the code which is three transformations into the package..

The unique list of values that didn't match could be created with a distinct sort against the rows where the lookup failed... as long as too many rows didn't actually fail. I've experienced performance issues with the distinct sort along with anyone else who has tried to use it.

Please note you can greatly reduce the amount of memory used by the lookup in many cases simply by specifying exactly what columns you'll actually be using in the lookup using a select statement instead of specifying the whole table with the drop down. I doubt your production server will have nearly as much free memory as your workstation.

In the end, the number of rows in the course reference table will dictate if you can continue to load all the rows into your datatable or not. It is likely this table is very small. How many rows do you have there?

Thanks

Rob


Rob
Jeff Moden
Jeff Moden
SSC Guru
SSC Guru (340K reputation)SSC Guru (340K reputation)SSC Guru (340K reputation)SSC Guru (340K reputation)SSC Guru (340K reputation)SSC Guru (340K reputation)SSC Guru (340K reputation)SSC Guru (340K reputation)

Group: General Forum Members
Points: 340213 Visits: 42634
Like I said, I'm curious... how many rows in the Course Reference table and any chance of you attaching the first 10 rows of the 10 million row file you imported? I'd like to play...

--Jeff Moden

RBAR is pronounced ree-bar and is a Modenism for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
Stop thinking about what you want to do to a row... think, instead, of what you want to do to a column.
If you think its expensive to hire a professional to do the job, wait until you hire an amateur. -- Red Adair

Helpful Links:
How to post code problems
How to post performance problems
Forum FAQs
Sachin Dedhia
Sachin Dedhia
Right there with Babe
Right there with Babe (734 reputation)Right there with Babe (734 reputation)Right there with Babe (734 reputation)Right there with Babe (734 reputation)Right there with Babe (734 reputation)Right there with Babe (734 reputation)Right there with Babe (734 reputation)Right there with Babe (734 reputation)

Group: General Forum Members
Points: 734 Visits: 218
Jeff,
I can't get you the exact sample of what I have, but I'll try and get you something near to it soon.

BTW, the reference dataset used for lookup is filtered in my case and varies from ~15K rows in one scenario to ~50K rows in another. The table can have much more data.




Regards,
Sachin Dedhia
Jeff Moden
Jeff Moden
SSC Guru
SSC Guru (340K reputation)SSC Guru (340K reputation)SSC Guru (340K reputation)SSC Guru (340K reputation)SSC Guru (340K reputation)SSC Guru (340K reputation)SSC Guru (340K reputation)SSC Guru (340K reputation)

Group: General Forum Members
Points: 340213 Visits: 42634
Thanks... guess the rowcounts you gave will do just fine. Don't worry too much about the data... I guess I can gin something up.

--Jeff Moden

RBAR is pronounced ree-bar and is a Modenism for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
Stop thinking about what you want to do to a row... think, instead, of what you want to do to a column.
If you think its expensive to hire a professional to do the job, wait until you hire an amateur. -- Red Adair

Helpful Links:
How to post code problems
How to post performance problems
Forum FAQs
Go


Permissions

You can't post new topics.
You can't post topic replies.
You can't post new polls.
You can't post replies to polls.
You can't edit your own topics.
You can't delete your own topics.
You can't edit other topics.
You can't delete other topics.
You can't edit your own posts.
You can't edit other posts.
You can't delete your own posts.
You can't delete other posts.
You can't post events.
You can't edit your own events.
You can't edit other events.
You can't delete your own events.
You can't delete other events.
You can't send private messages.
You can't send emails.
You can read topics.
You can't vote in polls.
You can't upload attachments.
You can download attachments.
You can't post HTML code.
You can't edit HTML code.
You can't post IFCode.
You can't post JavaScript.
You can post emoticons.
You can't post or upload images.

Select a forum

































































































































































SQLServerCentral


Search