﻿<?xml version='1.0' encoding='UTF-8'?><rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/"><channel><title>SQLServerCentral / Discuss Content Posted by Sachin Dedhiya / Article Discussions / Article Discussions by Author  / SSIS Lookup using a Script Component / Latest Posts</title><generator>InstantForum.NET v2.9.0</generator><description>SQLServerCentral</description><link>http://www.sqlservercentral.com/Forums/</link><webMaster>notifications@sqlservercentral.com</webMaster><lastBuildDate>Sat, 25 May 2013 18:22:40 GMT</lastBuildDate><ttl>20</ttl><item><title>RE: SSIS Lookup using a Script Component</title><link>http://www.sqlservercentral.com/Forums/Topic627016-298-1.aspx</link><description>Great article. Very nicely explained and a great concept.Rob.</description><pubDate>Thu, 08 Jan 2009 10:34:35 GMT</pubDate><dc:creator>robertm</dc:creator></item><item><title>RE: SSIS Lookup using a Script Component</title><link>http://www.sqlservercentral.com/Forums/Topic627016-298-1.aspx</link><description>Hi SachinAs requested, here is an example, but first, just a bit of waffling...In the code, I create a dummy temp table, and put some random values in. My last column however is a calculated column, using checksum, which generates a key to do the joining on. It does not have to be a checksum - you can use whatever logic you like to build a key (Checksum has some flaws,in that there are cases where it may actually returns duplicates). Viz, in the example below, you could just as well have concatenated the strings, instead of using checksum. The code doesn't have to be this simple either - you can write update statements to rather generate a key if the  logic is quite complex. The main thing is, you must just be sure that the values you are using to create a key with, are unique. That way, you will always just have to join on one column, instead of many.Here's the code (looks a bit better if you copy it into SSMS):   /* Check if the temp table already exists*/if OBJECT_ID('tempdb..#temp') is not null	Drop table #Temp   /* Create a temp table */	Create table #Temp(	RowID		int identity (1,1)	,Comment	varchar(200) NULL	,Column1	varchar(200) NULL  	,Column2	varchar(200) NULL  	,Column3	varchar(200) NULL  	,Key_Column	as Checksum(Column1,  Column2 , Column3))go   /* Insert Random Values */Insert into #Temp(	Comment	,Column1) Select 	'Show example of value for column 1 only'	,'Some Value'	goInsert into #Temp(	Comment	,Column1) Select 	'Show example of another value for column 1 only'	,'Some other value'	goInsert into #Temp(	Comment	,Column3) Select 	'Show example of value for column 3 only'	,'Some value'	goInsert into #Temp(	Comment	,Column1	,Column2	,Column3 ) Select	'Show example of value for all columns only'	,'Some Value'	,'Another Value'	,'Yet Another Value'	go   /* Get the data */Select		*from		#Temp</description><pubDate>Mon, 05 Jan 2009 23:47:44 GMT</pubDate><dc:creator>Zanoni Labuschagne-766625</dc:creator></item><item><title>RE: SSIS Lookup using a Script Component</title><link>http://www.sqlservercentral.com/Forums/Topic627016-298-1.aspx</link><description>[quote][b]Sachin Dedhia (1/5/2009)[/b][hr]Not sure why the approach is RBAR approach? In almost all cases, ETL stuff would involve transformation that work on row level data.[/quote]Being an RBAR code process is strictly separate from the fact that this may or may not be your fastest execution path for your given needs.  I was not speaking to speed, but rather flexibility and re-useability.  Your code component is fast, but not dynamic.An ETL plan doesn't have to process via SSIS one row at a time, nor one column at a time, especially if you are dealing with something that can and may change (such as number of course columns).  It is usually more prudent to seek instead for an approach where one does not have to modify code and re-deploy to the package server to process the next file, or worse yet, the nasty scenario of having a different package for each source when all are doing the same type of task.If your incoming format will remain stable (a rare thing indeed), then it likely doesn't matter. However, if it can change, then ETL-ing into a database, and using the SQL engine to process the matches by making SSIS call the appropriate procedures, etc., is far more flexible, and still stays within the realm of SSIS ETL.I have also learned (the hard way) that the data is better kept for other purposes.  Rare indeed are the projects I've come accross that serve an "enrichment" or "counts" only purpose (one example of enrichment is with data cleansing projects against a third-party tool, during importing and normalizing incoming lead data).Your case may be just such, however, so your mileage may vary.  But in principle, the approach as it stands has custom code, making it flex only by brute code force, even for simple changes such as more (or less) columns of the same type, giving it a high probability for failure over time.Output (usually the fastest step in the process) doesn't preclude one appending the current file, writing a new one in-process, or even an SSIS task that stuffs data from the database to a file.I have to admit that dumping en-masse notes about all the courses in the row makes me shudder and question the process as a whole.I do think the article is useful however on several points, as your article was about the use of the component, and not necessarily the particular data process.</description><pubDate>Mon, 05 Jan 2009 09:24:35 GMT</pubDate><dc:creator>DPhillips-731960</dc:creator></item><item><title>RE: SSIS Lookup using a Script Component</title><link>http://www.sqlservercentral.com/Forums/Topic627016-298-1.aspx</link><description>Hi dphillips,I appreciate your observation on the "nomenclature"! And as far as the article is concerned, thanks to Steve for his valuable comments.Not sure why the approach is RBAR approach? In almost all cases, ETL stuff would involve transformation that work on row level data.While I agree with you on having a normalized database structure, my requirement does not require much to do with the normalized data. If normalized, I would have to again de-normalize the data for generating a file as a feed to another application.</description><pubDate>Mon, 05 Jan 2009 08:16:10 GMT</pubDate><dc:creator>Sachin Dedhia</dc:creator></item><item><title>RE: SSIS Lookup using a Script Component</title><link>http://www.sqlservercentral.com/Forums/Topic627016-298-1.aspx</link><description>Zanonil, Can you elaborate more on CHECKSUM? An example would help.</description><pubDate>Mon, 05 Jan 2009 07:57:55 GMT</pubDate><dc:creator>Sachin Dedhia</dc:creator></item><item><title>RE: SSIS Lookup using a Script Component</title><link>http://www.sqlservercentral.com/Forums/Topic627016-298-1.aspx</link><description>Hi SimonI am not a .NET guy, so there may be better approach for caching the reference data like Dictionary object as you said.You may want to refer to [url]http://www.eggheadcafe.com/articles/20030830.asp[/url] for some inputs on Find method.HTH!</description><pubDate>Mon, 05 Jan 2009 07:52:23 GMT</pubDate><dc:creator>Sachin Dedhia</dc:creator></item><item><title>RE: SSIS Lookup using a Script Component</title><link>http://www.sqlservercentral.com/Forums/Topic627016-298-1.aspx</link><description>Hi SachinBeing a DW developer, I am always on the lookout for the simplest way of doing things. Another suggestion/question would be (if I understand the scenario correctly), why not add an intelligent key column... something like checksum(col1 + col2 + col3... + col[i]n[/i]). Then, just add the same key logic to your source data query, and in the look-up to compare one set of keys? That way you are passing the logic out of SSIS, and to T-SQL, where the queries will be more malleable and agile.Great article though... Please post more of these "out-of-the-box" articles!!Thanks.</description><pubDate>Mon, 05 Jan 2009 07:26:16 GMT</pubDate><dc:creator>Zanoni Labuschagne-766625</dc:creator></item><item><title>RE: SSIS Lookup using a Script Component</title><link>http://www.sqlservercentral.com/Forums/Topic627016-298-1.aspx</link><description>Its a real shame the script component doesn't expose a columns collection for the Row object. That way you could iterate over the columns very easily and not have to repeat the same code.You could do that in a custom component but thats another level of work.This is a great solution, I would question the performance of the Find method for the Data Table.You might find it quicker to populate a Dictionary object and do the lookup that way.</description><pubDate>Sat, 03 Jan 2009 08:42:53 GMT</pubDate><dc:creator>Simon Sabin</dc:creator></item><item><title>RE: SSIS Lookup using a Script Component</title><link>http://www.sqlservercentral.com/Forums/Topic627016-298-1.aspx</link><description>Nicely illustrated article, and stepped explanations.  Easy to follow.  It also demonstrates nomenclature and useage of script components.  There are little trinkets of caveats that one must know to succesfully navigate the script component, and it appears you have worked it out.This approach is an RBAR approach... even more so, not just row by row, but field by field in each row.  As such would not be very flexible for change in column count.Having dealt with this type of data many times in the education industry, there are often multiple course selections.  In practice it has always served better, faster, and more flexible for future use/re-use of both the component and the data to normalize the data, no matter what the final output format may be.  In a normalized format, it matters not if there are 1 or 100 CourseID columns, nor the size of the comparison set.</description><pubDate>Tue, 30 Dec 2008 16:30:11 GMT</pubDate><dc:creator>DPhillips-731960</dc:creator></item><item><title>RE: SSIS Lookup using a Script Component</title><link>http://www.sqlservercentral.com/Forums/Topic627016-298-1.aspx</link><description>Ah hah.. well in the case of a missing PK you could make a surrogate key representing the import job / filename, etc...  Your solution is ovbiously working, and performs.  Like I said in the beginning, I am curious to know how my approach would perform since I believed in the beginning that it would use less memory from the lookup, but I realize now after the pivot that my solution would use more memory per column.. I'll have to do some reading about the SSIS memory "back-pressure" controls I heard about at an SQL meeting.. I think that would help keep the memory needed for the entire task to a minimum.</description><pubDate>Tue, 30 Dec 2008 09:42:50 GMT</pubDate><dc:creator>Robert C. Wafle</dc:creator></item><item><title>RE: SSIS Lookup using a Script Component</title><link>http://www.sqlservercentral.com/Forums/Topic627016-298-1.aspx</link><description>Hi Rob,The data I have does not have a PK. It is basically a dump for a year from many locations and used for analysis on the avaliable fields.While the missing list is expected to have hardly anything, there can be a case where the reference data may be missing completely making the Sort component (and may be the server as well) go for a toss. Moreover, if even one of the values from a row (for CourseID) is missing, the entire record should be discarded. At the same time, other fields (non CourseID fields) should still be validated and all the failed validations should be consolidated in a comments field for the user.Lastly, the reference dataset I use is filtered and has ~15K rows in one scenario and ~50K rows in another. The underlying table may have much more data.</description><pubDate>Tue, 30 Dec 2008 06:17:18 GMT</pubDate><dc:creator>Sachin Dedhia</dc:creator></item><item><title>RE: SSIS Lookup using a Script Component</title><link>http://www.sqlservercentral.com/Forums/Topic627016-298-1.aspx</link><description>[quote][b]kh (12/30/2008)[/b][hr]I find this related article very interesting:http://blogs.msdn.com/mattm/archive/2008/11/25/lookup-pattern-range-lookups.aspxIt describes the performace differences from Lookup, Merge, and Script Component used to solve the same logic. Very cool reference.[/quote]Indeed a good reference. I am surprised that I never came across it while googling around for SSIS lookup.</description><pubDate>Tue, 30 Dec 2008 05:55:22 GMT</pubDate><dc:creator>Sachin Dedhia</dc:creator></item><item><title>RE: SSIS Lookup using a Script Component</title><link>http://www.sqlservercentral.com/Forums/Topic627016-298-1.aspx</link><description>Thanks... guess the rowcounts you gave will do just fine.  Don't worry too much about the data... I guess I can gin something up.</description><pubDate>Tue, 30 Dec 2008 05:45:48 GMT</pubDate><dc:creator>Jeff Moden</dc:creator></item><item><title>RE: SSIS Lookup using a Script Component</title><link>http://www.sqlservercentral.com/Forums/Topic627016-298-1.aspx</link><description>Jeff,I can't get you the exact sample of what I have, but I'll try and get you something near to it soon.BTW, the reference dataset used for lookup is filtered in my case and varies from ~15K rows in one scenario to ~50K rows in another. The table can have much more data.</description><pubDate>Tue, 30 Dec 2008 05:42:01 GMT</pubDate><dc:creator>Sachin Dedhia</dc:creator></item><item><title>RE: SSIS Lookup using a Script Component</title><link>http://www.sqlservercentral.com/Forums/Topic627016-298-1.aspx</link><description>Like I said, I'm curious... how many rows in the Course Reference table and any chance of you attaching the first 10 rows of the 10 million row file you imported?  I'd like to play...   </description><pubDate>Tue, 30 Dec 2008 04:34:38 GMT</pubDate><dc:creator>Jeff Moden</dc:creator></item><item><title>RE: SSIS Lookup using a Script Component</title><link>http://www.sqlservercentral.com/Forums/Topic627016-298-1.aspx</link><description>Hi Sachin.Yes, pivot and unpivot..  This would might make your script component contain less code in the case where there are 20 columns.  The, you could lookup or merge and split, like in the article KH suggested.  Perhaps a novice SSIS user with no coding experience could modify the pivot to pivot a few more columns easier than editing the script component... Perhpaps not..   But, since you have 10-15m rows.. the performance gain given by the script task will likely outweigh any need for simplicity in maintaining the SSIS package.  Alternatively, if you had many different file formats to work with, one could adapt just the first component to the differing schemas instead of adapting the code which is three transformations into the package..  The unique list of values that didn't match could be created with a distinct sort against the rows where the lookup failed... as long as too many rows didn't actually fail.  I've experienced performance issues with the distinct sort along with anyone else who has tried to use it.Please note you can greatly reduce the amount of memory used by the lookup in many cases simply by specifying exactly what columns you'll actually be using in the lookup using a select statement instead of specifying the whole table with the drop down.  I doubt your production server will have nearly as much free memory as your workstation.In the end, the number of rows in the course reference table will dictate if you can continue to load all the rows into your datatable or not.  It is likely this table is very small.  How many rows do you have there? ThanksRobRob</description><pubDate>Tue, 30 Dec 2008 02:19:50 GMT</pubDate><dc:creator>Robert C. Wafle</dc:creator></item><item><title>RE: SSIS Lookup using a Script Component</title><link>http://www.sqlservercentral.com/Forums/Topic627016-298-1.aspx</link><description>I find this related article very interesting:http://blogs.msdn.com/mattm/archive/2008/11/25/lookup-pattern-range-lookups.aspxIt describes the performace differences from Lookup, Merge, and Script Component used to solve the same logic. Very cool reference.</description><pubDate>Tue, 30 Dec 2008 01:20:33 GMT</pubDate><dc:creator>kh-619447</dc:creator></item><item><title>RE: SSIS Lookup using a Script Component</title><link>http://www.sqlservercentral.com/Forums/Topic627016-298-1.aspx</link><description>Hello Rob,If I understand it right, you mean Pivot the CourseID columns and perform the lookup operation. If so, I would have to Unpivot it again as my destination table is not normalized. Also, the input would have many more fields like Name, Age, etc...., the CourseID columns may or may not have a value, may be in any upper case or lower case. Moreover, I needed to build a unique list of the values missing in the reference table which I could do in the same script component as well. And, all this coupled with the huge volume of 10+ million rows.That said, the approach you mentioned is definitely a good try.</description><pubDate>Tue, 30 Dec 2008 00:40:44 GMT</pubDate><dc:creator>Sachin Dedhia</dc:creator></item><item><title>RE: SSIS Lookup using a Script Component</title><link>http://www.sqlservercentral.com/Forums/Topic627016-298-1.aspx</link><description>I'm curious... how many rows in the Course Reference table and any chance of you attaching the first 10 rows of the 10 million row file you imported?  I'd like to play... :)</description><pubDate>Mon, 29 Dec 2008 23:59:03 GMT</pubDate><dc:creator>Jeff Moden</dc:creator></item><item><title>RE: SSIS Lookup using a Script Component</title><link>http://www.sqlservercentral.com/Forums/Topic627016-298-1.aspx</link><description>Nice article with good explanation...</description><pubDate>Mon, 29 Dec 2008 23:57:44 GMT</pubDate><dc:creator>Anipaul</dc:creator></item><item><title>RE: SSIS Lookup using a Script Component</title><link>http://www.sqlservercentral.com/Forums/Topic627016-298-1.aspx</link><description>Hello Sachin.Sounds like your solution works very well.Did you consider creating a transformation where you transform the multiple columns into say two or three columns, so you would have just one lookup component?  I believe the advantage of this approach would be that you would think you would be able to process lookups against a larger set of values by letting SSIS take care of the caching, or by turning off the caching option in SSIS. Your original table structure would be something like:PK  course_1, course_2, course_3, ...1, a, b, cYour new table structure would end up being something like:PK, column_name, column_value1, course_1, a1, course_2, b1, course_3, cYou could easily process the data in this format and provide the desired output into your two tables.  I would be curious to know if this approach ends up being any faster or slower in your case.If you don't know how to perform that kind of a transformation, I will be happy to point you in the correct direction.  ThanksRob</description><pubDate>Mon, 29 Dec 2008 22:47:01 GMT</pubDate><dc:creator>Robert C. Wafle</dc:creator></item><item><title>SSIS Lookup using a Script Component</title><link>http://www.sqlservercentral.com/Forums/Topic627016-298-1.aspx</link><description>Comments posted to this topic are about the item [B]&lt;A HREF="/articles/SSIS/64766/"&gt;SSIS Lookup using a Script Component&lt;/A&gt;[/B]</description><pubDate>Mon, 29 Dec 2008 22:31:21 GMT</pubDate><dc:creator>Sachin Dedhia</dc:creator></item></channel></rss>