|
|
|
SSC Rookie
      
Group: General Forum Members
Last Login: Friday, May 10, 2013 4:31 PM
Points: 49,
Visits: 135
|
|
|
|
|
|
Forum Newbie
      
Group: General Forum Members
Last Login: Sunday, August 21, 2011 7:43 PM
Points: 2,
Visits: 17
|
|
| i like the hash approach - to save time you could just use the konesans checksum algorithm add in to calculate the crc value. Combined with the conditional split it works a treat.
|
|
|
|
|
Grasshopper
      
Group: General Forum Members
Last Login: Wednesday, May 04, 2011 6:28 AM
Points: 10,
Visits: 17
|
|
| I implement something like this and it increased dramatically the SSIS performance. Fortunately i have in the input a hash column already computed, from sources
|
|
|
|
|
SSC Rookie
      
Group: General Forum Members
Last Login: Wednesday, April 24, 2013 12:41 AM
Points: 44,
Visits: 3,321
|
|
| Great! I haven't read this but it looks great. Will read this and place some comments (when i have them).
|
|
|
|
|
SSCrazy Eights
        
Group: General Forum Members
Last Login: Today @ 6:39 AM
Points: 9,376,
Visits: 6,472
|
|
|
|
|
|
SSC Journeyman
      
Group: General Forum Members
Last Login: Today @ 6:37 AM
Points: 81,
Visits: 849
|
|
Really interesting and well-written article, thank you. As a matter of interest, is it any faster (or in any other way better or worse) to create the hash value as part of the T-SQL Select statement, using code based on something like this:
SELECT HashBytes('MD5', FirstName + LastName) FROM [AdventureWorks].[Person].[Contact]
Of course you will have to use all the fields that you will be hashing on, and cast all values to (n)varchar, but I wondered if you had tried this, and if so, which appraoch gave better performance?
|
|
|
|
|
SSC-Enthusiastic
      
Group: General Forum Members
Last Login: Wednesday, May 15, 2013 10:35 AM
Points: 136,
Visits: 327
|
|
|
|
|
|
SSC Rookie
      
Group: General Forum Members
Last Login: Friday, May 10, 2013 4:31 PM
Points: 49,
Visits: 135
|
|
rcooke (6/7/2010) i like the hash approach - to save time you could just use the konesans checksum algorithm add in to calculate the crc value. Combined with the conditional split it works a treat.
Konesans Checksum is very prone to collisions, so it can introduce errors into your ETL process. Using MD5 hashes the chance for hash collisions is much much lower.
|
|
|
|
|
SSC Rookie
      
Group: General Forum Members
Last Login: Friday, May 10, 2013 4:31 PM
Points: 49,
Visits: 135
|
|
da-zero (6/8/2010) Nice article. Although I would have used a temp/staging table to store my updates, instead of using the OLE DB Command. That way you can use a set based update instead of an update per row.
This is the way I normally do it as well. I wanted my example to focus more on the hashing this time around.
I heartily endorse doing anything you can do avoid using the OLEDB Command or SCD Task for updates.
|
|
|
|
|
SSC Rookie
      
Group: General Forum Members
Last Login: Friday, May 10, 2013 4:31 PM
Points: 49,
Visits: 135
|
|
Adam Aspin (6/8/2010) Really interesting and well-written article, thank you. As a matter of interest, is it any faster (or in any other way better or worse) to create the hash value as part of the T-SQL Select statement, using code based on something like this:
SELECT HashBytes('MD5', FirstName + LastName) FROM [AdventureWorks].[Person].[Contact]
Of course you will have to use all the fields that you will be hashing on, and cast all values to (n)varchar, but I wondered if you had tried this, and if so, which appraoch gave better performance?
There shouldn't be much difference in doing it this way, however I try to avoid doing string manipulation in SQL whenever possible as that can be a major drag on performance. That was the chief reason in deciding to do it in .Net.
|
|
|
|