• I've done this before, also with the persisted computed column over only the relevant fields, though I usually use a key/checksum combination.
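    For anyone who hasn't built one, a minimal sketch of that persisted key/checksum setup on the target. All table and column names here are made up for illustration, not anything prescriptive:

    CREATE TABLE dbo.TargetTable
    (
        KeyCol      INT          NOT NULL PRIMARY KEY,
        VCol1       VARCHAR(50)  NULL,
        IntCol1     INT          NULL,
        VCol2       VARCHAR(50)  NULL,
        AuditDate   DATETIME     NULL,   -- audit-type columns stay out of the checksum
        CheckSumVal AS CHECKSUM(VCol1, IntCol1, VCol2) PERSISTED
    );

    CHECKSUM is deterministic, so the column can be persisted and only recalculates when one of the relevant fields changes.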

    What I typically do is load the key/checksum combination into a lookup component, then feed in the changed rows from the source, determined by its delta detection (usually a datetime field and a 'last run' value in my configurations), and find those that don't match on the combination. In a lookup, the unmatched rows can easily be redirected, and those are what I feed to my staging table. The matching rows are allowed to rot in /dev/null. 🙂
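    Roughly, the two queries behind that data flow look like this (the last-run bookkeeping and the object names are placeholders, just to show the shape of it):

    -- Reference query for the lookup: just the key/checksum pairs from the
    -- target, kept as narrow as possible so the cache stays small.
    SELECT KeyCol, CheckSumVal
    FROM dbo.TargetTable;

    -- Source query: only rows touched since the last successful run,
    -- using whatever delta column the source offers (a datetime here).
    DECLARE @LastRunDate DATETIME = '20240101';  -- normally pulled from a config table

    SELECT KeyCol, VCol1, IntCol1, VCol2,
           CHECKSUM(VCol1, IntCol1, VCol2) AS SourceCheckSum
    FROM dbo.SourceTable
    WHERE ModifiedDate > @LastRunDate;

    The lookup matches on the key plus the checksum; anything that falls out the no-match side is either new or changed, and that's what lands in staging.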

    After that, once they're in the staging structure, I do what Phil recommends and MERGE the rows I know have changed. It just eases up on the processing, and since I have to move the data between source and target anyway, a bit of transformation along the way is far cheaper computationally than processing 2-3x the rows on the target.
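    The MERGE from staging into the target then only ever touches rows that already failed the lookup. Something along these lines, again with illustrative names only:

    MERGE dbo.TargetTable AS tgt
    USING dbo.StagingTable AS src
        ON tgt.KeyCol = src.KeyCol
    WHEN MATCHED THEN                  -- key exists, but the checksum differed upstream
        UPDATE SET tgt.VCol1   = src.VCol1,
                   tgt.IntCol1 = src.IntCol1,
                   tgt.VCol2   = src.VCol2
    WHEN NOT MATCHED BY TARGET THEN    -- brand new key
        INSERT (KeyCol, VCol1, IntCol1, VCol2)
        VALUES (src.KeyCol, src.VCol1, src.IntCol1, src.VCol2);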

    Just keep the lookup tight. If you're past 30 bytes or so per row, it's far too large for easy caching on umpteen-million-row targets, at least on my equipment.

    EDIT: I should mention that yes, I know CHECKSUM can have varchar concatenation concerns, where 'abc','def' produces the same value as 'ab','cdef'. If I can't separate my varchars with integer columns, I'll usually insert a pipe delimiter in the calculation, so:

    CheckSumVal AS CHECKSUM( VCol1, IntCol1, VCol2, '|' + VCol3, '|' + VCol4)
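
    Just remember that whatever expression builds the checksum on the source side has to use the same column order and the same pipe placement as the persisted column, or the lookup will treat every row as changed.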


    - Craig Farrell

    Never stop learning, even if it hurts. Ego bruises are practically mandatory as you learn unless you've never risked enough to make a mistake.

    For better assistance in answering your questions | Forum Netiquette
    For index/tuning help, follow these directions. | Tally Tables

    Twitter: @AnyWayDBA