I wouldn't expect any accidental collisions with MD5 unless you're processing on the order of 2^64 rows, which is the birthday bound for a 128-bit hash. If you're getting collisions with MD5 or SHA I would immediately suspect the...
I've implemented a solution similar to this one, except that my MD5 was originally calculated in an SSIS package, which makes the solution easier to scale.
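
To put the 2^64 figure in perspective, here is a rough sketch of the birthday approximation for a 128-bit hash like MD5. It assumes the hash outputs behave like uniformly random values and that nobody is deliberately crafting colliding inputs; the function name `collision_probability` is just something I made up for illustration.

```python
import math

def collision_probability(n_rows: int, hash_bits: int = 128) -> float:
    """Approximate chance of at least one collision among n_rows
    uniformly random hash_bits-bit values (birthday approximation)."""
    space = 2 ** hash_bits
    # p ~= 1 - exp(-n(n-1) / (2N)); expm1 keeps precision when p is tiny.
    exponent = n_rows * (n_rows - 1) / (2 * space)
    return -math.expm1(-exponent)

# Compare realistic table sizes against the 2^64 birthday bound.
for n in (10**6, 10**9, 2**64):
    print(f"{n:>26,d} rows -> p = {collision_probability(n):.3e}")
```

At a billion rows the probability is still vanishingly small, so a collision showing up in practice much more likely points to a problem somewhere other than the hash function.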