• Eirikur Eiriksson (2/16/2014)


    Would this be of interest?

    Data de-duplication using SQL Server 2012 window functions.

    Simple cases:

    1. A set with no surrogate key defined

    2. A set with a surrogate key, related records must be updated
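To make the two simple cases concrete, here is a minimal sketch. It runs the SQL through Python's built-in sqlite3 module (SQLite 3.25 or later has window functions) rather than SQL Server, so the syntax is close to T-SQL but not identical. The tables `raw_email`, `customer`, `orders` and the helper table `key_map` are made up for illustration, and "lowest key wins" is an assumed rule for picking the survivor.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Case 1: hypothetical table with no surrogate key
CREATE TABLE raw_email (email TEXT);
INSERT INTO raw_email VALUES ('a@x.com'), ('b@x.com'), ('a@x.com'), ('a@x.com');

-- Case 2: hypothetical table with a surrogate key and a related table
CREATE TABLE customer (customer_id INTEGER PRIMARY KEY, email TEXT);
CREATE TABLE orders (order_id INTEGER PRIMARY KEY, customer_id INTEGER);
INSERT INTO customer VALUES (1, 'a@x.com'), (2, 'b@x.com'), (3, 'a@x.com'), (4, 'a@x.com');
INSERT INTO orders VALUES (10, 1), (11, 3), (12, 4), (13, 2);
""")

# Case 1: number the rows inside each duplicate group and delete all but the first.
# SQLite's implicit rowid stands in for a physical row identifier here.
conn.execute("""
DELETE FROM raw_email
WHERE rowid IN (
    SELECT rid FROM (
        SELECT rowid AS rid,
               ROW_NUMBER() OVER (PARTITION BY email ORDER BY rowid) AS rn
        FROM raw_email
    ) WHERE rn > 1
)
""")

# Case 2: map every key to the survivor of its group, repoint the related
# rows, and only then delete the duplicates.
conn.executescript("""
CREATE TEMP TABLE key_map AS
SELECT customer_id,
       FIRST_VALUE(customer_id) OVER (PARTITION BY email ORDER BY customer_id) AS keep_id,
       ROW_NUMBER() OVER (PARTITION BY email ORDER BY customer_id) AS rn
FROM customer;

UPDATE orders
SET customer_id = (SELECT keep_id FROM key_map
                   WHERE key_map.customer_id = orders.customer_id);

DELETE FROM customer
WHERE customer_id IN (SELECT customer_id FROM key_map WHERE rn > 1);
""")

emails = conn.execute("SELECT email FROM raw_email ORDER BY email").fetchall()
customers = conn.execute("SELECT customer_id, email FROM customer ORDER BY customer_id").fetchall()
orders = conn.execute("SELECT order_id, customer_id FROM orders ORDER BY order_id").fetchall()
```

The order of operations in the second case is the point: the related rows are updated before the duplicate keys disappear, so no order is left pointing at a deleted customer.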

    Complex cases:

    1. Retaining the first value by any chosen sequence

    2. Retaining the first good value by any chosen sequence and pattern

    3. Retaining the last value by any chosen sequence

    4. Retaining the last good value by any chosen sequence and pattern

    5. Retaining the most dense value by any chosen pattern

    6. Retaining the least dense value by any chosen pattern
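As a rough sketch of the first four complex cases, the example below picks the first or last row per group by a sequence column, optionally restricted to values that match a pattern. It is an illustration under assumptions, not a finished method: the SQL runs in SQLite through Python's sqlite3 module, the `reading` table and the `survivors` helper are hypothetical, and a "good" value is assumed to be a phone number shaped like 555-0101. SQLite's GLOB is used for the pattern; in T-SQL a LIKE with the same character classes would play that role.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE reading (entity TEXT, seq INTEGER, phone TEXT);
INSERT INTO reading VALUES
    ('A', 1, NULL), ('A', 2, '555-0101'), ('A', 3, 'n/a'),
    ('A', 4, '555-0199'), ('A', 5, ''),
    ('B', 1, '555-0300'), ('B', 2, 'unknown');
""")

# Assumed definition of a "good" value: three digits, a dash, four digits.
GOOD_PATTERN = "[0-9][0-9][0-9]-[0-9][0-9][0-9][0-9]"

def survivors(direction, good_only):
    """Return the row to retain per entity.

    direction: "ASC" keeps the first row by seq, "DESC" keeps the last.
    good_only: when True, rows failing the pattern are ignored before ranking.
    """
    if direction not in ("ASC", "DESC"):
        raise ValueError("direction must be ASC or DESC")
    sql = f"""
    SELECT entity, seq, phone FROM (
        SELECT entity, seq, phone,
               ROW_NUMBER() OVER (PARTITION BY entity ORDER BY seq {direction}) AS rn
        FROM reading
        WHERE (? = 0) OR phone GLOB ?
    ) WHERE rn = 1
    ORDER BY entity
    """
    return conn.execute(sql, (int(good_only), GOOD_PATTERN)).fetchall()

first_any = survivors("ASC", False)
first_good = survivors("ASC", True)
last_any = survivors("DESC", False)
last_good = survivors("DESC", True)
```

Because the pattern filter is applied before the rows are numbered, the "first good" row for entity A is sequence 2, not the NULL at sequence 1. Deleting every row that is not returned by the chosen variant completes the de-duplication.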

    Business rule application (possibly a separate article):

    1. Rule controlled matching

    2. Rule controlled de-duplication

    I think the simple cases are covered.

    It would be good to see an article that shows how to delete the first or last value by some sequence. A separate piece could deal with matching on a pattern.

    I hadn't thought about density, but there's an article in there as well.
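The thread does not define "density", so the sketch below takes one possible reading: the density of a row is the number of its optional columns that are populated. Under that assumption, the most dense (or least dense) row per group can be picked with the same ROW_NUMBER approach. The `contact` table, the grouping by `name`, and the tie-break on `contact_id` are all hypothetical, and the SQL runs in SQLite through Python's sqlite3 module. SQLite lets a comparison be added as 0 or 1; in T-SQL a CASE or IIF expression would be needed instead.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE contact (
    contact_id INTEGER PRIMARY KEY, name TEXT, email TEXT, phone TEXT, city TEXT
);
INSERT INTO contact VALUES
    (1, 'Ann Lee', 'ann@x.com', NULL,       NULL),
    (2, 'Ann Lee', 'ann@x.com', '555-0101', 'Oslo'),
    (3, 'Ann Lee', NULL,        NULL,       NULL),
    (4, 'Bo Ray',  NULL,        '555-0202', NULL);
""")

# density = count of populated optional columns (assumed definition).
# {direction} is filled only with the fixed strings "DESC" or "ASC" below.
SQL = """
SELECT contact_id, density FROM (
    SELECT contact_id, density,
           ROW_NUMBER() OVER (
               PARTITION BY name
               ORDER BY density {direction}, contact_id
           ) AS rn
    FROM (
        SELECT contact_id, name,
               (email IS NOT NULL) + (phone IS NOT NULL) + (city IS NOT NULL) AS density
        FROM contact
    )
) WHERE rn = 1
ORDER BY contact_id
"""

most_dense = conn.execute(SQL.format(direction="DESC")).fetchall()
least_dense = conn.execute(SQL.format(direction="ASC")).fetchall()
```

If "density" is meant differently, for example the most frequently occurring value in a group, the ORDER BY inside the window would change but the overall shape of the query would stay the same.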