Creating dummy data that requires a fuzzy matching transform in SSIS

  • Why YES, I have read Jeff Moden's article... but this question is a bit different.
    I'm trying to figure out how to mess up some data, but (mostly) only a little bit. I have a table of symptoms from a database, and I'm trying to imitate the case where some of them are misspelled, etc. In SSIS, you'd split the flow so that the records that match the lookup table go down one path and the non-matching ones go to a fuzzy matching step, where you try the match again.
    Returning random records is no big deal. Would I use REPLACE with a random character, or swap two characters?
    Thanks!
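
Both options from the question (replace one character, or swap two neighbouring ones) can be sketched in a few lines. The block below is only an illustration in Python, not the poster's method and not T-SQL; the function names, the sample symptom list and the 50% corruption rate are all assumptions made up for the example. The same idea should carry over to T-SQL using string functions such as STUFF and SUBSTRING.

```python
import random

def swap_adjacent(text, rng):
    """Transpose two neighbouring characters at a random position.

    If the two characters happen to be identical (the 'zz' in 'dizziness'),
    the result equals the input.
    """
    if len(text) < 2:
        return text
    i = rng.randrange(len(text) - 1)
    return text[:i] + text[i + 1] + text[i] + text[i + 2:]

def replace_char(text, rng, alphabet="abcdefghijklmnopqrstuvwxyz"):
    """Replace one randomly chosen character with a different letter."""
    if not text:
        return text
    i = rng.randrange(len(text))
    # Exclude the current letter so the row really changes.
    choices = [c for c in alphabet if c != text[i].lower()]
    return text[:i] + rng.choice(choices) + text[i + 1:]

def corrupt(rows, fraction, rng):
    """Corrupt roughly `fraction` of the rows.

    Returns (original, possibly_corrupted) pairs so the expected match
    is always known.
    """
    out = []
    for value in rows:
        if rng.random() < fraction:
            mutate = rng.choice([swap_adjacent, replace_char])
            out.append((value, mutate(value, rng)))
        else:
            out.append((value, value))
    return out

# Hypothetical sample data; a fixed seed keeps the run repeatable.
rng = random.Random(42)
symptoms = ["headache", "nausea", "dizziness", "fatigue"]
for original, dirty in corrupt(symptoms, 0.5, rng):
    print(original, "->", dirty)
```

Keeping the original value next to the corrupted one means you can later check whether the fuzzy step matched each dirty row back to the right symptom.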

  • What this sounds like it boils down to is: TEST the fuzzy matching algorithm. I'd be rather scientific about that kind of thing. Be very specific about what you test: come up with a specific set of variations on a given theme, test only that theme, and see what you get. Then repeat that, theme by theme, for each one you come up with. Have a number of people with different perspectives analyze each theme for applicability, meaning: is it a variation on normal that is likely to occur in the real world?
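
One way to read the theme-by-theme advice is to generate a labelled test set, one kind of error per theme, and score each theme on its own. The Python sketch below assumes three made-up themes (omission, doubling, truncation) and a placeholder `matcher` callable; none of this comes from SSIS itself, and the exact-match matcher at the end only stands in for whatever fuzzy step is really being tested.

```python
import random

def drop_char(text, rng):
    """Omission theme: remove one character."""
    i = rng.randrange(len(text))
    return text[:i] + text[i + 1:]

def double_char(text, rng):
    """Doubling theme: repeat one character."""
    i = rng.randrange(len(text))
    return text[:i] + text[i] + text[i:]

def truncate(text, rng):
    """Truncation theme: keep only a prefix of at least 3 characters."""
    if len(text) <= 3:
        return text
    return text[:rng.randrange(3, len(text))]

# Hypothetical theme names mapped to the one mutation each theme tests.
THEMES = {"omission": drop_char, "doubling": double_char, "truncation": truncate}

def build_cases(terms, per_theme, seed=0):
    """Build labelled cases so every theme can be evaluated separately."""
    rng = random.Random(seed)  # fixed seed: the same cases on every run
    cases = []
    for theme, mutate in THEMES.items():
        for term in terms:
            for _ in range(per_theme):
                cases.append({"theme": theme, "expected": term,
                              "input": mutate(term, rng)})
    return cases

def score_by_theme(cases, matcher):
    """Return the fraction of correct matches per theme.

    `matcher` is any callable that takes the dirty input and returns the
    matched term (or None); it is a stand-in for the real fuzzy step.
    """
    hits, totals = {}, {}
    for case in cases:
        totals[case["theme"]] = totals.get(case["theme"], 0) + 1
        if matcher(case["input"]) == case["expected"]:
            hits[case["theme"]] = hits.get(case["theme"], 0) + 1
    return {theme: hits.get(theme, 0) / totals[theme] for theme in totals}

terms = ["headache", "nausea", "dizziness"]
cases = build_cases(terms, per_theme=2, seed=7)
# Placeholder matcher: exact lookup only, so it serves as a baseline.
print(score_by_theme(cases, lambda s: s if s in terms else None))
```

Because every case carries its theme label, the reviewers mentioned above can look at one theme's inputs at a time and decide whether that kind of error is realistic before its score is taken seriously.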

    Steve (aka sgmunson) 🙂 🙂 🙂
