Data Validation - matching Own Data with Externally Sourced / Spidered

  • I have a table in my APP (the "APP" Table), it could be something like a VENDOR table with, let's say, ID, Company name, Address 1, 2, 3 ... Town, ZipCode/PostCode and Country

    Then I have some external source(s) for that data, and the ability to match their ID with my ID.

    I want to have a Verification Table (the "VERIFY" table), with (let's say) the same Columns as my APP table.

    I will populate the VERIFY Table from a Source (Inserts initially, and then via subsequent Updates). Any differences need to be categorised as:

    1. My data is wrong, fix it
    2. The difference is benign, but I don't want to change my data
    3. Maybe some other reason!

    #1 is easy, I can fix my data (including adding a "Fix this" button to SYNC the data using SQL, although I might want to manually fix my data where my change is not identical to the source). Either way, it would be good to know that the fix was initiated by the comparison (e.g. for management reporting). If someone presses the "Fix this" button then logging that is easy ... but if they do it manually I'd like the user to record that the Comparison was the reason, but I doubt users will bother to log it ... every time ... I may be able to figure that out from "Time a Record was viewed in the COMPARISON report, and Update Time on that Record" (particularly if the UPDATE of that record is also responsible for setting the DIFFERENCE flags on the VERIFY record - e.g. by Trigger)

    #2 I need to store a FLAG that the (current) version in VERIFY table is an "Acceptable difference"

    When #2 happens then I may, in future, get:

    • My data (for that Row/Column) changes, or the Verify data changes (at a future Import / Spider)

    I figure that I need:

    • Previously the difference was IGNORED, now the MY/VERIFY data has changed and is now MATCH/Differently-different

    My thoughts on the design of the VERIFY table are:

    Have a VERIFY table with identical columns to my APP table (to make it easier to maintain the VERIFY table when future maintenance adds / changes columns in the APP table)

    I also need a SOURCE to indicate where the verification data came from, also Create Date / Update Date, maybe Version / Batch No too.

    Then I need a FLAG for each data column. That to indicate whether Match / Different / Accepted (a bit pattern would also allow Previously-Different and now differently-different (needs Accepting again), and for the User to log that they Adopted that change (i.e. updated the APP data via SQL or Manually)

    I also need to know if that Column was actually available from that Source (NULL not much help to me here as the Source may provide the Column Value, but the Value might be NULL).

    I think it would be helpful if the VERIFY table had identical columns to the APP Table, so I'm thinking of having a separate 1:1 column with this extra data and the FLAGS. The ID of the VERIFY table would NOT match the APP Table (there might be multiple sources of data for a given record), so join VERIFY table to FLAGs table (on ID) and then the FLAGs table would have the actual ID of the APP record, also Source, Version, and the BIT columns for Match / Difference, Accepted-difference, Source-provided/not-provided and so on

    This is the first time I've needed to do this, maybe there is well trodden ground for how to do this? If you have any thoughts that would be appreciated, thanks.

  • Thanks for posting your issue and hopefully someone will answer soon.

    This is an automated bump to increase visibility of your question.

Viewing 2 posts - 1 through 2 (of 2 total)

You must be logged in to reply to this topic. Login to reply