Question on creating new tables to handle versions of data entry.

  • Currently we collect data, and every 3 to 9 months the data we collect changes, usually through new fields. What we have been doing is adding columns, updating our data entry and save code to map to the new columns, and versioning off that code as well. The same goes for our SSRS reports.

    What I was wondering is whether, instead of trying to shoehorn this into the same table, we should just version off the existing table each time and add the new columns.

    DataCollectionV2 (10 columns)

    DataCollectionV3 (15 columns, 9 the same as the V2 table)

    etc...

    It feels wrong to generate all these tables, but adding new columns and then never using the old ones again seems off as well.

    Suggestions/thoughts?

    I have also been debating a more vertical design, more like a data warehouse (WH) fact table.

    Can anyone share any general thoughts/wisdom/personal stories?

    Thanks!

  • One possibility is to take the columns affected by this into Date's version of sixth normal form (6NF) and use an anchor model. However, that may be overkill (I'm sure Chris Date would say it's not; I would say it probably is). You can get most of the benefits by moving attributes that are no longer used into separate tables. Each new table holds a group of attributes that all became obsolete at the same time, together with the primary key attributes identifying what each row refers to, so you don't need rows in that table for entries that have no values for its non-key columns. Entries that do use those attributes can be handled with joins from the main table to the relevant extra tables.

    Personally, I suspect you would be better off splitting into tables that each hold a complete set of columns, as you seem to be suggesting as one option, and avoiding the extra joins. That achieves much the same space saving, and it equally removes the need to devise placeholder values for columns that don't apply. This matters if the new columns are required for business reasons to be non-nullable and you want to enforce that as a domain constraint to help ensure data integrity.

    It's probably not a good idea to keep all the old and new columns in one table, because that restricts how much data integrity you can enforce with constraints. But if you don't need that kind of integrity enforcement (which, unfortunately, is believed far more often than it is true), keeping all the columns in one table won't cause problems.

    Tom

  • That all makes good sense.

    I'll look into 6NF, but I think that may over-complicate things (as you suggested).

    The real goal here is just to store the data. Since each version change adds new columns and retires old ones, versioning off the tables does seem to make good sense.

    What makes the current approach worse is that sometimes (say, two or three versions later) a discontinued column is brought back into use, which complicates tracking what was used for which version, and when 🙂
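
A minimal sketch of the per-version-table option discussed above (one table per version, each with a complete set of columns). It uses Python's built-in sqlite3 module instead of SQL Server, an assumption made only so the example runs anywhere, and the table names, column names, and the DataCollectionAll view are hypothetical. It shows that each version's required columns can be declared NOT NULL without placeholder values, and that reports spanning versions can read the shared columns through a UNION ALL view.

```python
import sqlite3

# Hypothetical table and column names, for illustration only.
# One table per collection version: each table holds exactly the columns that
# version collects, so every required column can be declared NOT NULL.
SCHEMA = """
CREATE TABLE DataCollectionV2 (
    EntryId  INTEGER PRIMARY KEY,
    Site     TEXT    NOT NULL,
    OldScore INTEGER NOT NULL      -- collected in V2 only
);
CREATE TABLE DataCollectionV3 (
    EntryId  INTEGER PRIMARY KEY,
    Site     TEXT    NOT NULL,
    NewScore INTEGER NOT NULL      -- introduced in V3
);
-- Reports that span versions can read the shared columns through one view.
CREATE VIEW DataCollectionAll AS
    SELECT 2 AS Version, EntryId, Site FROM DataCollectionV2
    UNION ALL
    SELECT 3 AS Version, EntryId, Site FROM DataCollectionV3;
"""


def build_db() -> sqlite3.Connection:
    """Create an in-memory database with the per-version tables and the view."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    return conn


def try_insert(conn: sqlite3.Connection, sql: str, params: tuple) -> bool:
    """Run one INSERT; return False (after rolling back) if a constraint rejects it."""
    try:
        with conn:  # commits on success, rolls back on an exception
            conn.execute(sql, params)
        return True
    except sqlite3.IntegrityError:
        return False
```

With this layout, adding a V4 means creating one new table and adding one more UNION ALL arm to the view; the existing tables are left alone.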
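
A sketch of the lighter-weight alternative to full 6NF described in the thread: the main table keeps the key plus the attributes still in use, and a group of attributes retired together moves to a side table sharing the same primary key, read back with a LEFT JOIN. This again uses sqlite3 for portability, and the names (Entry, EntryRetiredV3, OldScore, OldNote, add_entry, fetch_all) are hypothetical.

```python
import sqlite3
from typing import List, Optional, Tuple

# Hypothetical names. The main table keeps the key and the attributes still in
# use; attributes that were retired together move to a side table that shares
# the main table's primary key.
SCHEMA = """
CREATE TABLE Entry (
    EntryId INTEGER PRIMARY KEY,
    Site    TEXT NOT NULL
);
CREATE TABLE EntryRetiredV3 (          -- attributes retired together at V3
    EntryId  INTEGER PRIMARY KEY REFERENCES Entry (EntryId),
    OldScore INTEGER NOT NULL,
    OldNote  TEXT    NOT NULL
);
"""


def build_db() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled
    conn.executescript(SCHEMA)
    return conn


def add_entry(conn: sqlite3.Connection, site: str,
              retired: Optional[Tuple[int, str]] = None) -> int:
    """Insert an entry; only entries that carry the retired attributes get a
    side-table row, so those columns never need placeholder values."""
    with conn:
        entry_id = conn.execute(
            "INSERT INTO Entry (Site) VALUES (?)", (site,)).lastrowid
        if retired is not None:
            conn.execute(
                "INSERT INTO EntryRetiredV3 (EntryId, OldScore, OldNote) VALUES (?, ?, ?)",
                (entry_id, *retired))
    return entry_id


def fetch_all(conn: sqlite3.Connection) -> List[tuple]:
    # LEFT JOIN keeps newer entries with no side-table row; their old columns read as NULL.
    return conn.execute("""
        SELECT e.EntryId, e.Site, r.OldScore, r.OldNote
        FROM Entry AS e
        LEFT JOIN EntryRetiredV3 AS r ON r.EntryId = e.EntryId
        ORDER BY e.EntryId
    """).fetchall()
```

Because the side table only has rows for entries that actually carry those values, its columns can stay NOT NULL; the cost is the extra join on every read that needs them.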
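
One way the "vertical" design from the original question might look, sketched as an entity-attribute-value (EAV) layout with a field catalog. This reading of "vertical", the sqlite3 backend, and every name here (FieldVersion, Entry, EntryValue, EXAMPLE_CATALOG, versions_using, add_entry) are assumptions. The catalog records which fields each version collects, so a field dropped in V3 and brought back in V4 is just two catalog rows, and composite foreign keys reject values for fields that are not part of an entry's version.

```python
import sqlite3
from typing import Dict, List

# Hypothetical vertical (entity-attribute-value) layout: one row per entry per
# field, plus a catalog of which fields each version collects.
SCHEMA = """
CREATE TABLE FieldVersion (
    FieldName TEXT    NOT NULL,
    Version   INTEGER NOT NULL,
    PRIMARY KEY (FieldName, Version)
);
CREATE TABLE Entry (
    EntryId INTEGER PRIMARY KEY,
    Version INTEGER NOT NULL,
    UNIQUE (EntryId, Version)          -- target for the composite FK below
);
CREATE TABLE EntryValue (
    EntryId   INTEGER NOT NULL,
    Version   INTEGER NOT NULL,
    FieldName TEXT    NOT NULL,
    Value     TEXT,
    PRIMARY KEY (EntryId, FieldName),
    FOREIGN KEY (EntryId, Version) REFERENCES Entry (EntryId, Version),
    -- Only fields listed in the catalog for the entry's version are accepted.
    FOREIGN KEY (FieldName, Version) REFERENCES FieldVersion (FieldName, Version)
);
"""

# Hypothetical catalog: OldScore is dropped in V3 and brought back in V4.
EXAMPLE_CATALOG = [
    ("Site", 2), ("OldScore", 2),
    ("Site", 3), ("NewScore", 3),
    ("Site", 4), ("OldScore", 4),
]


def build_db() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled
    conn.executescript(SCHEMA)
    with conn:
        conn.executemany("INSERT INTO FieldVersion VALUES (?, ?)", EXAMPLE_CATALOG)
    return conn


def versions_using(conn: sqlite3.Connection, field: str) -> List[int]:
    """List the versions that collected a given field."""
    rows = conn.execute(
        "SELECT Version FROM FieldVersion WHERE FieldName = ? ORDER BY Version", (field,))
    return [r[0] for r in rows]


def add_entry(conn: sqlite3.Connection, version: int, values: Dict[str, str]) -> int:
    """Insert an entry and its values atomically; raises IntegrityError and rolls
    back if any field is not in the catalog for that version."""
    with conn:
        entry_id = conn.execute(
            "INSERT INTO Entry (Version) VALUES (?)", (version,)).lastrowid
        conn.executemany(
            "INSERT INTO EntryValue (EntryId, Version, FieldName, Value) VALUES (?, ?, ?, ?)",
            [(entry_id, version, name, value) for name, value in values.items()])
    return entry_id
```

The trade-off is that "this field is required" can no longer be a plain NOT NULL column constraint, and reports have to pivot rows back into columns.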
