Extract all fields vs. only what is needed right now

  • I'm working on a data warehouse load where we will be extracting data from several source system data repositories (DRs). We are setting up extracts for each table in each DR and staging a day or two of data at a time in staging tables. These tables are truncated before each extract, so we are not keeping history. My thought is that if a source table has 75 fields, I want to set up the extract to pull all 75, even though initially I may only use 25 of them in the new data warehouse. I know we will eventually pull more, if not all, of the fields into the warehouse. None of these 'extra' fields are very big at this time: mainly dates, ints, and some small char and varchar fields; no text or xml fields.

    We have already run into cases where an extract was set up to pull only the fields needed at the time for a given table, and then one project required 5 more fields, so the extract and the target stage table were updated. Then another project required 6 more, and another 8 more. It just seems like a lot of extra work, and a lot more chances for errors, in continually updating the extract and the table.

    Am I wrong in thinking it would make more sense to just bring in the extra fields each day, even if they aren't being used yet? We could be talking 200K or more records per pull, though that wouldn't be the case for every table, and the table will be truncated before each daily load.
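    For a rough sense of scale, here is a back-of-envelope calculation. The column counts and byte sizes are assumptions for illustration (about 50 extra columns averaging 8 bytes each, which fits the dates/int/small varchar mix described above), not measured values:

    ```sql
    -- Hypothetical sizing: 200K rows/day, ~50 extra columns, ~8 bytes average each.
    -- All numbers here are illustrative assumptions.
    SELECT 200000 * 50 * 8 / 1024.0 / 1024.0 AS extra_mb_per_day;  -- roughly 76 MB/day
    ```

    Even doubling those assumptions keeps the extra daily volume in the low hundreds of megabytes, which is small next to the recurring developer cost of reworking the extract.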

    Will the reduced number of fields make that big a difference in the extract or load times? And how does that weigh against the cost of a developer reworking the extract and stage table for each new field?

    The stage tables will not contain any indexes. The load will be done in an SSIS Data Flow with the 'fast load' option.
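    As a sketch of the truncate-and-reload staging pattern described above (the schema, table, and column names are made up for illustration; the actual load goes through the SSIS Data Flow with fast load rather than plain T-SQL):

    ```sql
    -- Hypothetical stage table: a heap with no indexes,
    -- landing ALL source columns whether or not the warehouse uses them yet.
    CREATE TABLE stg.SourceTableA
    (
        SourceKey   int         NOT NULL,
        LoadDate    date        NOT NULL,
        SomeCode    char(4)     NULL,
        SomeDesc    varchar(50) NULL
        -- ... remaining source columns, widened to match the source types
    );

    -- Each day: empty the stage table, then let the SSIS Data Flow
    -- (OLE DB Destination with 'fast load') repopulate it from the source.
    TRUNCATE TABLE stg.SourceTableA;
    ```

    With this shape, adding a warehouse field later is only a change downstream of staging; the extract and stage table stay untouched.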

    -------------------------------------------------------------
    we travel not to escape life but for life not to escape us
    Don't fear failure, fear regret.
