SQL Clone
SQLServerCentral is supported by Redgate
 
Log in  ::  Register  ::  Not logged in
 
 
 


mapping an Excel Spreadsheet to a Table


mapping an Excel Spreadsheet to a Table

Author
Message
etl2016
etl2016
Say Hey Kid
Say Hey Kid (662 reputation)Say Hey Kid (662 reputation)Say Hey Kid (662 reputation)Say Hey Kid (662 reputation)Say Hey Kid (662 reputation)Say Hey Kid (662 reputation)Say Hey Kid (662 reputation)Say Hey Kid (662 reputation)

Group: General Forum Members
Points: 662 Visits: 281

Hi

I have a client who will send me an Excel spreadsheet and I have to load that to a table.

1) This excel spreadsheet will be sent only 2 or 3 times an year, not daily.

2) It contains under 5k records.

3) This table will be used as a Reference Lookup table by those programs which need it
4) This table needs to maintain history of changes, should there be any.

Below is the model I am going to design, please share your thoughts if anything can be added to make it more efficient.


1) Design it as SCD-2, with current-ness identifier column


INSERTs
---------
2) Nominate best possible Excel column as Natural Key, and rest of the columns are used for Change Detection

3) Any NK that is present in the Excel but not in my table will be used for INSERTs


UPDATEs
----------
4) Any row in Excel with its NK that is ALREADY present in my table, will be recognised as an UPDATE, if any of the non-NK columns has changed in comparison with what is in my table

PK-FK Referential Integrity
------------------------------
5) Since the scope of this table is a mere Reference Lookup table, it is NOT functionally related to any other tables (other than for lookup purposes). So, Foreign Key relations can be ignored.

DATA QUALITY
----------------
6) Since this is a hand-written Excel spreadsheet, there is high chance for data quality issues, am getting a consensus from source about the possibility of errors and NULLs/Blanks. Am going to use CHECK constraints to emphasize Data Quality while loading to table.

Please share your thoughts adding-to or correcting the above list.


thank you


Jeff Moden
Jeff Moden
SSC Guru
SSC Guru (673K reputation)SSC Guru (673K reputation)SSC Guru (673K reputation)SSC Guru (673K reputation)SSC Guru (673K reputation)SSC Guru (673K reputation)SSC Guru (673K reputation)SSC Guru (673K reputation)

Group: General Forum Members
Points: 673293 Visits: 45556

etl2016 - Sunday, November 19, 2017 3:31 PM

Hi

I have a client who will send me an Excel spreadsheet and I have to load that to a table.

1) This excel spreadsheet will be sent only 2 or 3 times an year, not daily.

2) It contains under 5k records.

3) This table will be used as a Reference Lookup table by those programs which need it
4) This table needs to maintain history of changes, should there be any.

Below is the model I am going to design, please share your thoughts if anything can be added to make it more efficient.


1) Design it as SCD-2, with current-ness identifier column


INSERTs
---------
2) Nominate best possible Excel column as Natural Key, and rest of the columns are used for Change Detection

3) Any NK that is present in the Excel but not in my table will be used for INSERTs


UPDATEs
----------
4) Any row in Excel with its NK that is ALREADY present in my table, will be recognised as an UPDATE, if any of the non-NK columns has changed in comparison with what is in my table

PK-FK Referential Integrity
------------------------------
5) Since the scope of this table is a mere Reference Lookup table, it is NOT functionally related to any other tables (other than for lookup purposes). So, Foreign Key relations can be ignored.

DATA QUALITY
----------------
6) Since this is a hand-written Excel spreadsheet, there is high chance for data quality issues, am getting a consensus from source about the possibility of errors and NULLs/Blanks. Am going to use CHECK constraints to emphasize Data Quality while loading to table.

Please share your thoughts adding-to or correcting the above list.


thank you



If you're going to used TYPE 2 SCDs, the DON'T use a "current-ness identifier column". You should have two columns to identify the start and end dates of when the row was valid. The start date should be when the row (identified by some key) is first inserted. The end date should be the starting date of when the row next appears in the spreadsheet OR it should be CONVERT(DATETIME,''9999') for the "current" (latest) row so you don't have to muck around with double-checks in the criteria you'll eventually need to write against the table.

Obviously, when you end up with a new row for a given key, you'll need either for the loading process to find and update the end dates of existing rows or a trigger to do the same. Despite the coming objections of some of my peers, I recommend the trigger method so that someone else can't screw things up for you. Done correctly, it will be as fast or faster than a separate chunk of code in your import process.

As for data quality, I recommend NEVER loading data from a spreadsheet directly into the final table. Always use a staging table to pre-validate the data.

--Jeff Moden

RBAR is pronounced ree-bar and is a Modenism for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
Stop thinking about what you want to do to a row... think, instead, of what you want to do to a column.
If you think its expensive to hire a professional to do the job, wait until you hire an amateur. -- Red Adair

Helpful Links:
How to post code problems
How to post performance problems
Forum FAQs
Go


Permissions

You can't post new topics.
You can't post topic replies.
You can't post new polls.
You can't post replies to polls.
You can't edit your own topics.
You can't delete your own topics.
You can't edit other topics.
You can't delete other topics.
You can't edit your own posts.
You can't edit other posts.
You can't delete your own posts.
You can't delete other posts.
You can't post events.
You can't edit your own events.
You can't edit other events.
You can't delete your own events.
You can't delete other events.
You can't send private messages.
You can't send emails.
You can read topics.
You can't vote in polls.
You can't upload attachments.
You can download attachments.
You can't post HTML code.
You can't edit HTML code.
You can't post IFCode.
You can't post JavaScript.
You can post emoticons.
You can't post or upload images.

Select a forum








































































































































































SQLServerCentral


Search