SQLServerCentral is supported by Redgate
 
 
 
 


Most efficient way to extract data for a historical data table


aaa121
SSC-Addicted (495 reputation)
I have a third normal form table which maintains history over time using a row start date and a row end date.

I need to extract data incrementally from this table, starting at 1900-01-01 and looping through each day. The easiest way is to supply the date and use it in a WHERE clause like this:

WHERE '1900-01-01' BETWEEN date_row_start AND date_row_end

then increment the date by a day and run the query again. However, this would obviously pick up the same records time and time again. That isn't necessarily an issue, as I will be merging these 'incremental' records on a key column, and it will only be a one-off data load operation. But from a performance standpoint I would like to consider other ways to access that data.
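As a rough sketch, the loop I have in mind looks something like this (the table and column names, dbo.SourceHistory, DateRowStart and DateRowEnd, are made up for illustration):

```sql
-- Daily extract loop: probe the history table once per calendar day.
-- Object names are placeholders.
DECLARE @LoadDate date = '19000101';

WHILE @LoadDate <= CAST(GETDATE() AS date)
BEGIN
    SELECT *
    FROM dbo.SourceHistory
    WHERE @LoadDate BETWEEN DateRowStart AND DateRowEnd;

    -- move on to the next day
    SET @LoadDate = DATEADD(DAY, 1, @LoadDate);
END;
```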

I cannot change the structure of the source table and there are no other dates I can use.

Thanks
Bhuvnesh
SSCertifiable (5.2K reputation)
aaa121 (10/3/2013)
However this would obviously pick up the same records time and time again.
How would the data be the same on every query execution? Or do you mean the number of records? (Even that wouldn't be the same every time.)

-------Bhuvnesh----------
I work only to learn Sql Server...though my company pays me for getting their stuff done;-)
aaa121
SSC-Addicted (495 reputation)
If a record is valid from '1900-01-01' until '2013-10-03', I will end up extracting it on each day between those two dates using the method described above.
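To demonstrate with made-up names: a row valid across that whole range satisfies the daily predicate for every date in it, so the loop re-extracts it on every pass:

```sql
-- One-row demo (hypothetical names): the row is valid from 1900-01-01 to
-- 2013-10-03, so the daily predicate matches it on every day in that range.
DECLARE @History TABLE (Id int, DateRowStart date, DateRowEnd date);
INSERT @History VALUES (1, '19000101', '20131003');

DECLARE @LoadDate date = '19000101';
WHILE @LoadDate <= '19000105'   -- just the first few days, for brevity
BEGIN
    SELECT Id
    FROM @History
    WHERE @LoadDate BETWEEN DateRowStart AND DateRowEnd;  -- row 1, every time

    SET @LoadDate = DATEADD(DAY, 1, @LoadDate);
END;
```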
Keith Tate
Ten Centuries (1K reputation)
Do the tables in question have a surrogate key (e.g. an identity column)? If so, use your date query but add a predicate to pick up only rows with a value greater than the highest one already in your history table.

If you have to worry about updates to existing data, maybe use a two-step approach: first update the current records based on the PK where something has actually changed (using BINARY_CHECKSUM() or something like it), then do your inserts based on the new data.
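A rough sketch of that two-step approach, with all object names invented for illustration (the column lists inside BINARY_CHECKSUM would need to cover every tracked column):

```sql
-- Step 1: update rows that already exist in the target but whose content
-- has changed, detected by comparing checksums of the tracked columns.
UPDATE t
SET    t.Col1 = s.Col1,
       t.Col2 = s.Col2
FROM   dbo.TargetHistory t
JOIN   dbo.Source        s ON s.Id = t.Id
WHERE  BINARY_CHECKSUM(s.Col1, s.Col2) <> BINARY_CHECKSUM(t.Col1, t.Col2);

-- Step 2: insert anything beyond the highest surrogate key already loaded.
INSERT dbo.TargetHistory (Id, Col1, Col2)
SELECT s.Id, s.Col1, s.Col2
FROM   dbo.Source s
WHERE  s.Id > (SELECT ISNULL(MAX(Id), 0) FROM dbo.TargetHistory);
```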

Would something like that work, or did I miss something in your post?



Microsoft Certified Master - SQL Server 2008
Follow me on twitter: @keith_tate

Forum Etiquette: How to post data/code on a forum to get the best help
Bhuvnesh
SSCertifiable (5.2K reputation)
aaa121 (10/3/2013)
But from a performance I would like to think of other ways to access that data.
OK, so what performance impact do you think would come up?

Heavy volume?
High resource usage during peak hours?
Resource contention?
Blocking?

-------Bhuvnesh----------
I work only to learn Sql Server...though my company pays me for getting their stuff done;-)
dwain.c
SSCertifiable (7.3K reputation)
aaa121 (10/3/2013)
I have a third normal form table which maintains history over time using a row start date and a row end date.

I need to extract data incrementally from this table, starting at 1900-01-01 and looping through each day. The easiest way is to supply the date and use it in a WHERE clause like this:

WHERE '1900-01-01' BETWEEN date_row_start AND date_row_end

then increment the date by a day and run the query again. However, this would obviously pick up the same records time and time again. That isn't necessarily an issue, as I will be merging these 'incremental' records on a key column, and it will only be a one-off data load operation. But from a performance standpoint I would like to consider other ways to access that data.

I cannot change the structure of the source table and there are no other dates I can use.

Thanks



I confess I'm not totally sure what you're trying to do here, but more importantly I'm unclear on why you think you'd need a loop that touches the same records many times. It might be possible to use a Tally table for that.
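For instance, a Tally-style approach could generate every calendar day in one set and join it to the history table, rather than looping (all object names here are hypothetical, since we haven't seen your DDL):

```sql
-- Generate one row per day from 1900-01-01 to today using a zero-based
-- tally, then join each generated date to the rows valid on that date.
WITH Tally AS (
    SELECT TOP (DATEDIFF(DAY, '19000101', GETDATE()) + 1)
           DATEADD(DAY, ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) - 1,
                   CAST('19000101' AS date)) AS CalDate
    FROM sys.all_columns a CROSS JOIN sys.all_columns b
)
SELECT t.CalDate, h.*
FROM   Tally t
JOIN   dbo.SourceHistory h
  ON   t.CalDate BETWEEN h.DateRowStart AND h.DateRowEnd;
```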

Can you post up some DDL for your table, some consumable sample data (INSERTs) and expected results?

I'd bet someone here could then give you a good answer on a best practice approach to a solution.


My mantra: No loops! No CURSORs! No RBAR! Hoo-uh!

My thought question: Have you ever been told that your query runs too fast?

My advice:
INDEXing a poor-performing query is like putting sugar on cat food. Yeah, it probably tastes better but are you sure you want to eat it?
The path of least resistance can be a slippery slope. Take care that fixing your fixes of fixes doesn't snowball and end up costing you more than fixing the root cause would have in the first place.


Need to UNPIVOT? Why not CROSS APPLY VALUES instead?
Since random numbers are too important to be left to chance, let's generate some!
Learn to understand recursive CTEs by example.
Splitting strings based on patterns can be fast!
My temporal SQL musings: Calendar Tables, an Easter SQL, Time Slots and Self-maintaining, Contiguous Effective Dates in Temporal Tables
aaa121
SSC-Addicted (495 reputation)
OK, so the reasoning behind this is as follows:

1. I need to create several type 2 dimensions from OLTP tables which have been put together to track historical data changes, since this history is held in the source system. Each table has a primary key, a date updated, a row start date, and a row end date.

2. The first requirement is that I need to build up the history in these new dimensions incrementally, from the beginning of time to the present day.

3. I will also have to create and load transactional fact tables and join these to the dimensions (at the right point in time) to obtain the correct surrogate key.

4. Once I reach the present day, the ETL process will then run incrementally for each day thereafter.

5. The changed records are staged and then verified in the merge which maintains the dimension, so extracting the same records into staging won't pollute the dimensions; the only drawback is the inefficiency.

So ideally I would write one import routine which carries out the one-off historical load and then continues daily. This brings me back to the issue I raised originally, as I only wish to pick up changed records.

There is an identity column on the source tables, but it cannot be trusted as it may not be in the correct order. There is also a date updated field which can be used going forward, but its use was inconsistent previously.

So the options I have thought about are as follows:
1. The historical data load is a one-off process, so I may have to bite the bullet, stage the unchanged data time and time again, and let the merge deal with the actual changes.

2. For the historical process only: maintain a table of the identity values loaded for each table and use it in a NOT IN (or NOT EXISTS) predicate to avoid picking up the same records again and again.

3. For the ongoing load process, I can use the date updated column to identify only the truly changed records.
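A rough sketch of option 2, with all object names invented for illustration (NOT EXISTS rather than NOT IN, since NOT IN behaves badly if the key list ever contains NULLs):

```sql
-- Stage only rows valid on @LoadDate whose identity value has not been
-- loaded before, then record those identities so the next pass skips them.
DECLARE @LoadDate date = '19000101';

INSERT dbo.Staging (Id, Col1, DateRowStart, DateRowEnd)
SELECT s.Id, s.Col1, s.DateRowStart, s.DateRowEnd
FROM   dbo.SourceHistory s
WHERE  @LoadDate BETWEEN s.DateRowStart AND s.DateRowEnd
  AND  NOT EXISTS (SELECT 1 FROM dbo.LoadedKeys k WHERE k.Id = s.Id);

INSERT dbo.LoadedKeys (Id)
SELECT Id FROM dbo.Staging;
```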

I would be interested to hear your opinions.