SQLServerCentral is supported by Red Gate Software Ltd.
 

Most efficient way to extract data for a historical data table
Posted Thursday, October 3, 2013 2:23 AM
Old Hand

I have a 3rd normal form table which maintains history over time using a date row start and date row end.

I need to extract data incrementally from this table starting at 1900-01-01 and looping through each day. The easiest way is to provide the date and use it in a WHERE clause like this:

WHERE '1900-01-01' BETWEEN date_row_start AND date_row_end

incrementing the date by a day and then running the query again. However, this would obviously pick up the same records time and time again. This isn't necessarily an issue, as I will be merging these 'incremental' records using a column. It will only be a one-off data load operation. But from a performance perspective I would like to think of other ways to access that data.

I cannot change the structure of the source table and there are no other dates I can use.
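To make the approach concrete, the loop would look something like this (table and column names here are just placeholders):

```sql
-- Sketch of the daily extract loop; table and column names are placeholders.
DECLARE @ExtractDate date = '19000101';

WHILE @ExtractDate <= CAST(GETDATE() AS date)
BEGIN
    INSERT INTO Staging.Extract (RecordId, Attr1, DateRowStart, DateRowEnd)
    SELECT RecordId, Attr1, DateRowStart, DateRowEnd
    FROM   dbo.SourceHistory
    WHERE  @ExtractDate BETWEEN DateRowStart AND DateRowEnd;

    SET @ExtractDate = DATEADD(DAY, 1, @ExtractDate);
END;
```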

Thanks

Post #1501050
Posted Thursday, October 3, 2013 3:47 AM


SSCrazy

aaa121 (10/3/2013)
However this would obviously pick up the same records time and time again.
How would the data be the same on every query execution? Or do you mean the number of records? (Even that won't be the same every time.)


-------Bhuvnesh----------
I work only to learn Sql Server...though my company pays me for getting their stuff done
Post #1501090
Posted Thursday, October 3, 2013 4:06 AM
Old Hand

If a record is valid from '1900-01-01' until '2013-10-03', I will end up extracting it every day between those two dates using the method described above.
Post #1501099
Posted Thursday, October 3, 2013 8:47 AM


Mr or Mrs. 500

Do the table(s) in question have a surrogate key (i.e. an identity column or something similar)? If so, use your date query but add a predicate to get anything greater than the highest value already in your history table.

If you have to worry about updates to existing data, maybe use a two-step approach: update the current records based on the PK, verifying that something has actually changed (using BINARY_CHECKSUM() or something like it), then do your inserts based on new data.
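As a rough sketch of that two-step idea (all object and column names here are made up):

```sql
-- Step 1: insert anything beyond the identity high-water mark (illustrative names).
DECLARE @MaxId int = (SELECT ISNULL(MAX(SourceId), 0) FROM dbo.HistoryCopy);

INSERT INTO dbo.HistoryCopy (SourceId, Attr1, DateRowStart, DateRowEnd)
SELECT SourceId, Attr1, DateRowStart, DateRowEnd
FROM   dbo.SourceTable
WHERE  SourceId > @MaxId;

-- Step 2: update existing rows only where a checksum shows a change.
UPDATE h
SET    h.Attr1      = s.Attr1,
       h.DateRowEnd = s.DateRowEnd
FROM   dbo.HistoryCopy h
JOIN   dbo.SourceTable s
  ON   s.SourceId = h.SourceId
WHERE  BINARY_CHECKSUM(s.Attr1, s.DateRowEnd)
    <> BINARY_CHECKSUM(h.Attr1, h.DateRowEnd);
```

One caveat: BINARY_CHECKSUM can return the same value for different inputs, so if missing a change is unacceptable, a full column-by-column comparison (or HASHBYTES) is safer.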

Would something like that work? Or did I miss something in your post?




Microsoft Certified Master - SQL Server 2008
Follow me on twitter: @keith_tate

Forum Etiquette: How to post data/code on a forum to get the best help
Post #1501228
Posted Thursday, October 3, 2013 9:02 AM


SSCrazy

aaa121 (10/3/2013)
But from a performance I would like to think of other ways to access that data.
OK, so what performance impacts do you think would come up?

Heavy volume?
High resource usage during peak hours?
Resource conflict?
Blocking?


-------Bhuvnesh----------
I work only to learn Sql Server...though my company pays me for getting their stuff done
Post #1501233
Posted Thursday, October 3, 2013 6:17 PM


Hall of Fame

aaa121 (10/3/2013)
I have a 3rd normal form table which maintains history over time using a date row start and date row end.

I need to extract data incrementally from this table starting at 1900-01-01 and looping through each day. The easiest way is to provide the date and use it in a WHERE clause like this:

WHERE '1900-01-01' BETWEEN date_row_start AND date_row_end

incrementing the date by a day and then running the query again. However, this would obviously pick up the same records time and time again. This isn't necessarily an issue, as I will be merging these 'incremental' records using a column. It will only be a one-off data load operation. But from a performance perspective I would like to think of other ways to access that data.

I cannot change the structure of the source table and there are no other dates I can use.

Thanks



I confess I'm not totally sure what you're trying to do here, but more importantly I'm unclear on why you think you'd need a loop to process the same records many times. It might be possible to use a Tally table for that.
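For example (table and column names assumed), a Tally table can replace the day-by-day loop with a single set-based join that generates every extract date at once:

```sql
-- Sketch: generate all extract dates with a tally table instead of looping.
-- dbo.Tally is assumed to hold sequential integers N = 1, 2, 3, ...
DECLARE @StartDate date = '19000101';

SELECT  d.ExtractDate, s.RecordId, s.Attr1
FROM   (SELECT DATEADD(DAY, t.N - 1, @StartDate) AS ExtractDate
        FROM   dbo.Tally t
        WHERE  t.N <= DATEDIFF(DAY, @StartDate, GETDATE()) + 1) d
JOIN    dbo.SourceHistory s
  ON    d.ExtractDate BETWEEN s.DateRowStart AND s.DateRowEnd;
```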

Can you post up some DDL for your table, some consumable sample data (INSERTs) and expected results?

I'd bet someone here could then give you a good answer on a best practice approach to a solution.



My mantra: No loops! No CURSORs! No RBAR! Hoo-uh!

My thought question: Have you ever been told that your query runs too fast?

My advice:
INDEXing a poor-performing query is like putting sugar on cat food. Yeah, it probably tastes better but are you sure you want to eat it?
The path of least resistance can be a slippery slope. Take care that fixing your fixes of fixes doesn't snowball and end up costing you more than fixing the root cause would have in the first place.


Need to UNPIVOT? Why not CROSS APPLY VALUES instead?
Since random numbers are too important to be left to chance, let's generate some!
Learn to understand recursive CTEs by example.
Splitting strings based on patterns can be fast!
Post #1501391
Posted Monday, October 21, 2013 1:27 PM
Old Hand

OK, so the reason behind this is as follows:

1. I need to create several type 2 dimensions from OLTP tables which have been put together to track historical data changes as the data is held in the source system. Each table has a primary key, a date updated, a date row start and a date row end.

2. The first requirement is that I need to build up the history in these new dimensions incrementally from the beginning of time to the present day.

3. I will also have to create and load transactional fact tables and join them to these dimensions (at the right point in time) to obtain the correct surrogate key.

4. Once I reach the present day the ETL process will then run incrementally for each day thereafter.

5. The changed records are staged and then verified in the merge which maintains the dimension, so extracting the same records into staging won't pollute the dimensions; the only drawback is the inefficiency.

So ideally I would look to write one import routine which would carry out the one-off historical load and then continue daily. This brings me back to the issue I raised originally, as I only wish to pick up changed records.

There is an identity on the source tables, but this cannot be trusted as it may not be in the correct order. There is also a date updated field which can be used going forward, but its use was inconsistent previously.

So the options I have thought about are as follows:
1. The historical data load is a one-off process, so I may have to bite the bullet, stage the unchanged data time and time again, and let the merge deal with the actual changes.

2. For the historical process only: maintain a table of the identity values loaded for each table and use this in a NOT IN clause to prevent picking up the same records again and again.

3. For ongoing load processes I can use the date updated column to establish only the true changed records.
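To sketch options 2 and 3 in code (all object names here are only illustrative):

```sql
DECLARE @ExtractDate  date     = '19000101';
DECLARE @LastLoadDate datetime = '20131020';

-- Option 2 (historical load): skip identity values already recorded as loaded.
INSERT INTO Staging.Extract (SourceId, Attr1, DateRowStart, DateRowEnd)
SELECT s.SourceId, s.Attr1, s.DateRowStart, s.DateRowEnd
FROM   dbo.SourceTable s
WHERE  @ExtractDate BETWEEN s.DateRowStart AND s.DateRowEnd
  AND  s.SourceId NOT IN (SELECT l.SourceId FROM Audit.LoadedIds l);

-- Option 3 (ongoing daily load): pick up only rows touched since the last run.
INSERT INTO Staging.Extract (SourceId, Attr1, DateRowStart, DateRowEnd)
SELECT SourceId, Attr1, DateRowStart, DateRowEnd
FROM   dbo.SourceTable
WHERE  DateUpdated >= @LastLoadDate;
```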

I would be interested to hear your opinions.


Post #1506830