Most efficient way to extract data for a historical data table

  • I have a 3rd normal form table which maintains history over time using a row start date and a row end date.

    I need to extract data incrementally from this table starting at 1900-01-01 and looping through each day. The easiest way is to provide the date and use it in a WHERE clause like this:

    WHERE '1900-01-01' BETWEEN date_row_start AND date_row_end

    incrementing the date by a day and then running the query again. However, this would obviously pick up the same records time and time again. This isn't necessarily an issue, as I will be merging these 'incremental' records using a column. It will only be a one-off data load operation. But from a performance standpoint I would like to think of other ways to access that data.
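    In rough T-SQL the loop would look something like this (dbo.SourceHistory and the column names here are placeholders for my actual table):

    -- Sketch of the daily loop; table and column names are illustrative.
    DECLARE @ExtractDate date = '19000101';

    WHILE @ExtractDate <= CAST(GETDATE() AS date)
    BEGIN
        -- A row valid across the whole range is returned on every pass,
        -- which is the redundant extraction described above.
        SELECT *
        FROM dbo.SourceHistory
        WHERE @ExtractDate BETWEEN date_row_start AND date_row_end;

        SET @ExtractDate = DATEADD(DAY, 1, @ExtractDate);
    END;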

    I cannot change the structure of the source table and there are no other dates I can use.

    Thanks

  • aaa121 (10/3/2013)


    However this would obviously pick up the same records time and time again.

    How will the data be the same on every query execution? Or do you mean the number of records? (Even that won't be the same every time.)

    -------Bhuvnesh----------
    I work only to learn Sql Server...though my company pays me for getting their stuff done;-)

  • If a record is valid from '1900-01-01' until '2013-10-03', I will end up extracting it on every day between those two dates using the method described above.

  • Do the table(s) in question have a surrogate key (i.e. an identity column or something similar)? If so, use your date query but add a condition to get anything greater than the highest value already in your history table.

    If you have to worry about updates to existing data, maybe use a two-step approach: update the current records based on the PK, verifying that something has actually changed (using BINARY_CHECKSUM() or something like it), and then do your inserts based on the new data.
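    Something along these lines, perhaps (all object and column names are made up for illustration):

    -- Step 1: update staged rows whose source values have changed,
    -- using BINARY_CHECKSUM() to detect a difference.
    UPDATE tgt
    SET    tgt.col1 = src.col1,
           tgt.col2 = src.col2
    FROM   dbo.HistoryStage AS tgt
    JOIN   dbo.SourceTable  AS src
           ON src.pk_id = tgt.pk_id
    WHERE  BINARY_CHECKSUM(src.col1, src.col2)
        <> BINARY_CHECKSUM(tgt.col1, tgt.col2);

    -- Step 2: insert anything above the highest key already loaded.
    INSERT INTO dbo.HistoryStage (pk_id, col1, col2)
    SELECT src.pk_id, src.col1, src.col2
    FROM   dbo.SourceTable AS src
    WHERE  src.pk_id > (SELECT MAX(pk_id) FROM dbo.HistoryStage);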

    Would something like that work? Or did I miss something in your post?



    Microsoft Certified Master - SQL Server 2008
    Follow me on twitter: @keith_tate

    Forum Etiquette: How to post data/code on a forum to get the best help

  • aaa121 (10/3/2013)


    But from a performance I would like to think of other ways to access that data.

    OK, so what performance impacts do you think would come up?

    Heavy volume?

    High resource usage during peak hours?

    Resource contention?

    Blocking?

    -------Bhuvnesh----------
    I work only to learn Sql Server...though my company pays me for getting their stuff done;-)

  • aaa121 (10/3/2013)


    I have a 3rd normal form table which maintains history over time using a row start date and a row end date.

    I need to extract data incrementally from this table starting at 1900-01-01 and looping through each day. The easiest way is to provide the date and use it in a WHERE clause like this:

    WHERE '1900-01-01' BETWEEN date_row_start AND date_row_end

    incrementing the date by a day and then running the query again. However, this would obviously pick up the same records time and time again. This isn't necessarily an issue, as I will be merging these 'incremental' records using a column. It will only be a one-off data load operation. But from a performance standpoint I would like to think of other ways to access that data.

    I cannot change the structure of the source table and there are no other dates I can use.

    Thanks

    I confess I'm not totally sure what you're trying to do here, but more importantly I'm unclear on why you think you'd need to loop to use the same records many times. It might be possible to use a Tally table for that.
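    For example, something like this might let you generate every date in the range as a set and hit the table once, instead of looping (dbo.Tally is assumed to hold sequential integers starting at 0, and the other names are guesses at your schema):

    -- Join a tally of day offsets to the history table in one pass.
    DECLARE @StartDate date = '19000101';

    SELECT d.TheDate, src.*
    FROM dbo.Tally AS t
    CROSS APPLY (SELECT DATEADD(DAY, t.N, @StartDate)) AS d (TheDate)
    JOIN dbo.SourceHistory AS src
        ON d.TheDate BETWEEN src.date_row_start AND src.date_row_end
    WHERE d.TheDate <= CAST(GETDATE() AS date);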

    Can you post up some DDL for your table, some consumable sample data (INSERTs) and expected results?

    I'd bet someone here could then give you a good answer on a best practice approach to a solution.


    My mantra: No loops! No CURSORs! No RBAR! Hoo-uh!

    My thought question: Have you ever been told that your query runs too fast?

    My advice:
    INDEXing a poor-performing query is like putting sugar on cat food. Yeah, it probably tastes better but are you sure you want to eat it?
    The path of least resistance can be a slippery slope. Take care that fixing your fixes of fixes doesn't snowball and end up costing you more than fixing the root cause would have in the first place.

    Need to UNPIVOT? Why not CROSS APPLY VALUES instead?
    Since random numbers are too important to be left to chance, let's generate some!
    Learn to understand recursive CTEs by example.

  • OK, so the reason behind this is as follows:

    1. I need to create several type 2 dimensions from OLTP tables which have been put together to track historical data changes, since this history is held in the source system. Each table has a primary key, a date updated, a row start date and a row end date.

    2. The first requirement is that I need to build up the history in these new dimensions incrementally from the beginning of time to the present day.

    3. I will also have to create and load transactional fact tables and join them to these dimensions (at the right point in time) to obtain the correct surrogate key (sketched after this list).

    4. Once I reach the present day the ETL process will then run incrementally for each day thereafter.

    5. The changed records are staged and then verified in the merge which maintains the dimension, so extracting the same records into staging won't pollute the dimensions; the only drawback is the inefficiency.
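    For point 3, the surrogate key lookup I have in mind is a point-in-time join along these lines (all names are illustrative):

    -- Resolve each fact row to the dimension row that was current
    -- on the transaction date.
    SELECT f.fact_id,
           f.transaction_date,
           d.dim_surrogate_key
    FROM   staging.FactRows AS f
    JOIN   dbo.DimCustomer  AS d
           ON  d.business_key = f.customer_key
           AND f.transaction_date BETWEEN d.date_row_start AND d.date_row_end;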

    So ideally I would look to write one import routine which would carry out the one-off historical load and then continue daily. This brings me back to the issue I raised originally, as I only wish to pick up changed records.

    There is an identity on the source tables, but this cannot be trusted as it may not be in the correct order. There is also a date updated field which can be used going forward, but its use was inconsistent previously.

    So the options I have thought about are as follows:

    1. The historical data load is a one-off process, so I may have to bite the bullet, stage the unchanged data time and time again, and let the merge deal with the actual changes.

    2. For the historical process only: maintain a table of the identity values loaded for each table and use this in a NOT IN clause to prevent picking up the same records again and again (see the sketch after this list).

    3. For ongoing load processes I can use the date updated column to identify only the truly changed records.
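    Sketches of options 2 and 3, with placeholder object names:

    DECLARE @ExtractDate date = '19000101',
            @LastSuccessfulLoad datetime = '20131002';

    -- Option 2 (historical load only): exclude identity values already
    -- staged; NOT EXISTS would be safer than NOT IN if the key is nullable.
    SELECT src.*
    FROM dbo.SourceHistory AS src
    WHERE @ExtractDate BETWEEN src.date_row_start AND src.date_row_end
      AND src.identity_id NOT IN (SELECT identity_id FROM etl.LoadedKeys);

    -- Option 3 (ongoing loads): pull only rows changed since the last
    -- successful run, once the date updated column can be trusted.
    SELECT src.*
    FROM dbo.SourceHistory AS src
    WHERE src.date_updated > @LastSuccessfulLoad;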

    I would be interested to hear your opinions.
