Approach to normalising 15 million transactions

  • We have a requirement to normalise 15 million historical transactions into a number of tables. This involves various transformation processes currently implemented in T-SQL stored procedures. Current thinking is to process the transactions in T-SQL, driven by a cursor (probably FAST_FORWARD, LOCAL), invoking the stored procedures as required, roughly as sketched below. We would welcome thoughts on the viability of such an approach and any suggestions for alternatives.
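
    For context, a minimal sketch of the cursor-driven approach being considered; table, column, and procedure names are hypothetical:

        DECLARE @TransactionID BIGINT;

        DECLARE curTran CURSOR LOCAL FAST_FORWARD FOR
            SELECT TransactionID
            FROM   dbo.HistoricalTransaction;   -- the 15 million source rows

        OPEN curTran;
        FETCH NEXT FROM curTran INTO @TransactionID;

        WHILE @@FETCH_STATUS = 0
        BEGIN
            -- One of the existing transformation procs, applied one row at a time.
            EXEC dbo.usp_NormaliseTransaction @TransactionID = @TransactionID;

            FETCH NEXT FROM curTran INTO @TransactionID;
        END;

        CLOSE curTran;
        DEALLOCATE curTran;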

  • A transaction table is a history table which is also an audit table of sorts. Whatever happened, happened. In most cases, such tables should NOT be normalized because the contents of the supporting tables could change.

    If you wanted to partition the table to take advantage of greatly reduced index, stats, and backup maintenance, I'd go for that but, again, transaction tables are history tables that shouldn't be normalized.
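
    A rough sketch of what a partitioned layout might look like, assuming monthly partitions on a hypothetical TransactionDate column; the filegroup mapping is simplified to PRIMARY here, whereas the real maintenance gains come from spreading partitions across filegroups:

        -- Hypothetical monthly partition boundaries; extend as new months arrive.
        CREATE PARTITION FUNCTION pfTransactionDate (DATETIME2(0))
        AS RANGE RIGHT FOR VALUES ('2016-01-01', '2016-02-01', '2016-03-01');

        CREATE PARTITION SCHEME psTransactionDate
        AS PARTITION pfTransactionDate ALL TO ([PRIMARY]);

        CREATE TABLE dbo.HistoricalTransaction
        (
            TransactionID   BIGINT        NOT NULL,
            TransactionDate DATETIME2(0)  NOT NULL,
            Amount          DECIMAL(19,4) NOT NULL,
            CONSTRAINT PK_HistoricalTransaction
                PRIMARY KEY CLUSTERED (TransactionDate, TransactionID)
        ) ON psTransactionDate (TransactionDate);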

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • If you're migrating your data into a new configuration, I wouldn't do it in a single set as you're describing. Instead, break it apart and move only 100k rows at a time. It'll make for smaller, more manageable transactions. It'll also make it easier to restart the process from a known point if something goes wrong.
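
    A minimal sketch of that kind of batched move, assuming an ever-increasing TransactionID on the source and hypothetical table names; persisting @LastID in a control table (not shown) gives the known restart point:

        DECLARE @BatchSize INT    = 100000;
        DECLARE @LastID    BIGINT = 0;      -- reload this from a control table to restart
        DECLARE @RowsMoved INT    = 1;

        WHILE @RowsMoved > 0
        BEGIN
            BEGIN TRANSACTION;

            INSERT INTO dbo.NewTransaction (TransactionID, TransactionDate, Amount)
            SELECT TOP (@BatchSize)
                   TransactionID, TransactionDate, Amount
            FROM   dbo.HistoricalTransaction
            WHERE  TransactionID > @LastID
            ORDER BY TransactionID;

            SET @RowsMoved = @@ROWCOUNT;

            -- Remember how far we got before committing the batch.
            SELECT @LastID = MAX(TransactionID)
            FROM   dbo.NewTransaction;

            COMMIT TRANSACTION;
        END;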

    "The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood"
    - Theodore Roosevelt

    Author of:
    SQL Server Execution Plans
    SQL Server Query Performance Tuning

  • Grant Fritchey (11/21/2016)


    If you're migrating your data into a new configuration, I wouldn't do it in a single set as you're describing. Instead, break it apart and move only 100k rows at a time. It'll make for smaller, more manageable transactions. It'll also make it easier to restart the process from a known point if something goes wrong.

    Just to be sure, the OP said they wanted to "normalize" the transactions. If that's the actual case, I still strongly recommend against it because transaction tables are supposed to be a fixed record of the data as it was captured. They're history tables that shouldn't be "normalized". To be equally sure, I'm taking that to mean they want to break the rows up into separate tables with FKs and all the normal 3NF stuff, which, again, should not be done to history tables.

    If they mean "migrating" the data as-is to some other database, then I mostly agree with chunking the data. I say "mostly agree" because the proper use of "minimal logging to a table with a clustered index" might be the better/faster/easier way to go without blowing things up. It also means not having to use double the disk space to build a clustered index after the fact. IIRC, you can still "chunk" the data and have it minimally logged, especially if you take Trace Flag 610 into consideration.
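
    As a rough sketch of the minimally logged variant, assuming hypothetical table names, a target clustered on TransactionID, and a database in SIMPLE or BULK_LOGGED recovery (on older versions Trace Flag 610 is what allows minimal logging into a non-empty clustered target):

        -- TABLOCK plus inserting in the target's clustered key order is what
        -- makes minimal logging possible here.
        INSERT INTO dbo.NewTransaction WITH (TABLOCK)
               (TransactionID, TransactionDate, Amount)
        SELECT TransactionID, TransactionDate, Amount
        FROM   dbo.HistoricalTransaction
        ORDER BY TransactionID;   -- matches the clustered index on the target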

    --Jeff Moden



  • Why can't you insert-select into the normalized tables, in order of dependency, to get the data moved over? 15 million rows is not a very large amount, so I assume storage would not be a big issue. That should be simple to implement and reasonably quick to perform (a rough sketch follows at the end of this post).

    This is hard to answer because we don't know why you need to do this and we can't think of a good reason on our own. I will say that using cursors is most certainly not a good idea. I assume the SProcs work on a single record at a time as well.

    1. What is the problem driving the requirement? Performance, storage, regulatory?

    2. Are you still receiving new transactions or is this intended to archive the data?
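
    A rough sketch of the insert-select approach, with entirely hypothetical table and column names: load the parent (lookup) tables first, then the child rows, joining back to pick up the newly generated keys (this assumes MerchantID is an IDENTITY on dbo.Merchant).

        -- 1. Parent/lookup table first, e.g. distinct merchants.
        INSERT INTO dbo.Merchant (MerchantName)
        SELECT DISTINCT MerchantName
        FROM   dbo.HistoricalTransaction;

        -- 2. Then the child table, joining back to resolve the new foreign keys.
        INSERT INTO dbo.NormalisedTransaction
               (TransactionID, TransactionDate, Amount, MerchantID)
        SELECT ht.TransactionID,
               ht.TransactionDate,
               ht.Amount,
               m.MerchantID
        FROM   dbo.HistoricalTransaction AS ht
        JOIN   dbo.Merchant              AS m
          ON   m.MerchantName = ht.MerchantName;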

    Wes
    (A solid design is always preferable to a creative workaround)
