Dynamic Partitions for Flight Data

  • I am parsing 30+ meg binary files, as varbinary(max), containing parameter data from an aircraft. I have a hard 5-minute limit to parse, store and analyze the data for presentation. The actual parsing takes about 30-35 seconds to complete, giving me about 3.5 million parameter rows per flight. Storing the data with an INSERT TO SELECT takes about 60-75 seconds, while a SELECT INTO only requires about 6-8 seconds. I'm looking to acquire as much time as possible for the analysis process, by saving time on the data storage process.

    My thought is to SELECT INTO a non-partitioned table in the partition filegroup and then SWITCH it into a dynamically created partition in a partitioned table with the partition index of the date/time associated with the flight. After 30 days, I also intend to remove the data and append it to an archive table.

    I would like to ask for thoughts on this approach before embarking down this road.

    Thank you.

  • If I had to solve that problem, I would use partitioning much like you propose. Your approach uses two of the benefits available with partitioning - fast loading and fast archiving.

    'Fast-loading':

    Loading your data first into a stand-alone table before switching it into the main table allows normal ETL loading tricks:

    - build table as a heap table with no indexes

    - bulk-load all data with an exclusive lock on the heap table (WITH (TABLOCKX))

    - add clustered index then non-clustered indexes to match the target table (create these indexes WITH ONLINE = OFF)

    - data clean-up, if necessary

    - perform all analysis that can be done using only the just-loaded data

    - switch into target table

    - perform remaining analysis on the whole table

    'Fast archiving' comes from switching old partitions into a separate archive table, and later from an archive table partition into a stand-alone table that gets dropped.

    Note that each partition can be in a separate filegroup. This allows time-range filegroups that assist with management. For example, because you are using date-based partition boundaries, you can have filegroups for different years. New data goes into the current-year partition. Older partitions full of only archive data can be moved to cheaper storage over time, and the filegroups can be eventually dropped, once the archive data in the filegroup is switched out and dropped.

    To keep the SPLIT call that creates the target partition fast, ensure you have an empty partition at the 'end' of the table, and only SPLIT empty partitions.

     

    Eddie Wuerch
    MCM: SQL

  • You'll actually get better overall performance if you define the clustered index on the empty table and then populate it with proper minimal logging techniques.  You'll save a wad of disk space, as well.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

Viewing 3 posts - 1 through 2 (of 2 total)

You must be logged in to reply to this topic. Login to reply