Time to rethink DB design and workflow

  • We're tackling a project that will see us handle a lot more data, a lot more frequently, so we figured it was time to reassess our current design and workflow to see if there's a better way. In broad terms we're currently doing the following:

    -- Downloading and transforming pricing data from 6 FTP sites and a few APIs

    -- Each distinct source is inserted into its own table as the columns aren't consistent across sources

    -- We use a separate view for each source table to produce a consistent column layout across sources.

    -- We use a master view which simply selects all records from the 9 separate views; this is about 150,000 records.

    -- This is done on a dev server; each day we delete all records from the master table on the live server and upload the new data from the master view.

    This works reasonably well, but we're only updating the data every week or two and we want to move to daily updates of all the pricing data. Are there obvious areas in the above we should look to improve?
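
    For reference, here's a minimal sketch of the current pattern, assuming SQL Server and made-up object names (SourceA_Raw, vw_SourceA_Prices, vw_AllPrices, LiveDB.dbo.PriceMaster) and assuming the live table is reachable from where the load runs:

    CREATE VIEW dbo.vw_SourceA_Prices
    AS
    SELECT  rec_id,
            'SourceA'     AS source_name,
            sku_code      AS sku,
            unit_price    AS price,
            downloaded_at AS price_date
    FROM    dbo.SourceA_Raw;
    GO

    CREATE VIEW dbo.vw_AllPrices
    AS
    SELECT * FROM dbo.vw_SourceA_Prices
    UNION ALL
    SELECT * FROM dbo.vw_SourceB_Prices;  -- ...repeat for the remaining sources
    GO

    -- Daily reload of the live master table, wrapped in a transaction so
    -- readers never see a half-empty table.
    BEGIN TRANSACTION;
        DELETE FROM LiveDB.dbo.PriceMaster;  -- TRUNCATE is faster if nothing references the table
        INSERT INTO LiveDB.dbo.PriceMaster (rec_id, source_name, sku, price, price_date)
        SELECT rec_id, source_name, sku, price, price_date
        FROM   dbo.vw_AllPrices;
    COMMIT;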

    A problem we're already running into is archiving pricing data. We'd like to move pricing from the source table into an archive table as frequently as possible; in theory we only want the most recent batch in the source table. I'm not sure of the best way to do this. Currently we rename the source table to include a timestamp, so each renamed table might hold the past month's worth of data, but we'd like to get all the archived data into one table per data source so we can spot cheaper prices quickly.
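
    A minimal sketch of the kind of move we have in mind (hypothetical names SourceA_Raw, SourceA_Archive and a batch_date column), using a DELETE with an OUTPUT ... INTO clause so the copy to the archive and the delete from the source happen in one statement:

    DELETE FROM dbo.SourceA_Raw
    OUTPUT  deleted.rec_id, deleted.sku, deleted.unit_price, deleted.batch_date
    INTO    dbo.SourceA_Archive (rec_id, sku, unit_price, batch_date)
    WHERE   batch_date < (SELECT MAX(batch_date) FROM dbo.SourceA_Raw);  -- keep only the latest batch

    -- The existing timestamp-named tables could then be folded into the same
    -- per-source archive with plain INSERT ... SELECT statements.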

    We only use rec_ids as the primary key for each table, we don't use relationships, and the normalization isn't the best. What's the best way to rework the model? I'm happy to engage someone on a contract basis if that makes sense; I just didn't want the hit-and-miss process at Elance. Thanks in advance.

  • I no longer have the code that I wrote to archive data, but basically it consisted of a stored procedure with a WHILE loop.

    The SP referenced a control table that contained the batch size, i.e. the number of records to commit per pass.

    I also specified the maximum amount of time it was allowed to run.

    The control table also included an 'enabled' flag, so at any time you could clear the flag and the SP would stop running once it committed the current batch.

    There was also a date column used to prevent archiving data that was too recent, and a trigger that prevented anyone from entering an archive date less than 2 years before the current date.

    Inside a transaction I would copy the records to the archive, then delete them from the source table.

    I also had a log table to track when the process started, how many records it deleted per batch, the start and end datetime for each batch, and a final record with the total number of records deleted, etc.

    In this case I created a separate database to store the archive SP.
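
    Something along these lines (a rough sketch, not the original code, which is lost; the object names ArchiveControl, ArchiveLog, Price and PriceArchive are purely illustrative, and error handling is omitted):

    CREATE PROCEDURE dbo.usp_ArchivePrices
    AS
    BEGIN
        SET NOCOUNT ON;

        DECLARE @BatchSize   int,
                @Enabled     bit,
                @ArchiveDate datetime,
                @BatchStart  datetime,
                @ProcStart   datetime,
                @StopAt      datetime,
                @Rows        int,
                @Total       int;

        SET @ProcStart = GETDATE();
        SET @StopAt    = DATEADD(MINUTE, 30, @ProcStart);  -- maximum run time
        SET @Rows      = 1;
        SET @Total     = 0;

        WHILE @Rows > 0 AND GETDATE() < @StopAt
        BEGIN
            -- Re-read the control table each pass so clearing the flag stops the loop.
            SELECT @BatchSize = batch_size, @Enabled = enabled, @ArchiveDate = archive_before_date
            FROM   dbo.ArchiveControl;

            IF @Enabled = 0 BREAK;

            SET @BatchStart = GETDATE();

            BEGIN TRANSACTION;
                -- Copy one batch to the archive and delete it in a single statement;
                -- PriceArchive is assumed to mirror the columns of Price.
                DELETE TOP (@BatchSize) FROM dbo.Price
                OUTPUT deleted.* INTO dbo.PriceArchive
                WHERE  price_date < @ArchiveDate;

                SET @Rows  = @@ROWCOUNT;
                SET @Total = @Total + @Rows;

                INSERT INTO dbo.ArchiveLog (batch_start, batch_end, rows_deleted)
                VALUES (@BatchStart, GETDATE(), @Rows);
            COMMIT;
        END

        -- Final summary row with the total number of records archived.
        INSERT INTO dbo.ArchiveLog (batch_start, batch_end, rows_deleted)
        VALUES (@ProcStart, GETDATE(), @Total);
    END;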

    I would highly recommend that you normalize your tables and add foreign key constraints.

    You should create an index on the foreign key column in the child table for performance reasons.

    Do not create indexes on columns that will not be used.

    Make sure you have the indexes you need, and give thought to the column you use for the clustered index.

    If you have a natural primary key then you may want to consider using that as opposed to a surrogate key (identity column).
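
    For example (a sketch only; the table names Supplier and Price and their columns are made up): a parent/child pair with a foreign key constraint, an index on the child's foreign key column, and a clustered index chosen deliberately rather than defaulting to the identity column.

    CREATE TABLE dbo.Supplier
    (
        supplier_id   int IDENTITY(1,1) NOT NULL,
        supplier_code varchar(20)       NOT NULL,  -- candidate natural key
        CONSTRAINT PK_Supplier PRIMARY KEY CLUSTERED (supplier_id),
        CONSTRAINT UQ_Supplier_Code UNIQUE (supplier_code)
    );

    CREATE TABLE dbo.Price
    (
        rec_id      int IDENTITY(1,1) NOT NULL,
        supplier_id int               NOT NULL,
        sku         varchar(50)       NOT NULL,
        unit_price  decimal(18,4)     NOT NULL,
        price_date  datetime          NOT NULL,
        CONSTRAINT PK_Price PRIMARY KEY NONCLUSTERED (rec_id),
        CONSTRAINT FK_Price_Supplier FOREIGN KEY (supplier_id)
            REFERENCES dbo.Supplier (supplier_id)
    );

    -- Index the foreign key column so joins and FK checks stay cheap.
    CREATE INDEX IX_Price_Supplier ON dbo.Price (supplier_id);

    -- Cluster on how the data is loaded and queried (e.g. by date),
    -- not automatically on the surrogate key.
    CREATE CLUSTERED INDEX CIX_Price_Date ON dbo.Price (price_date, rec_id);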

    How often does the Data get loaded?

    Are transactions being performed on the data throughout the day?

    Is your database more read-intensive, write-intensive, or both?

    I hope this helps.

    For better, quicker answers on T-SQL questions, click on the following...
    http://www.sqlservercentral.com/articles/Best+Practices/61537/

    For better answers on performance questions, click on the following...
    http://www.sqlservercentral.com/articles/SQLServerCentral/66909/

  • What version of SQL and what edition?

    If you have 2005/2008 Enterprise Edition, consider table partitioning for your large table. Then you can switch partitions out to do archiving. Works quite well.
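
    Something like this (illustrative names pf_PriceDate, ps_PriceDate, Price and Price_Staging, with made-up boundary dates); the switch is a metadata-only operation, so moving a month of pricing out of the main table is nearly instant:

    CREATE PARTITION FUNCTION pf_PriceDate (datetime)
        AS RANGE RIGHT FOR VALUES ('20110101', '20110201', '20110301');

    CREATE PARTITION SCHEME ps_PriceDate
        AS PARTITION pf_PriceDate ALL TO ([PRIMARY]);

    -- dbo.Price is created ON ps_PriceDate(price_date); dbo.Price_Staging must be
    -- empty, have an identical structure, and sit on the same filegroup as the
    -- partition being switched.

    -- Switch partition 2 (the January 2011 rows) out of the main table:
    ALTER TABLE dbo.Price SWITCH PARTITION 2 TO dbo.Price_Staging;

    -- The staging rows can then be moved into the archive table, and the
    -- now-empty boundary merged away:
    ALTER PARTITION FUNCTION pf_PriceDate() MERGE RANGE ('20110101');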

    Gail Shaw
    Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
    SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

    We walk in the dark places no others will enter
    We stand on the bridge and no one may pass
  • Gail,

    Thank you for mentioning that.

    I had forgotten all about the switch/archive capability with partitioned tables.

    Thanks for correcting me.

    For better, quicker answers on T-SQL questions, click on the following...
    http://www.sqlservercentral.com/articles/Best+Practices/61537/

    For better answers on performance questions, click on the following...
    http://www.sqlservercentral.com/articles/SQLServerCentral/66909/
