how best to save large amounts of data

  • Forgive me if this isn't the right area to post, but I wasn't sure where to post this question.

    I do consulting for a former employer and we are having issues with a SQL Server 2000 database I designed. It stores incoming and outgoing email so that the company has a record for legal purposes and so that people can search their own messages. I had taken steps to reduce spam, but as you can imagine the table has gotten quite large. Now I am getting timeouts when inserting new records, and adjusting the timeout is only going to postpone the problem.

    What are my options? Will partitioning help? Any guidance will be greatly appreciated.

    Thanks

  • How is the table designed? Indexes? Triggers? That information will help us help you.

    Kindest Regards,

    @puy Inc

  • It has an identity column for a primary key. I didn't put any other indexes on it and there are no triggers.

    Dave

  • I had a somewhat similar problem where someone wanted me to create a proc that inserted a record into a "logging" database each time a web page was hit.  This process was intended to be used as a session and hit tracking database so that we could track which of our customers were using which sites and how frequently.  However, I quickly found out that all of the inserts were absolutely pegging my server in both CPU usage and disk I/O.  The solution we devised was to use MSMQ to generate flat files that were imported on a schedule for reporting.  I am not sure if your method of capturing the emails would allow for this, though.
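
    The scheduled import was just a BULK INSERT from each queued flat file into a staging table, roughly like this (table and file names are made up for illustration):

        -- Load one flat file produced by the queue listener (names hypothetical)
        BULK INSERT dbo.HitLog_Staging
        FROM 'D:\queue\hits_20070101.txt'
        WITH (
            FIELDTERMINATOR = ',',
            ROWTERMINATOR = '\n',
            TABLOCK  -- one bulk writer, so a table lock keeps the load fast
        )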

    I suppose another option would be to create an archive process that offloads emails older than a certain date out of the table that actually captures them, then run that process monthly, weekly, etc.  That way, users can still query their older emails, while the process that inserts new emails works with a table that has far fewer rows.
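
    The job itself can be a plain INSERT/DELETE pair inside a transaction, kicked off on a SQL Agent schedule. A rough sketch, assuming a live table dbo.Email with a SentDate column (all names made up):

        -- Names are hypothetical; substitute your real table and columns
        DECLARE @cutoff datetime
        SET @cutoff = DATEADD(month, -3, GETDATE())

        BEGIN TRANSACTION

        -- Copy everything older than the cutoff into the archive table
        INSERT INTO dbo.EmailArchive (EmailID, Sender, Recipient, SentDate, Body)
        SELECT EmailID, Sender, Recipient, SentDate, Body
        FROM dbo.Email
        WHERE SentDate < @cutoff

        -- Then purge those rows from the live table
        DELETE FROM dbo.Email
        WHERE SentDate < @cutoff

        COMMIT TRANSACTION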

    -Greg

  • Partitioning should help, though be sure you separate out the I/O or move to a new server to help with the load. If you can handle some of the stuff being offline, I really like Greg's suggestion above. Maybe you could even dump this stuff to another server and index the flat files.

  • I would strongly suggest archiving the old data (maybe you can write a small job that regularly archives the data to a table or flat file).

    Regards,
    Sudheer 

    My Blog

  • Thanks to all for the replies.  If I archive this data, do I put it into another table?  This data needs to be searchable.  Maybe archive into different tables and build a view that UNIONs them so I can select from a single source?

    Thanks,

    Dave

  • Dave,

    Yes, you can place the data in another table. You can check this URL for more details.

    Regards,
    Sudheer 

    My Blog

  • Thanks for the help! I think this will take care of things for the data.

    Dave

  • just curious, how many rows in this big table?

    ---------------------------------------
    elsasoft.org

  • Sorry it's taken so long to reply. It's been busy lately.

    In one of the tables I have 158,732 rows and in another I have 1,069,501.

    Dave

  • That's really not that many rows.  I don't think you need to partition with so few.  As for why the inserts are so slow, are you sure there are no other indexes on the tables besides the PK on the identity column?
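
    You can double-check straight from Query Analyzer; something like this (table name assumed):

        -- List every index and trigger on the table
        EXEC sp_helpindex 'dbo.Email'
        EXEC sp_helptrigger 'dbo.Email'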

    ---------------------------------------
    elsasoft.org

  • Your best option is to try to get partitioning going. I always hear "you should archive." Well, that's pretty much what partitioning does, except you can still search it!

    Basically, read up on horizontal partitioning. I'm going to implement this very soon for one of our databases, which has about 7 million "events" (which are kinda like emails). I will create a process which creates a new table each month (ex. Events_20070101). This new table will store only data for that month. The main table will be purged so that its OLTP transactions are as fast as possible. So, starting out the new month, inserts will be as fast as possible. Indexing can be more aggressive on the older tables because there will be no inserts/updates on them. Now, I will only do this monthly archiving for 1 year, at which time all the monthly tables will be merged into 1 yearly table.
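
    The monthly rollover itself looks roughly like this (my table names; the real job would build them with dynamic SQL):

        -- Spin January 2007 off into its own table
        SELECT *
        INTO dbo.Events_20070101
        FROM dbo.Events
        WHERE EventDate >= '20070101' AND EventDate < '20070201'

        -- The CHECK constraint is what lets a partitioned view skip this
        -- table when a query asks for a different date range
        ALTER TABLE dbo.Events_20070101
        ADD CONSTRAINT CK_Events_20070101_EventDate
        CHECK (EventDate >= '20070101' AND EventDate < '20070201')

        -- Purge the moved rows from the live table
        DELETE FROM dbo.Events
        WHERE EventDate >= '20070101' AND EventDate < '20070201'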

    I will have one view which pulls the data from all the event tables. This way there is no need to determine ahead of time which month/year the data I'm looking for falls in; SQL will do that for me.
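
    The view is just UNION ALLs over the live table and the monthly tables, e.g.:

        -- One search surface over current and archived events (names as above)
        CREATE VIEW dbo.AllEvents
        AS
        SELECT * FROM dbo.Events
        UNION ALL
        SELECT * FROM dbo.Events_20070101
        UNION ALL
        SELECT * FROM dbo.Events_20070201

    With the CHECK constraints in place, the optimizer only touches the tables whose date range matches the WHERE clause.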

    Just read up on partitioned views and indexed views; there is plenty of information on this.
