Data Archiving

  • Hi Steve Jones and all other SQL gurus,

    We provide educational training to employees of all ages via an intranet application. We currently have one web server (Windows Server 2003) with one SQL Server 2005 instance, and the application runs on classic ASP. The database already receives data from different sources, and there are more complexities in the system. We found that the bottleneck is SQL Server all the time. We recently optimized our application and database, but it is still not satisfactory to the client, as users are increasing day by day and the data grows bigger on a daily basis.

    We are moving our application to .NET (ASP.NET) with more features. We plan to use a hardware Network Load Balancing (NLB) mechanism, so that when a user hits the website, the request is redirected among three web servers. Each web server will have its own SQL Server, and all three SQL Servers will share data via peer-to-peer replication (like Microsoft's architecture). I also need to keep our OLTP (online operational) data short by keeping one month in the operational database; the rest will be moved to another SQL Server on a daily basis, like archiving. We will have reporting options on our website for two different types of users, employees and administration. Each employee should be able to see all of his/her reports by querying both the operational and the archived data. Administration will be able to see all employees' data from the beginning.

    I would need help with the following:

    1- First, let me know if this architecture has any drawbacks. If you think it can be further improved, please share.

    2- If there is any better solution we can adopt, now is a good time to think about it; otherwise we will be too late.

    3- When archiving data, how do I write queries that run against both the OLTP and the archived database? On the reporting side, users can enter a date criterion to fetch records for the last n days, like the last 10 days or the last 100 days.

    4- Is this archiving technique a good solution or not? I have implemented a stored procedure to move data from the OLTP database to the archiving database, but it is taking too much time. I was looking at http://msdn.microsoft.com/en-us/library/ms190923(SQL.90).aspx and it says at the very beginning that

    "To bulk-transfer data from one Microsoft SQL Server database to another, data from the source database must first be bulk-exported into a file. The file is then bulk-imported into the destination database"
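    For reference, one common way to handle questions 3 and 4 can be sketched as follows. All object names here (ActivityLog, ArchiveDB, the columns) are hypothetical stand-ins, and this assumes the archive database lives on the same instance; for a separate archive server you would stage the rows locally and push them across with SSIS, bcp, or a linked server, since OUTPUT ... INTO cannot target a remote table.

    ```sql
    -- Question 3: a UNION ALL view lets one query span live and archived
    -- data, so "last N days" reports work no matter where the rows live.
    CREATE VIEW dbo.ActivityLogAll
    AS
    SELECT ActivityID, EmployeeID, ActivityDate FROM dbo.ActivityLog
    UNION ALL
    SELECT ActivityID, EmployeeID, ActivityDate FROM ArchiveDB.dbo.ActivityLog;
    GO

    -- Question 4: instead of one giant INSERT + DELETE, move rows in small
    -- batches. Each batch is a short transaction, which keeps locking and
    -- transaction-log growth modest and usually runs far faster overall.
    DECLARE @BatchSize int = 5000;
    DECLARE @CutOff datetime = DATEADD(MONTH, -1, GETDATE());

    WHILE 1 = 1
    BEGIN
        -- Delete a batch from the OLTP table and capture the deleted rows
        -- directly into the archive table in a single statement.
        DELETE TOP (@BatchSize) src
        OUTPUT deleted.ActivityID, deleted.EmployeeID, deleted.ActivityDate
            INTO ArchiveDB.dbo.ActivityLog (ActivityID, EmployeeID, ActivityDate)
        FROM dbo.ActivityLog AS src
        WHERE src.ActivityDate < @CutOff;

        IF @@ROWCOUNT = 0 BREAK;   -- nothing left to move
    END;
    ```

    The batch size is a tuning knob: small enough that each delete finishes quickly, large enough that the loop does not run forever.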

    I ask for wise and good direction toward success. If someone can guide me on my solution/proposal, I will be thankful.

    Please respond as soon as you can, Steve and all others .....

    Shamshad Ali.

  • I have one small suggestion. I don't know if it will meet your criteria or not.

    Set up read-only transactional replication to your report server, and configure it not to propagate the DELETE commands (I read here on SSC that you can keep deletes done at the publisher from reaching the subscriber). Then delete the records from your production server instead of moving them to an archive DB, since the report server can act as the archive DB.

    Please forgive my ignorance if I am totally wrong.... 🙂

    -Roy

    Thanks for your suggestion. I have never worked with replication that does NOT apply deletions to subscribers, and I need to test this as a solution to see whether it works and has no drawbacks or issues. If it works, it will really save a lot of headache when moving data, etc.

    I would also like to shed some light on the other questions I asked earlier.

    Shamshad Ali.

  • Mind you, I don't deal much with the data side of SQL Server. This is just my thought, based on studying I have done on the subject for exams and out of pure interest in it.

    With that said, for the archiving piece I would think that implementing table partitioning to separate your data by month might improve I/O performance for the production database. Place your current "production" data in a filegroup on a segmented disk by itself, and previous months' data in a separate filegroup. That way the procedures run against the archive data would not be fighting the production workload for I/O.
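    A minimal sketch of that layout, assuming a datetime column named ActivityDate and two pre-created filegroups (FG_Current, FG_Archive) — all names and boundary dates here are illustrative:

    ```sql
    -- Monthly RANGE RIGHT partitioning: boundaries are the first of each month.
    -- Three boundaries create four partitions.
    CREATE PARTITION FUNCTION pfMonthly (datetime)
    AS RANGE RIGHT FOR VALUES ('2009-04-01', '2009-05-01', '2009-06-01');

    -- Map the three older partitions to the archive filegroup and the
    -- newest (current month) to its own filegroup on a separate disk,
    -- so archive scans do not compete with OLTP I/O.
    CREATE PARTITION SCHEME psMonthly
    AS PARTITION pfMonthly TO (FG_Archive, FG_Archive, FG_Archive, FG_Current);

    CREATE TABLE dbo.ActivityLog
    (
        ActivityID   int      NOT NULL,
        EmployeeID   int      NOT NULL,
        ActivityDate datetime NOT NULL
    ) ON psMonthly (ActivityDate);
    ```

    A side benefit is that rolling a month out becomes a near-instant metadata operation (ALTER TABLE ... SWITCH PARTITION to a staging table) instead of a mass DELETE.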

    Shawn Melton
    Twitter: @wsmelton
    Blog: wsmelton.github.com
    Github: wsmelton

  • Yes, you are right, and I would appreciate comments from everyone to get an idea of which approach will best suit the scenario mentioned above.

    We actually want to keep our operational/online DB as light as possible. The online database will also be part of peer-to-peer replication with the other SQL Servers. I read some articles on combining replication with partitioning that mentioned some restrictions, so I did not spend much time on it. If more people recommend partitioning as the best fit for my scenarios above, including P2P replication support with improved high availability and performance, I will definitely research it and start development on it.

    Thanks.

    Shamshad Ali.

  • When you add an article using sp_addarticle, there is a parameter called @del_cmd. If you give it the value 'NONE', then no action is taken for deletes. You can see more details about it under sp_addarticle in BOL.
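    A hedged sketch of that call, per Books Online — the publication and table names here are assumptions, and the other article options are left at their defaults:

    ```sql
    -- Publish a hypothetical dbo.ActivityLog table and suppress replication
    -- of DELETEs, so rows purged at the Publisher survive on the
    -- report/archive Subscriber.
    EXEC sp_addarticle
        @publication   = N'ArchivePub',   -- assumed publication name
        @article       = N'ActivityLog',
        @source_owner  = N'dbo',
        @source_object = N'ActivityLog',
        @del_cmd       = N'NONE';         -- deletes are not propagated
    ```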

    -Roy

  • Shamshad Ali (5/27/2009)


    We found that the bottlneck is SQL Server all the time.

    Nope... it's not SQL Server. It's the code that's running against SQL Server. You say you've optimized the app. Does that mean that you've converted all of the app's embedded SQL to stored procs? Probably not. Have you analyzed the slower queries to see which indexes are being used or not? Probably not. Are you sure of what the slower queries actually are and where they're coming from? Probably not. Do you have any Cursors, While Loops, Recursive CTEs, UDFs that reference tables (except maybe a Tally table), formulas around columns in a FROM clause, non-equality correlated sub-queries, joined UPDATEs where the target table isn't in the FROM clause, views-of-views, more than 2 levels of stored procedures, UDFs of UDFs, a pot wad of places that have SELECTs with variables on the left side of the equals sign in the select list, tables with a lot of repeating data or NULLs, queries/objects that return more data than they need to, or multi-purpose, highly "flexible" procs that will take any criteria or sort order you can possibly throw at them (just to name a few)? I'm betting you do.

    SQL Server frequently gets the blame because people stop looking when they find "slowness in the server". It's probably not the server... it's the RBAR and other bad practices that have been built into the code. Your performance problems are in the code.
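    One small illustration of the kind of code problem being described — a formula wrapped around a column defeats index use, while moving the arithmetic to the constant side does not (table and index names here are hypothetical):

    ```sql
    -- Non-sargable: the function on OrderDate forces a full scan even if
    -- an index on OrderDate exists.
    SELECT OrderID
    FROM dbo.Orders
    WHERE DATEADD(DAY, 10, OrderDate) > GETDATE();

    -- Sargable rewrite: the column stands alone, so the optimizer can
    -- seek the index on OrderDate.
    SELECT OrderID
    FROM dbo.Orders
    WHERE OrderDate > DATEADD(DAY, -10, GETDATE());
    ```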

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • One thought on how to query the archived data.

    You can archive the data onto a different server and refer to it from your PROD/LIVE database with SYNONYMs. This way, you will not need to modify your application code.

    For example, say you want to archive the data for 01/2008, and assume all the tables are of the form tab1_200801, tab2_200801, etc. The steps would be:

    1. Migrate all the data into the archive DB as arch_tab1_200801, arch_tab2_200801, etc.;

    2. Create synonyms with the same names tab1_200801, tab2_200801, etc. in PROD, pointing them to the arch_ tables.
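    Step 2 can be sketched like this, assuming a linked server named ArchiveServer hosting an ArchiveDB database (both names hypothetical):

    ```sql
    -- In the PROD database: a synonym with the original table name that
    -- points at the migrated copy on the archive server. Existing queries
    -- keep working without any code change.
    CREATE SYNONYM dbo.tab1_200801
    FOR ArchiveServer.ArchiveDB.dbo.arch_tab1_200801;

    -- e.g. this unchanged application query now reads the archive:
    -- SELECT * FROM dbo.tab1_200801 WHERE EmployeeID = @EmployeeID;
    ```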

  • I don't like this idea, because if something happens to the replication and you have to reinitialize the subscription, you would lose your archive.
