How Important archiving old data

  • Hi All,

    How important is it archiving old data ? In recent days, We saw some timeouts and blockings happening for app API calls to the database.  Later, found that they are using timeout parameter in their connection strings,

    The database size is 5TB.This is kind of warehouse but used as OLTP mixed workload.

    Again, when trying to understand why queries are taking time, we observed the tables involved in API calls queries have grown from 10GB to 70GB in past 4 months. Then the application started blaming the dba team it's the database issue as the application has no code changes from last 8-9 months. During our initial analysis, found some of the queries, doesn't have proper filters, a lot of implicit conversions, lookups and only JOINS are involved of 5-6 tables which causing row explosions. from db size, we ensured fragmentation is taken care and stats are up to date.

    Questions:

    1. Where do you start your conversations with the app team in such situations? What specific questions to be asked?

    2. When we suggested app team to archive some of the old data, they're hestitant & afraid that some downstream app team's who consumes data from this database, might ask for data requests for old data so they aren't ready to archive. Side effect is, quiries are running slow due to high reads.

    3. In an ideal database world for an OLTP applications, how many years of data will be retained? 1 year,2 years or 3years? or is it totally dependent on application and business users? how important is it for DBA's to get this documented or getting business approval for archiving process?

    4. In this scenario, what are the options available for a dba to improve performance? Any advise on balancing data retention needs with perf optimization?

    Regards,

    Sam

  • You mention "app team" a few times. Does this mean only developers, or does app team also include product owners? The question of data retention is a business concern in terms of data they believe they require & costs to retain that data. Business owners need to be part of the discussion.

    1. Where do you start your conversations with the app team in such situations? What specific questions to be asked?

    The questions that should be asked of business owners are things like:

    • How far in the past do you need access to specific data to meet legal requirements or business requirements?  Probably not all data is needed for the same period of time. Some old data may be needed only sumarized.
    • How much are you willing to pay to keep that data? If cost to meet data retention requirements is significantly higher than that, does that change you answer to the first question? (It's amazing how quickly  some "needs" become "nice to haves" when people are confronted with cost)

    2. When we suggested app team to archive some of the old data, they're hestitant & afraid that some downstream app team's who consumes data from this database, might ask for data requests for old data so they aren't ready to archive. Side effect is, quiries are running slow due to high reads.

    Given adequate hardware (fast enough storage & network, & CPU power, queries can be fast, even on very large tables. Bad query performance as data grows is more likely an indication of inefficient queries, improper indexing, and/or slow IO.

    3. In an ideal database world for an OLTP applications, how many years of data will be retained? 1 year,2 years or 3years? or is it totally dependent on application and business users? how important is it for DBA's to get this documented or getting business approval for archiving process?

    This tends to be specific to the industry, the type of data, and even the specific company. There are legal requirements for some types of data.

    4. In this scenario, what are the options available for a dba to improve performance? Any advise on balancing data retention needs with perf optimization?

    • Tune queries
    • Analyze & improve indexes
    • Test disk & network IO to find & fix bottlenecks.

     

  • vsamantha35 wrote:

    Hi All,

    How important is it archiving old data ? In recent days, We saw some timeouts and blockings happening for app API calls to the database.  Later, found that they are using timeout parameter in their connection strings, The database size is 5TB.This is kind of warehouse but used as OLTP mixed workload.

    Again, when trying to understand why queries are taking time, we observed the tables involved in API calls queries have grown from 10GB to 70GB in past 4 months. Then the application started blaming the dba team it's the database issue as the application has no code changes from last 8-9 months. During our initial analysis, found some of the queries, doesn't have proper filters, a lot of implicit conversions, lookups and only JOINS are involved of 5-6 tables which causing row explosions. from db size, we ensured fragmentation is taken care and stats are up to date.

    Regards, Sam

    you have already identified the issue - pick up some of those queries, see how they can be improved, and go back to app team with a list of potential changes to improve the queries - and advise them that they have to do them in order to improve performance or that as alternative they need to stop bothering you and complaining about bad performance.

    things like "select distinct..." are often a sign of bad queries (bad joins and/or bad data) and the bigger the volumes the worst they will perform - so issue on this case as an example is to fix the query, not archiving or better hardware, or different indexes.

  • Ditto what Frederico said.  You've already identified the problem queries.  Check the execution plans for them and fix them.

    I'll also say that if you have to remove data from a table to make a query faster, then you need to work on the query and the indexes.  It's also possible that the design of the table is bad and you may need to change it and the app working against it.

    Normally, though,  it's just bad code and improper indexing.  This is also a reason why I say that stored procedures are much better than ORM generated code or discreet code coming from the front end.  A stored procedure can easily be modified without changing the parameters.  The other stuff requires at least a partial repair and deployment of the app.

    It also sounds like the DBA Team and the Dev Team are getting into a bit of a pissing contest.  As they say down South, all-ya-all need to get your noses out of the air and start working together because you're working on the same team, which is known as "The Company".

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

Viewing 4 posts - 1 through 3 (of 3 total)

You must be logged in to reply to this topic. Login to reply