Determine the size of the changed data in a database???

  • We are upgrading to new servers/datacenter.

    One of the metrics the network team wants from us is to determine how big the data changes are in the databases for a 24 hour period.

    We can easily determine the number of rows that have been modified or added, and I can do some long drawn out math by multiplying the rows by the size of each field. Of course, not all fields in a row are modified or inserted, so that would be somewhat skewed. And, I'm not sure I want to work that hard! Their big concern is bandwidth usage.

    I thought of performing a differential backup, and using the size of the backup as the "amount of data changed".

    Otherwise, I'm stumped. I don't think I ever have had a request to capture this info!

    Thanks in advance!

    Michael L John
    If you assassinate a DBA, would you pull a trigger?
    To properly post on a forum:
    http://www.sqlservercentral.com/articles/61537/

  • I could see a differential coming out a little low if the same pages are updated multiple times in that 24 hour period but it will probably get you close.

    There is other overhead associated with a database connection besides the data though. If there is no comingled data on the network adapter that serves SQL Server data a simple perfmon counter log watching for bytes sent and received could be an option.

    There are no special teachers of virtue, because virtue is taught by the whole community.
    --Plato

  • Maybe transaction log backup sizes would be more useful than a diff ?

  • I was thinking about this more and a differential could come out high as well as low. If you only update one INT field on a record the whole 8K page is backed up so it could skew it upward. Tran logs might have the same effect. I guess it will depend on the nature of the workload as to whether a differential or a tran log would get you close, or be skewed.

    There are no special teachers of virtue, because virtue is taught by the whole community.
    --Plato

  • Not quite sure how to translate any of this into "bandwidth". If you have users or applications returning large data sets from selects, they are using bandwidth, but there are no database changes. Or you may have scheduled jobs that update millions of rows and that activity hits the disks, but does that count as bandwidth ? Are the SQL servers standalone boxes with internal drives, or shared storage ?

  • Very good point. I was sticking to "data changes" in the original post. It might be worth re-assessing the parameters of the original request.

    There are no special teachers of virtue, because virtue is taught by the whole community.
    --Plato

  • Michael L John (5/30/2013)


    Their big concern is bandwidth usage.

    WHICH "bandwidth"? Network traffic, cpu usage, harddisk reads, writes, or overall throughput, memory I/O, time to make logfile backups, number of rotations per second on the hamster wheel? And how do they think that the amount of data that's changed in the server is going to have any correlation to things like network traffic? I can drive every aspect of SQL into the stops and you'll never see a lick of it on the network (for example). By the same token, I can drive network traffic through the roof and never change any data in SQL Server.

    What is it that they mean by "bandwidth"?

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

Viewing 7 posts - 1 through 6 (of 6 total)

You must be logged in to reply to this topic. Login to reply