Find Average Row Size and Num of Rows per page for a table

  • Querying sys.dm_db_partition_stats for a table and getting the following items back, assuming I want to calculate the average row size and num of rows per page for a table, for "reserved" page count:

    Table Name: test1

    Index_ID: 1

    Partition_Number: 1

    Reserved_Page_Count: 395

    Row_Count: 50142

     

    To get the Reserved space in KB, I take 395 * 8 to get 3160KB which correlates to the Reserved space when running sp_spaceused 'dbo.test1'.

    Next is the part I am struggling with.  What is the formula for calculating Average Row Size and Number of Rows per page?

    For Average Row Size would it be Reserved_Page_Count * 8 / Row_Count?  I have been at this couple of days now, googling different things and I found the following formula: 1024 * Reserved_Page_Count * 8 / Row_Count.  My question  here is, why the 1024?

    For Number of Rows per page, would it be 8192 / (1024 * Reserved_Page_Count * 8) / Row_Count ?  The problem here is the result is 0 when doing this in SQL.  Using a calculator I get 5.048952115233244.

    I then started playing with converting to FLOAT and DECIMAL, which resulted in more confusion as the results were different.  Googling why the difference in those data types, I read different articles on why one is better than the other and Im still a little confused about it.

    Im looking for what is the correct formula for each, why times by the 1024 in finding the Average Row Size, and answers to the FLOAT vs Decimal.

     

    Bonus question if anyone cares to answer....for finding the Average Row Size and Number of Rows per page does it make more sense to use the Reserved space/pagecount or the data or the used?

     

    Thank you all so much!

     

  • > 1024 * Reserved_Page_Count * 8 / Row_Count.  My question  here is, why the 1024? <<

    So that the final calc is in bytes, not kb.  That makes sense, since hopefully most rows will be < 1000 bytes.

    Decimal type -- with at least 6 decimal places -- should give you a more accurate result than a float because float is only approximate, and the more calcs you do on the value, the more the approx values can cause the final result to be a bit off.

    > for finding the Average Row Size and Number of Rows per page does it make more sense to use the Reserved space/pagecount or the data or the used? <<

    Hmm, partly depends on why you are doing the calc.  If to reduce total disk space usage, then use reserved space.  If to improve the actual disk usage and/or performance of the table, then use used space.

    SQL DBA,SQL Server MVP(07, 08, 09) A socialist is someone who will give you the shirt off *someone else's* back.

  • So, I don't know the exact formula, but I can answer some of the questions.

    Multiplying by 1024 would be to change it from KB down to B. A single row has a max size of 8 KB, so seeing it represented in B rather than KB makes more sense as MOST rows are going to be under 1 KB.

    As for FLOAT vs DECIMAL, FLOAT is an approximate value whereas DECIMAL is precise.  Decimal also allows you to define the precision on the value.

    Now as for your calculation, and my understanding of the value "reserved_page_count", I think the calculation is wrong.  The reserved page count is the number of pages that are empty and reserved for that table.  What you want for calculating the average number of rows pre page would be the row_count/used_page_count.  That will tell you the average number of rows per page.  Doing it on the reserved page count doesn't really tell you anything useful (from my understanding... please correct me if I am wrong).

    Now as for why you are getting a different value than expected when you use a calculator, that is probably due to your CAST/CONVERT.  You need to cast/convert ALL INT values prior to using them.  So both Reserved_Page_Count, Row_Count, 8, 1024, AND 8192 ALL need to be converted to DECIMAL before using them.  You can verify this logic with a simple select:

    SELECT 1/3, CAST(1 AS DECIMAL(19,10))/CAST(3 AS DECIMAL(19,10))

    From the above, you can see that 1 and 3 are integers, so 1/3 gives you 0.33333333.... which when converted to an INT gives you 0.

    To answer the "bonus question" - to get average row size, you would need to look at the number of pages used (used page count), multiply that by 8 (to get the number of KB used) and divide that number by the number of rows.  This will give you the average row size.  To get the average number of rows per page, you take the row count and divide it by the number of pages used.

    For example, if you have a table with 146 rows that is using up 6 pages, we know the average number of rows per page would be 146/6, or 24.3333...  Now for the space used per row, we need the space (6*8*1024) and the number of rows (146) or 336.657534 Bytes per row.  Now, these values are not 100% accurate.  A row cannot be a fraction of a byte , so 337 is a better approximation for space used per row.

    Basically, you want to know the average X per Y.  Row Size is similar to saying "Pages per Row" as we know that a page is a specific "size" (8 KB... all pages are 8 KB).  So if we know we have 100 pages and 200 rows, we know it is an average of 4 KB per row.  We know this because we take pages (100) divide by rows (200) which gives us 0.5.  Now we convert that over to KB (0.5*8) which gives us 4 KB per row.  These would be HUGE rows though.

    The above is all just my opinion on what you should do. 
    As with all advice you find on a random internet forum - you shouldn't blindly follow it.  Always test on a test server to see if there is negative side effects before making changes to live!
    I recommend you NEVER run "random code" you found online on any system you care about UNLESS you understand and can verify the code OR you don't care if the code trashes your system.

  • The problem with using "Reserved_Page_Count" is that it actually has little to do with row size.  For example, if you have an index that has suffered a large number of page splits, the "Reserved_Page_Count" can actually be double (or more) that of the space the data actually takes up.

    To the best of my knowledge, the only way to get an actual depiction of row size is to query sys.dm_db_index_physical_stats in either the "DETAILED" (very accurate but slow because it checks every page) or "SAMPLED" (less accurate because it only samples 1 out of 100 pages for indexes with more than 10,000 pages) mode.

    As for the number of rows per page, the formula is actually FLOOR(8092/avg_record_size_in_bytes).  And, yes, the correct number is, in fact, "8092" because the row sizes created by sys.dm_db_index_physical_stats include the row header and slot array space used by each row.

    Of course, that's only for "in-row" stuff.  If you have LOBs or Overflow allocation units, "Record_Count" and avg_record_size_in_bytes aren't what the names appear to be and the "in-row" stuff can actually be thrown off by a whole lot when such things actually DO fit "in-row".

    For example, here's the PK on the "Users" table of one of the versions of the Stack-Overflow database.  You'll notice that a whole lot of pages have low, very low, and super low page densities because of "Trapped Short Rows" where many individual low-row-count pages that have no in-row presence of LOB data are trapped between pages that have an in-row presence of LOBs.

    So even using sys.dm_db_index_physical_stats is going to play hell with any calculations you might do for average row size and number of rows per page because most folks simply don't force LOBS (or OVERFLOW) to be out of row even if the combined calculations are "right" especially since a great deal of the pages in the chart above are less than 90% full.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Of course the lazy way to get an accurate avg row size, if you have the time to let it run, is using:

    EXEC sys.dm_db_index_physical_stats

    specifying 'DETAILED'.  SQL will then calc the avg row size for you :-).

    SQL DBA,SQL Server MVP(07, 08, 09) A socialist is someone who will give you the shirt off *someone else's* back.

  • All great answers and all of them very helpful!

    I will be using the used_page_count for my calculations.

    Again, I appreciate everyone's help!

  • GBeezy wrote:

    All great answers and all of them very helpful!

    I will be using the used_page_count for my calculations.

    Again, I appreciate everyone's help!

    That still won't be accurate for the reasons I previously stated.  Depending on the condition of the indexes, it could still be wildly incorrect.

     

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Jeff,

    Looking at https://docs.microsoft.com/en-us/sql/relational-databases/system-dynamic-management-views/sys-dm-db-partition-stats-transact-sql?view=sql-server-ver15

    It states that the used_page_count accounts or in row, LOB and overflow.  And, looking at sp_spaceused, from what I can tell and from my simple example sys.dm_db_partition_stats.used_page_count * 8 is equivalent to sp_spaceused.data + sp_spaceused.index_size.

    Or, maybe I am missing your whole point.  Or, maybe you're saying what I am after really cant be accompished?

    What I am attempting to calculate is how "wide" my table is.  I have "wide" in quotes as this seems very ambiguous.

    • This reply was modified 3 years, 3 months ago by  GBeezy.
  • GBeezy wrote:

    Jeff,

    Looking at https://docs.microsoft.com/en-us/sql/relational-databases/system-dynamic-management-views/sys-dm-db-partition-stats-transact-sql?view=sql-server-ver15

    It states that the used_page_count accounts or in row, LOB and overflow.  And, looking at sp_spaceused, from what I can tell and from my simple example sys.dm_db_partition_stats.used_page_count * 8 is equivalent to sp_spaceused.data + sp_spaceused.index_size.

    Or, maybe I am missing your whole point.  Or, maybe you're saying what I am after really cant be accompished?

    What I am attempting to calculate is how "wide" my table is.  I have "wide" in quotes as this seems very ambiguous.

    It's kind of like asking how long a piece of string is.  Remember that you're talking about a used "PAGE COUNT".  A page containing even a single byte on a single row is considered used even though it's more than 99% empty.  Keeping things in whole numbers just for the sake of talking, if you have 100 pages in such a condition (and it's neither impossible nor infrequent that similar things happen), you would have a total of 100 used pages * 8Kbytes or 800KBytes or 819,200 byte.  Divide that by the row count of 100 and you come up with a calculated row length of 819,200/100 or 819 bytes when the actual row length is 1 + 7 + 2 (data size, row header size, slot array location for the row on the page) or 10 bytes.  Imagine specing out your system requirements based on such a common problem.

    And, although not usually nearly as bad as my extreme "1-byte" example, basing the spec on the fragmented contents of pages used v.s. row counts are going to cause you to seriously over estimate your needs.

    I'll be back... there's a formula I've developed for my index work I've been doing and I have to go find the formula.  The formula (actually, "formulas", because it's also dependent on the max number of rows that will actually fit on a page) will provide a (usually) much closer estimate.

    Still... remember that it's an ESTIMATE based on averages and averages can be pretty misleading.

    Of course, if you don't intend to actually and properly maintain your indexes (and current "Best Practice" index maintenance isn't the "proper" way to do index maintenance), then used pages / row count might suffice.  As with all else in SQL Server, "It Depends".

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • It's long past time that SQL Server should provide a ROWLENGTH function that gives you the physical length of a row(s).  But that's likely just wishful thinking on my part.

    SQL DBA,SQL Server MVP(07, 08, 09) A socialist is someone who will give you the shirt off *someone else's* back.

  • ScottPletcher, I agree with you.  I think what makes it difficult is that rows can have different lengths.  A VARCHAR(8000) row for example, could be anywhere between 1 byte and 8000 bytes in length.  Would you want ROWLENGTH to give you the min (1 byte), the max (8000 bytes), look at each row and pick the average, look at each row and pick the max/min?

    I know I am just being nit-picky here, but those variable length columns make a ROWLENGTH function tricky.  YOU might want it to look at the max length used whereas I may want the absolute max, and someone else may want to know what the minimum length a row can be so might want the min value (1 byte).

    Now, I don't disagree that such a function would be useful, I just think that there are some use cases that may confuse or misrepresent your data.

    The above is all just my opinion on what you should do. 
    As with all advice you find on a random internet forum - you shouldn't blindly follow it.  Always test on a test server to see if there is negative side effects before making changes to live!
    I recommend you NEVER run "random code" you found online on any system you care about UNLESS you understand and can verify the code OR you don't care if the code trashes your system.

  • I intended that you would get a specific row's(s') actual physical length(s) based on the input to the function.

    You specify a unique key or key range for any existing unique index.  SQL then supplies the avg len of each row in the range (by default),  or by specifying 'AVG' for mode, or you can explicitly specify mode as 'EACH_ROW' to see the length of each row individually.

    I'd say they could limit the number of key levels you can specify to 5.

    Maybe it would have to be a proc and not a function.  That would be fine too.

    SQL DBA,SQL Server MVP(07, 08, 09) A socialist is someone who will give you the shirt off *someone else's* back.

Viewing 12 posts - 1 through 11 (of 11 total)

You must be logged in to reply to this topic. Login to reply