Difference in row size between what was expected and what was found — can anyone explain, please?

  • Hi everybody,

    I am creating a table that will be the basis of a fact table for a cube. I have 3.1 billion rows ready for it and, if all goes well, it will take 1.6 million entries each day.
    As a consequence, I want to make the table as compact as possible. It has 4 columns: the first column is a bigint and the primary key, the second and third columns are ints, and the fourth column has the datatype date. All columns are not nullable. The table is very simple. It has a clustered index from the primary key and no non-clustered indexes. There are no foreign keys, constraints (except NOT NULL), etc.

    I expected that each row would be in or around 19 bytes long. Instead, each row is 26 bytes long. What is in these extra 7 bytes, and can I recover them in some way?
    Furthermore, I expected to get 426 rows (8096/19) per full page. Instead, I'm getting 289, and the «avg_page_space_used_in_percent» is 99.92%. Even 311 (8096/26) rows would be better.
    I had thought that it might have something to do with the fill factor, but it is 0 (and fill factor only affects non-clustered indexes, doesn't it? and there are none).
    I am only looking at the leaf-level pages of the clustered index.

    I have recreated my table in a script below and it shows where I am getting my numbers from. Am I missing something obvious?
    I would be happy if someone could answer these questions, point out the error of my ways or, at least, point me in the right direction.

    Many thanks,
    Sean Redmond.

    -- If I have a table with 4 columns — bigint, int, int, date — why is the average row size much greater than 19 bytes?
    -- 1. create necessary objects
    use master
    go
    create database gosh
        on        ( name = 'gosh', filename = '«your path here»\gosh.mdf', size = 32MB, filegrowth = 8MB )
        log on    ( name = 'gosh_log', filename = '«your path here»\gosh_log.ldf', size = 8MB, filegrowth = 8MB );
    go

    use gosh
    go
    create table dbo.RowSizes( RowSizesID bigint not null primary key, anInt int not null, anotherInt int not null, aDate date not null );
    go

    -- 2. populate the table with a million entries
    insert into dbo.RowSizes( RowSizesID, anInt, anotherInt, aDate )
    select        top 1000000
                row_number() over(order by a.id)
                , (abs(a.id) - row_number() over(order by a.id))
                , (abs(a.id) + row_number() over(order by a.id))
                , dateadd( minute, row_number() over(order by a.id)*9, '20000101' )
    from        master.dbo.syscolumns a
    cross join    master.sys.columns b
    ;
    go

    -- 3. how much space is used. The query is from Itzik Ben-Gan.
    declare    @db varchar(10) = 'gosh';
    declare @object varchar(50) = 'dbo.RowSizes';
    select    index_type_desc
            , index_depth
            , index_level
            , page_count
            , record_count
            , avg_page_space_used_in_percent
            , avg_fragmentation_in_percent
    from    sys.dm_db_index_physical_stats( db_id(@db), object_id(@object, 'u'), null, null, 'detailed' );
    exec    dbo.sp_spaceused @objname = @object, @updateusage = true;
    go

    /*
        index_type_desc    index_depth    index_level    page_count    record_count    avg_page_space_used_in_percent
        CLUSTERED INDEX    3              0              3461          1000000         99.92
        CLUSTERED INDEX    3              1              16            3461            45.40
        CLUSTERED INDEX    3              2              1             16               3.33

        name            rows       reserved    data        index_size    unused
        dbo.RowSizes    1000000    27976 KB    27688 KB    144 KB        144 KB
    */
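
    -- A quicker way to read the actual record size is straight from the same DMV,
    -- rather than deriving it from page counts. A minimal sketch (the size columns
    -- are standard in sys.dm_db_index_physical_stats and are populated in 'detailed' mode):
    select    index_level
            , avg_record_size_in_bytes        -- 26 here: 19 bytes of data plus 7 bytes of row overhead
            , min_record_size_in_bytes
            , max_record_size_in_bytes
    from    sys.dm_db_index_physical_stats( db_id('gosh'), object_id('dbo.RowSizes', 'u'), null, null, 'detailed' )
    where    index_level = 0;                -- leaf level of the clustered index only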

    select '1. Theoretical No. of rows per page' Ting    , cast(8096/19 as int) Answer union        -- 426. 8096: available space in bytes on a page; 19 = 8+4+4+3 (theoretical row size)
    select '2. Actual Avg. No. of rows per page'        , cast(1000000/3461 as int) union        -- 288
    select '3. Theoretical No. of bytes per row'        , 19 union                                -- 19 = 8+4+4+3
    select '4. Actual No. of bytes per row'            , (cast(27688 as int)*1024)/1000000        -- 28

    -- housekeeping
    use master
    go
    drop database gosh;
    go

  • Sean Redmond - Tuesday, August 8, 2017 9:31 AM

    I expected that each row would be in or around 19 bytes long. Instead, each row is 26 bytes long. What is in these extra 7 bytes, and can I recover them in some way?
    Furthermore, I expected to get 426 rows (8096/19) per full page. Instead, I'm getting 289, and the «avg_page_space_used_in_percent» is 99.92%. Even 311 (8096/26) rows would be better.
    I had thought that it might have something to do with the fill factor, but it is 0 (and fill factor only affects non-clustered indexes, doesn't it? and there are none).

    The extra bytes are row overhead, for SQL to manage the rows, and, no, you can't recover any of that space.  That's actually rather low, so I suspect 7 bytes is the minimum overhead.
    As to fullness, you should explicitly specify the free space, but yes, it must have been 0 because 99.92% is extremely full already.  SQL doesn't split rows across pages (too I/O-inefficient), of course, so only whole rows can fit.  Still, 289 does seem low.  Perhaps not loaded in order / page splits??
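
    A minimal sketch of setting the free space explicitly, assuming you rebuild the table's indexes (fillfactor = 100 keeps the leaf pages as full as possible):

    -- rebuild every index on the table with an explicit fill factor
    alter index all on dbo.RowSizes rebuild with ( fillfactor = 100 );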

    SQL DBA,SQL Server MVP(07, 08, 09) A socialist is someone who will give you the shirt off *someone else's* back.

  • ScottPletcher - Tuesday, August 8, 2017 9:38 AM

    The extra bytes are row overhead, for SQL to manage the rows, and, no, you can't recover any of that space.  That's actually rather low, so I suspect 7 bytes is the minimum overhead.
    As to fullness, you should explicitly specify the free space, but yes, it must have been 0 because 99.92% is extremely full already.  SQL doesn't split rows across pages (too I/O-inefficient), of course, so only whole rows can fit.  Still, 289 does seem low.  Perhaps not loaded in order / page splits??

    Maybe helpful, maybe not.

    ;with
    aa as (
        select    rowsizesid
                , convert(binary(4), reverse(substring(t.%%physloc%%, 1, 4))) page
                , convert(binary(2), reverse(substring(t.%%physloc%%, 5, 2))) file_id
                , convert(binary(2), reverse(substring(t.%%physloc%%, 7, 2))) slot
        from    rowsizes t
    ),
    hh as ( select page, max(slot) max_slot, convert(int, max(slot)) aantal from aa group by page ),
    hh2 as ( select aantal, count(*) tel from hh group by aantal )
    -- select top 500 * from aa
    -- select top 500 * from hh
    select * from hh2 order by tel;

    Selecting from aa gives the location of each row.
    Selecting from hh gives the number of rows for each page.
    Selecting from hh2 gives the statistics for the number of rows per page.
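
    As an aside, the undocumented sys.fn_PhysLocFormatter gives the same file/page/slot triple in one call (assuming a build that ships it; it formats %%physloc%% as '(file:page:slot)'):

    select    sys.fn_PhysLocFormatter(%%physloc%%) as location    -- e.g. '(1:91898:0)'
            , rowsizesid
    from    rowsizes;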

    I ran your data and all pages but one contained 191 rows; this is a fair indication that this is the maximum number of rows that can be fitted.
    Because it is a clustered table, there can be no forwarded rows. (They only occur in heaps, I think.)
    Page split as an action does not influence the number of pages.
    A 'split' caused by a forwarded row does not apply here.

    Obsolete here, but for future reference, here is some code that shows the forwarded-record count and some sizes:

    select    forwarded_record_count
            , page_count
            , avg_page_space_used_in_percent
            , record_count
            , min_record_size_in_bytes
            , max_record_size_in_bytes
            , avg_record_size_in_bytes
            , db_name(database_id) as databasename
            , object_id
            , object_name(object_id, database_id) as table_name
            , index_type_desc
    from    sys.dm_db_index_physical_stats( db_id(N'testdb'), object_id(N'dbo.rowsizes'), null, null, 'detailed' );

    And does anybody have code to actually read or dump a page of data?
    print convert(int, 0x000166FA)

    DBCC TRACEON (3604);
    dbcc page (testdb, 1, 91898, 1)  -- The page number should be determined for your system.

    This produces a dump of the data; the fourth parameter can be 0/1/2/3 (see the documentation for which to use; 1 gives a per-row dump).
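
    To find page numbers to feed into DBCC PAGE without hunting for them, the undocumented DBCC IND lists a table's pages (a sketch; the third argument is the index id, 1 for the clustered index):

    DBCC IND ('testdb', 'dbo.rowsizes', 1);    -- PageType 1 rows are the data pages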

    Hope this helps.
    Ben

  • 2 bytes for the row header, 2 bytes for the slot array index, 2 bytes for the null bitmap offset, then one bit per column (minimum 1 byte) for the null bitmap itself.

    That is indeed the minimum number of bytes that goes into the row structure itself.
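
    A quick back-of-the-envelope check against the numbers above (assuming each row's 2-byte slot-array entry comes out of the 8096 usable bytes on top of the 26-byte record):

    select    19 + 7            as record_bytes        -- 26: fixed data plus minimum row overhead
            , 8096 / (26 + 2)    as rows_per_page;    -- 289: matches the observed row count per page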

    Gail Shaw
    Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
    SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

    We walk in the dark places no others will enter
    We stand on the bridge and no one may pass
  • GilaMonster - Thursday, August 10, 2017 6:13 AM

    2 bytes for the row header, 2 bytes for the slot array index, 2 bytes for the null bitmap offset, then one bit per column (minimum 1 byte) for the null bitmap itself.

    That is indeed the minimum number of bytes that goes into the row structure itself.

    The 7 (2 + 2 + 2 + 1) bytes plus:
    Slot 1 Column 1 Offset 0x4 Length 8 Length (physical) 8
    RowSizesID = 2       
    Slot 1 Column 2 Offset 0xc Length 4 Length (physical) 4
    anInt = 1073624920      
    Slot 1 Column 3 Offset 0x10 Length 4 Length (physical) 4
    anotherInt = 1073624924    
    Slot 1 Column 4 Offset 0x14 Length 3 Length (physical) 3

    Makes 26 bytes.
    26 * 191 = 4966.
    This still leaves around 3 KB for the header and other information.
    (And the slot table is already included, so a large chunk is still not explained.)

    (40 bytes were reported for each row: 191 * 40 = 7640. But 40 is significantly more than is explained above.)
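
    One candidate for the unexplained 14 bytes, going by the 'VERSIONING_INFO' record attribute in the dump further down (an assumption: the row-versioning tag is a 6-byte transaction timestamp plus an 8-byte version-store pointer):

    select    26 + 6 + 8 as record_bytes;    -- 40: matches the dump's 'Record Size = 40'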

    Ben

  • ben.brugman - Thursday, August 10, 2017 6:03 AM

    Page split as an action does not influence the number of pages.
    Ben

    Hmm, I would think it would have to, especially when it first occurs.  By definition, a page split is one page becoming two, right?

    SQL DBA,SQL Server MVP(07, 08, 09) A socialist is someone who will give you the shirt off *someone else's* back.

  • Does the database have read_committed_snapshot or allow_snapshot_isolation enabled on it?

    Gail Shaw
    Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
    SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

    We walk in the dark places no others will enter
    We stand on the bridge and no one may pass
  • ScottPletcher - Thursday, August 10, 2017 7:45 AM

    ben.brugman - Thursday, August 10, 2017 6:03 AM

    Page split as an action does not influence the number of pages.
    Ben

    Hmm, I would think it would have to, especially when it first occurs.  By definition, a page split is one page becoming two, right?

    If there had been page splits, the avg percent space used would not be in the high 90% range, as there would be pages with half the page unused.

    Gail Shaw
    Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
    SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

    We walk in the dark places no others will enter
    We stand on the bridge and no one may pass
  • Hi Gail,
    In answer to your question, neither 'read_committed_snapshot' nor 'allow_snapshot_isolation' is enabled. It is at 'Read Committed'.
    I haven't gotten around to looking at Ben's queries yet.
    Thanks very much for the explanation about the 7 bytes overhead per row. It is good to know.
    Sean.

  • GilaMonster - Thursday, August 10, 2017 7:51 AM

    If there had been page splits, the avg percent space used would not be in the high 90% range, as there would be pages with half the page unused.

    I said "perhaps" page splits could account for some of the extra pages.  Either way, we still have the mystery of the alleged 99.9+% fill factor but so few rows per page.  Something is still missing to fully explain what's going on with this table.  That would be many more rows than is typically deleted from a table.

    SQL DBA,SQL Server MVP(07, 08, 09) A socialist is someone who will give you the shirt off *someone else's* back.

  • Sean Redmond - Thursday, August 10, 2017 8:00 AM

    Hi Gail,
    In answer to your question, neither 'read_committed_snapshot' nor 'allow_snapshot_isolation' is enabled. It is at 'Read Committed'.
    I haven't gotten around to looking at Ben's queries yet.
    Thanks very much for the explanation about the 7 bytes overhead per row. It is good to know.
    Sean.

    To be clear, I was asking about the database options, not the session's isolation level. Is that what you checked?

    Gail Shaw
    Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
    SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

    We walk in the dark places no others will enter
    We stand on the bridge and no one may pass
  • GilaMonster - Thursday, August 10, 2017 8:09 AM

    To be clear, I was asking about the database options, not the session's isolation level. Is that what you checked?

    For the former two, I right-clicked the DB in question, went to Options » Miscellaneous, and saw that the values for 'read_committed_snapshot' and 'allow_snapshot_isolation' are set to false.
    For the isolation level, I took the value from dbcc useroptions.
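
    The same checks in T-SQL, for the record (a sketch against sys.databases; these columns are standard):

    select    name
            , is_read_committed_snapshot_on
            , snapshot_isolation_state_desc
    from    sys.databases
    where    name = 'gosh';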

  • ScottPletcher - Thursday, August 10, 2017 7:45 AM

    ben.brugman - Thursday, August 10, 2017 6:03 AM

    Page split as an action does not influence the number of pages.
    Ben

    Hmm, I would think it would have to, especially when it first occurs.  By definition, a page split is one page becoming two, right?

    Sorry, my bad.
    My formulation was wrong. Of course a page split (as an action) does increase the number of pages. In the given example, though, it doesn't influence the fill of the pages, so page splits do not account for the limited number of rows that can be stored in a page. I did not express myself correctly, sorry.

    Ben

  • Sean Redmond - Thursday, August 10, 2017 8:20 AM

    For the former two, I right-clicked the DB in question, went to Options » Miscellaneous, and saw that the values for 'read_committed_snapshot' and 'allow_snapshot_isolation' are set to false.
    For the isolation level, I took the value from dbcc useroptions.

    Ok, so it's not the row version pointers. Very curious.

    What did DBCC Page dump out?

    Gail Shaw
    Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
    SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

    We walk in the dark places no others will enter
    We stand on the bridge and no one may pass
  • GilaMonster - Thursday, August 10, 2017 11:23 AM

    Ok, so it's not the row version pointers. Very curious.

    What did DBCC Page dump out?

    The dump (on my system) shows rows each containing 40 bytes.
    I did an update to make identification a bit easier.

    Update rowsizes set anotherInt = 313249263  -- (= efcdab12)
    Update rowsizes set anInt = 286331153    -- (= 11111111)
    Update rowsizes set aDate = '1900-01-01'
    Update rowsizes set aDate = '1900-01-02' where rowsi

    So this makes anotherInt and anInt identifiable.
    The date is less clear.
    And there is probably a 'change/version' number. (Number of bytes???)

    It is clear that there are 40 bytes in each row; the offset table contains 2 bytes for each row.
    What the 40 bytes consist of is not totally clear.

    For a part of the dump, see below.
    All the code to create the dump is present in this thread. (So for a complete dump, use the code supplied.)

    Ben

    The dump: (selected parts)
    --
    -------------------------------------------------------------------------------------------

    PAGE: (1:91898)

    BUFFER:

    BUF @0x0000000095FBE240

    bpage = 0x0000000095506000    bhash = 0x0000000000000000    bpageno = (1:91898)
    bdbid = 8          breferences = 0        bcputicks = 0
    bsampleCount = 0       bUse1 = 44104        bstat = 0xc0010b
    blog = 0x79797979       bnext = 0x0000000000000000   

    PAGE HEADER:

    Page @0x0000000095506000

    m_pageId = (1:91898)      m_headerVersion = 1      m_type = 1
    m_typeFlagBits = 0x4      m_level = 0         m_flagBits = 0x2000
    m_objId (AllocUnitId.idObj) = 6545 m_indexId (AllocUnitId.idInd) = 256
    Metadata: AllocUnitId = 72057594466861056           
    Metadata: PartitionId = 72057594374389760           Metadata: IndexId = 1
    Metadata: ObjectId = 1017874793  m_prevPage = (0:0)       m_nextPage = (1:91903)
    pminlen = 23         m_slotCnt = 192        m_freeCnt = 32
    m_freeData = 7776       m_reservedCnt = 0       m_lsn = (6675:15802:2)
    m_xactReserved = 0       m_xdesId = (0:0)       m_ghostRecCnt = 0
    m_tornBits = -1285646452    

    Allocation Status

    GAM (1:2) = ALLOCATED      SGAM (1:3) = NOT ALLOCATED   
    PFS (1:88968) = 0x60 MIXED_EXT ALLOCATED 0_PCT_FULL       DIFF (1:6) = CHANGED
    ML (1:7) = NOT MIN_LOGGED    

    Slot 0 Offset 0x60 Length 40

    Record Type = PRIMARY_RECORD   Record Attributes = NULL_BITMAP VERSIONING_INFO
    Record Size = 40       
    Memory Dump @0x000000001124C060

    0000000000000000: 50001700 01000000 00000000 11111111 †P...............
    0000000000000010: efcdab12 5b950a04 00009048 01000100 †ïë.[•.....H....
    0000000000000020: 00008b01 00000000 †††††††††††††††††††..‹.....   

    Version Information =
        Transaction Timestamp: 395
        Version Pointer: (file 1 page 84112 currentSlotId 0)

    Slot 0 Column 1 Offset 0x4 Length 8 Length (physical) 8

    RowSizesID = 1          (=01000000)

    Slot 0 Column 2 Offset 0xc Length 4 Length (physical) 4

    anInt = 286331153       (=11111111)

    Slot 0 Column 3 Offset 0x10 Length 4 Length (physical) 4

    anotherInt = 313249263     (=efcdab12)

    Slot 0 Column 4 Offset 0x14 Length 3 Length (physical) 3

    aDate = 1900-01-01        (=8b01   + another byte)

    Slot 0 Offset 0x0 Length 0 Length (physical) 0

    KeyHashValue = (1b7fe5b8af93)  
    Slot 1 Offset 0x88 Length 40

    Record Type = PRIMARY_RECORD   Record Attributes = NULL_BITMAP VERSIONING_INFO
    Record Size = 40       
    Memory Dump @0x000000001124C088

    0000000000000000: 50001700 02000000 00000000 11111111 †P...............
    0000000000000010: efcdab12 5c950a04 0000e07e 01000100 †ïë.\•....à~....
    0000000000000020: 40008c01 00000000 †††††††††††††††††††@.ÂŒ.....   

    Version Information =
        Transaction Timestamp: 396
        Version Pointer: (file 1 page 98016 currentSlotId 64)

    Slot 1 Column 1 Offset 0x4 Length 8 Length (physical) 8

    RowSizesID = 2       
    -- etc etc.
    --------------------------------------------------------------------------
    OFFSET TABLE:

    Row - Offset        
    191 (0xbf) - 7736 (0x1e38)   
    190 (0xbe) - 7696 (0x1e10)   
    189 (0xbd) - 7656 (0x1de8)   
    188 (0xbc) - 7616 (0x1dc0)   
    187 (0xbb) .......
    ......
    ......  416 (0x1a0)     
    7 (0x7) - 376 (0x178)     
    6 (0x6) - 336 (0x150)     
    5 (0x5) - 296 (0x128)     
    4 (0x4) - 256 (0x100)     
    3 (0x3) - 216 (0xd8)     
    2 (0x2) - 176 (0xb0)     
    1 (0x1) - 136 (0x88)     
    0 (0x0) - 96 (0x60)      
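
    For what it's worth, the header fields above account for the whole 8192-byte page (a worked check: the 96-byte page header, 192 records of 40 bytes, and a 2-byte slot entry per record):

    select    96 + 192 * 40                        as freedata_check    -- 7776 = m_freeData
            , 8192 - (96 + 192 * 40) - 192 * 2    as freecnt_check;    -- 32   = m_freeCnt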
