Time for a ROWID?

  • I've always had a problem with heap tables: The basic rule is don't do it. So you have to find something to use as a proper key. If the key is nondeterministic (like an address), then table inserts happen all over the place and pages get torn up. If you create your own primary key, then the table organization never relates to how the data is being used (similar records are not grouped). It's a problem as old as computing. The usual solution is a mix of brute force, blind ignorance, cunning and horsepower. Or programming as it's sometimes known.

  • Forgive a relative newbie to SQL internals but why can't the page size be variable per database or even per table? This would give the best of all worlds as a small table or one where selects usually only pull one record at a time could have a very small page size, and larger tables or ones where multiple rows are often pulled at the same time could have a larger page size.

  • Steve is this what you are thinking of? (From Paul Randal's blog)

    http://www.sqlskills.com/blogs/paul/post/SQL-Server-2008-New-(undocumented)-physical-row-locator-function.aspx

    If everything seems to be going well, you have obviously overlooked something.

    Ron

    Please help us, help you -before posting a question please read[/url]
    Before posting a performance problem please read[/url]

  • Eric M Russell (7/27/2011)


    mar10br0 (7/27/2011)


    paul.knibbs (7/27/2011)


    I'm not sure increasing the row length in SQL would help much--after all, the largest data types are stored in off-row data pages anyway, and a properly normalised database surely shouldn't need so many of the smaller data types that you'd need to increase the row length?

    I don't think Steve was thinking of allowing for larger records, but rather lowering overhead-costs associated with managing fewer pages in the page-pool if the I/O costs for loading a larger page is negligible on modern hardware.

    Microsoft would have to do some bench-marking to see if the vast majority of applications running on typical hardware would not be negatively impacted by a larger page-size if top-end hardware may benefit. Maybe even allow the sysadmin to choose the optimum page-size for his hardware when installing the server (if Microsoft did a good job with their source-code, they may be able to compile the server-kernel for different page-sizes).

    It seems to me that there are performance advantages to having a smaller more granular page size, at least when querying non-clustered rows. If just one row is read from a page, then the entire page is copied to the buffer. So, if SQL Server's page size was increased from 8K to something like 64K, then the same query would require more storage in the buffer pool and thus more total memory. Of course you could always install more memory, but a larger page size would in many cases use the same memory less efficiently than would a smaller page size.

    Does that sound right?

    Indeed. I'm in the same camp. I would rather have smaller page sizes. It's rare in an OLTP system to actually need 8K pages. Typical transactions are very small by nature so even 2K pages would suffice (we used 2K pages back in the old Sybase days). I would rather have the option of choosing the page size, say from 2K up to 32K for DW environments, instead of the current one-size-fits-all model of SQL Server.

    Also, I don't think 'modern' technology has changed much for mechanical drives. Your typical enterprise/SAN hard drive hasn't changed much in the past ~10yrs. The seek times, IOPs, and rotation speeds (10K or 15K) are the same today as they were 10 yrs ago. The only thing that has changed is drive density, but drive performance has remained mostly flat if not slightly worse because of the multiple platters and data density within one drive spindle. Now if we're talking SSD's then that's an entirely different ball game. If you run a SAN filled with SSD's then not sure if any of this matters...

  • There is ROWID in oracle but its absolutely useless. At the first reorganization or defragmentation etc. the rowids will change. If it cant change then the database can't defragment, as defragmentation by definition means changing the location of rows to different pages/files.

  • bitbucket-25253 (7/27/2011)


    Steve is this what you are thinking of? (From Paul Randal's blog)

    http://www.sqlskills.com/blogs/paul/post/SQL-Server-2008-New-(undocumented)-physical-row-locator-function.aspx

    I can see how the physical row id would be useful for debugging or performance tuning purposes, like perhaps identifying which table or table partition a specific row in a resultset was contained in. However, it can't be stored anywhere else and used as a real key value, because it's an actual physical locator. Only a logical identifier, like an identity or timestamp, could move with the row as the table is reorganized.

    Can anyone suggest a practical use for it in an application or analytical query; some scenario where it would be more applicable than an identity, rank(), or row_number()?

    "Do not seek to follow in the footsteps of the wise. Instead, seek what they sought." - Matsuo Basho

  • krowley (7/27/2011)


    Forgive a relative newbie to SQL internals but why can't the page size be variable per database or even per table? This would give the best of all worlds as a small table or one where selects usually only pull one record at a time could have a very small page size, and larger tables or ones where multiple rows are often pulled at the same time could have a larger page size.

    Because the entire of the SQL storage engine, file layout, etc is written around a fixed 8k page. Sure, that could be changed in a future version, but they'd have to rewrite the storage engine from scratch (file layout, buffer management, backups, index structure, IO operations, etc, etc)

    Gail Shaw
    Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
    SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

    We walk in the dark places no others will enter
    We stand on the bridge and no one may pass
  • GilaMonster (7/27/2011)


    krowley (7/27/2011)


    Forgive a relative newbie to SQL internals but why can't the page size be variable per database or even per table? This would give the best of all worlds as a small table or one where selects usually only pull one record at a time could have a very small page size, and larger tables or ones where multiple rows are often pulled at the same time could have a larger page size.

    Because the entire of the SQL storage engine, file layout, etc is written around a fixed 8k page. Sure, that could be changed in a future version, but they'd have to rewrite the storage engine from scratch (file layout, buffer management, backups, index structure, IO operations, etc, etc)

    In addition to Microsoft retrofitting the storage engine itself, MSDN and the entire SQL Server community would have to retrofit all of our DMV and system table queries where we calculate storage totals based on an assumed 8K page size. 🙁

    SELECT

    SUM (user_object_reserved_page_count)*8 as usr_obj_kb,

    SUM (internal_object_reserved_page_count)*8 as internal_obj_kb,

    SUM (version_store_reserved_page_count)*8 as version_store_kb,

    SUM (unallocated_extent_page_count)*8 as freespace_kb,

    SUM (mixed_extent_page_count)*8 as mixedextent_kb

    FROM sys.dm_db_file_space_usage

    "Do not seek to follow in the footsteps of the wise. Instead, seek what they sought." - Matsuo Basho

  • Eric M Russell (7/27/2011)


    In addition to Microsoft retrofitting the storage engine itself, MSDN and the entire SQL Server community would have to retrofit all of our DMV and system table queries where we calculate storage totals based on an assumed 8K page size. 🙁

    I wonder how committed Microsoft is to the 8K page size. A quick look in BOL shows that it is "officially documented", although DMV documentation tends to shy away from saying the number out loud.

    Then again, maybe 8K is a "sweet spot" that is unlikely to move after all. By and large, even though data volumes are growing alongside processing power and storage, the growth is in number of rows, and not number/size of columns - (desired) rows per page aren't increasing. There never was an "8K barrier" that needed to be broken through in the past, and I don't imagine we're feeling anything like it now.

  • The Dixie Flatline (7/27/2011)


    Steve, maybe it's because I haven't had enough caffeine this morning, but I have a question: How does the ROWID column that you propose differ from an identity column?

    Identity isn't necessarily unique, it's a property that can be changed. I'd envision something a little stronger that identifies the row.

  • Eric M Russell (7/27/2011)


    It seems to me that there are performance advantages to having a smaller more granular page size, at least when querying non-clustered rows. If just one row is read from a page, then the entire page is copied to the buffer. So, if SQL Server's page size was increased from 8K to something like 64K, then the same query would require more storage in the buffer pool and thus more total memory. Of course you could always install more memory, but a larger page size would in many cases use the same memory less efficiently than would a smaller page size.

    Does that sound right?

    Perhaps, perhaps not. You cannot query "one row" unless it's over 4k in size. You'll always pull in a page, and potentially more than one page.

    Also, the advantages are that most of the time when you want to query a small set of data, you want as few reads as possible. More data on a page, can potentially mean fewer reads, if you have the hardware that supports larger pages.

  • krowley (7/27/2011)


    Forgive a relative newbie to SQL internals but why can't the page size be variable per database or even per table? This would give the best of all worlds as a small table or one where selects usually only pull one record at a time could have a very small page size, and larger tables or ones where multiple rows are often pulled at the same time could have a larger page size.

    Usually for efficiency reasons. You want to read in and pull a standard size without forcing calculations to find the end of the page, especially when it's on disk.

  • bitbucket-25253 (7/27/2011)


    Steve is this what you are thinking of? (From Paul Randal's blog)

    http://www.sqlskills.com/blogs/paul/post/SQL-Server-2008-New-(undocumented)-physical-row-locator-function.aspx

    Perhaps, if it's in all structures.

  • Ewald Cress (7/27/2011)


    Eric M Russell (7/27/2011)


    In addition to Microsoft retrofitting the storage engine itself, MSDN and the entire SQL Server community would have to retrofit all of our DMV and system table queries where we calculate storage totals based on an assumed 8K page size. 🙁

    I wonder how committed Microsoft is to the 8K page size. A quick look in BOL shows that it is "officially documented", although DMV documentation tends to shy away from saying the number out loud.

    Then again, maybe 8K is a "sweet spot" that is unlikely to move after all. By and large, even though data volumes are growing alongside processing power and storage, the growth is in number of rows, and not number/size of columns - (desired) rows per page aren't increasing. There never was an "8K barrier" that needed to be broken through in the past, and I don't imagine we're feeling anything like it now.

    It's changed in the past. Used to be 2k.

    It's not necessarily about data size, it's more that this could match up more efficiently with retrievals. Lots of disk reads are 64k already.

  • Eric M Russell (7/27/2011)


    I can see how the physical row id would be useful for debugging or performance tuning purposes, like perhaps identifying which table or table partition a specific row in a resultset was contained in. However, it can't be stored anywhere else and used as a real key value, because it's an actual physical locator. Only a logical identifier, like an identity or timestamp, could move with the row as the table is reorganized.

    Can anyone suggest a practical use for it in an application or analytical query; some scenario where it would be more applicable than an identity, rank(), or row_number()?

    Replication is the big one I can see it being important for.

    I'm not sure that an application would need it, though I would expect it could be exposed through some function.

Viewing 15 posts - 16 through 30 (of 67 total)

You must be logged in to reply to this topic. Login to reply