If I compress a primary key, am I compressing the whole table? (Primary key with DATA_COMPRESSION)

  • Can someone explain how data compression works at the index level? If I compress a primary key, am I compressing the whole table [since the data pages are leaf pages]?

  • There is row-level compression and page-level compression (stronger, but slower). If you compress the clustered index, you are in fact compressing the table. The PK does not have to be the same as the clustered key. You should estimate the compression gain first to see what you will get in terms of space. Also, be aware that compressing active objects/indexes/partitions is not recommended unless you really have a lot of unused CPU resources. Usually, one compresses old partitions that are almost never queried.
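
    First check whether your primary key actually is the clustered index; a minimal sketch, assuming a table named myschema.mytable:

    select i.name, i.type_desc, i.is_primary_key
    from sys.indexes i
    where i.object_id = object_id('myschema.mytable');

    type_desc tells you whether the primary key's index is CLUSTERED or NONCLUSTERED.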

    _____________________________________________________
    Microsoft Certified Master: SQL Server 2008
    XDetails Addin - for SQL Developers
    blog.sqlxdetails.com - Transaction log myths
  • Thanks for your observations.

  • Steve Malley (3/5/2013)


    Can someone explain how data compression works at the index level? If I compress a primary key, am I compressing the whole table [since the data pages are leaf pages]?

    Depends whether the primary key is clustered or not. There's no requirement that it is.

    If you compress the clustered index, you are compressing the table; if you compress a nonclustered index, you're compressing just that index.
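
    To see what compression each index currently has, sys.partitions exposes it per partition (a sketch; the table name is just an example):

    select i.name as index_name, p.partition_number, p.data_compression_desc
    from sys.partitions p
    join sys.indexes i
        on i.object_id = p.object_id and i.index_id = p.index_id
    where p.object_id = object_id('dbo.MyTable');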

    Have a look for the performance benchmarks Microsoft did (the CAT team, I think). They got performance gains in all but one case, even when compressing active, in-use portions of the database. Less data to be moved around and less memory required resulted in improved performance.

    I'll see if I can find the link in the morning.

    Gail Shaw
    Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
    SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

    We walk in the dark places no others will enter
    We stand on the bridge and no one may pass
  • Thanks Vedran and Gail for pointing out that a primary key is not the equivalent of a clustered index.

    Where does this confusion come from?

  • Microsoft. When you create a primary key constraint, if a clustered index does not already exist on the table, SQL Server defaults the primary key to a clustered index.

    Not always what one may want for the clustered index on a table.
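
    A quick illustration of that default (the table and index names here are made up):

    -- no other clustered index exists, so this PK becomes the clustered index:
    create table dbo.Demo1 (id int not null primary key);

    -- to keep the PK nonclustered and cluster on something else instead:
    create table dbo.Demo2
    (
        id int not null primary key nonclustered,
        created datetime not null
    );
    create clustered index CIX_Demo2_created on dbo.Demo2 (created);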

  • Estimate the space gain with the sp_estimate_data_compression_savings procedure:

    exec sp_estimate_data_compression_savings 'myschema', 'mytable', 1, NULL, 'none';
    exec sp_estimate_data_compression_savings 'myschema', 'mytable', 1, NULL, 'row';
    exec sp_estimate_data_compression_savings 'myschema', 'mytable', 1, NULL, 'page';

    With compression = 'none' you see how much space a plain rebuild (defragmentation) alone would save, so you can separate that from the compression gain.

    Replay your average load and see how much CPU usage will increase.

    Each time a compressed row or page is accessed (e.g. for reading), it must be decompressed using CPU.
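
    Once you have decided, compression is applied with a rebuild. A sketch, reusing the hypothetical names from above (PK_mytable is an assumed index name):

    -- compress the clustered index, i.e. the table itself:
    alter index PK_mytable on myschema.mytable
        rebuild with (data_compression = page);

    -- or only one old, rarely-queried partition of a partitioned index:
    alter index PK_mytable on myschema.mytable
        rebuild partition = 1 with (data_compression = page);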

    _____________________________________________________
    Microsoft Certified Master: SQL Server 2008
    XDetails Addin - for SQL Developers
    blog.sqlxdetails.com - Transaction log myths
  • It is also worth understanding how this table is used and whether it normally resides in memory or not (a rough way to check is sketched below).

    If it is routinely scanned, then you may choose to accept lower savings figures from page compression (say 50% instead of 80%).

    If it is routinely used for key lookups, you may choose to err on the side of row compression.

    If it is normally in memory anyway, you may decide to leave it uncompressed.

    If the only way to fit it in memory is through compression, you may evaluate it differently.

    Same with volatility: if 10% of the table is written every day, compression is less attractive than if the table is read-only.

    Test with a representative load and then you'll know what works best for you ;-)
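
    One rough way to see how much of a table is currently cached in the buffer pool (a sketch; needs VIEW SERVER STATE permission, and the table name is just an example):

    select count(*) as cached_pages,
           count(*) * 8 / 1024 as cached_mb          -- 8 KB pages
    from sys.dm_os_buffer_descriptors bd
    join sys.allocation_units au
        on au.allocation_unit_id = bd.allocation_unit_id
    join sys.partitions p
        on p.hobt_id = au.container_id
       and au.type in (1, 3)                         -- in-row and row-overflow data
    where bd.database_id = db_id()
      and p.object_id = object_id('myschema.mytable');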

    Cheers,

    JohnA

    MCM: SQL2008

  • SQLCharger (3/6/2013)


    If it is normally in memory anyway, you may decide to leave it uncompressed.

    Why?

    Gail Shaw
    Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
    SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

    We walk in the dark places no others will enter
    We stand on the bridge and no one may pass
  • I should have mentioned in my initial post that I was speaking of a Primary Key *Clustered* constraint. This creates a corresponding index.

  • Steve Malley (3/6/2013)


    I should have mentioned in my initial post that I was speaking of a Primary Key *Clustered* constraint. This creates a corresponding index.

    And that index is the data at the leaf level; it isn't a separate structure the way nonclustered indexes are.

  • GilaMonster (3/6/2013)


    SQLCharger (3/6/2013)


    If it is normally in memory anyway, you may decide to leave it uncompressed.

    Why?

    Gail,

    My tests have shown a performance degradation when the same queries run on compressed tables compared to uncompressed.

    I know, there are fewer pages to trawl across, but it is more expensive to crack them open as well - I've found the net effect to be negative (10-20% give or take).

    Especially with page compression: even if you end up with 1/3 of the pages (typical for many data tables), I've found it to still be quicker with compression off. It looks like the access cost of each page goes up by a bit more than the size goes down.

    If there are exceptions to this I'd love to hear them, so I can fine-tune the algorithms.:cool:

    Cheers,

    JohnA

    MCM: SQL2008

  • SQLCharger (3/6/2013)


    My tests have shown a performance degradation when the same queries run on compressed tables compared to uncompressed.

    I know, there are fewer pages to trawl across, but it is more expensive to crack them open as well - I've found the net effect to be 10 - 20% worse.

    Especially with page compression: even if you end up with 1/3 of the pages (typical for many data tables), I've found it to still be quicker with compression off. It looks like the access cost of each page goes up by a bit more than the size goes down.

    If there are exceptions to this I'd love to hear them, so I can fine-tune the algorithms.:cool:

    That depends entirely on the workload and on the type of compression. You also have the effect of a reduced memory footprint, which reduces IO load (more can fit into memory), reduces the CPU cost of moving data around, etc.

    Page compression's overhead can be unacceptable; row compression's often isn't, when there's an overhead at all.

    Take a look at the whitepaper on compression, the performance results are at the bottom.

    http://msdn.microsoft.com/en-us/library/dd894051.aspx

    I've seen other benchmarks where even OLTP apps got performance improvements from compressing at row or page level.

    Pretty much it's a case of 'test it and see', unless you're running a vendor app where that testing's been done.
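
    A crude 'test it and see' sketch (all names hypothetical): make a compressed copy of the table and compare the same query against both.

    -- make a page-compressed copy of the table:
    select * into dbo.MyTable_page from dbo.MyTable;
    alter table dbo.MyTable_page rebuild with (data_compression = page);

    -- compare CPU time, elapsed time and reads; substitute queries
    -- that are representative of your actual workload:
    set statistics time on;
    set statistics io on;
    select count_big(*) from dbo.MyTable;        -- uncompressed original
    select count_big(*) from dbo.MyTable_page;   -- page-compressed copy
    set statistics time off;
    set statistics io off;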

    Gail Shaw
    Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
    SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

    We walk in the dark places no others will enter
    We stand on the bridge and no one may pass
  • Gail beat me to that one. Oh well.
