I left my laptop at the Red Gate booth, and then managed to wander up to a Data Compression talk from Sunil A. from Microsoft. It was a great presentation, and despite some extensive reading and researching earlier this year, I learned quite a bit about how it works. There are definitely changes from earlier documentation and Sunil had some great slides, so I'll try to write more over time.
A few quick things:
- Row compression is moving to a variable storage format. This is a low level of compression and it provides some benefits, with low overhead. Nulls and 0s use almost no space, with them being encoded and using less than a byte if there's present.
- Indexes non-leaf pages are only row compressed. You can set leaf pages to be row or page compressed.
- Page compression has two parts: prefix and dictionary.
- Prefix compression actually doesn't use the shortest matching value, but the longest. In the example Sunil gave, if you have three values (Smith, Smith, Smithson), the value stored in the header is (Smithson) and the tokens actually determine how many characters to replace. So the tokens here would be (5, 5, 8) meaning use the first 5 characters (Smith) for the first two values.
- Prefix compression is only for a column. The values and tokens are not shared across columns.
- Dictionary compression matches entire values, so only those entire matching values on the page are replaced. Using an example of three more values (Steve, Steve, Steven), only the 2 Steves would be replaced with a dictionary entry.
- The compression techniques are byte agnostic, so 0x65656565 could be an integer value or character (5 As) and the compression doesn't care. It looks at byte patterns, not values.
- Compression space savings make sense if the table or index is a relatively large percentage of the database. If you have a small part of the database, or not used that much, the savings might not really pay off.
- You compress object by object, so you make the decisions for each table, AND EACH index. It's not all indexes that you consider, but each one separately.
More to come later.
I also was on another panel today, talking about the new features of SQL Server. The configuration server, the new central server that can store common registrations for all DBAs also can help deploy PBM. I didn't know that.
The resource governer is cool as well and I saw a few interesting tidbits on that. Like the CPU changes are dynamic, meaning on the fly, and that you might want to keep a small Admin pool setup that gets at least 5% and no more than 20%.
I'm tired. It's been a long week.