
Improving Database Design and Performance Using Compression
Posted Wednesday, November 17, 2010 10:05 PM
SSC-Enthusiastic


Group: General Forum Members
Last Login: Friday, December 5, 2014 1:50 PM
Points: 141, Visits: 115
Comments posted to this topic are about the item Improving Database Design and Performance Using Compression


Kindest Regards,

M Suresh Kumar

Post #1022614
Posted Thursday, November 18, 2010 4:35 AM
Grasshopper


Group: General Forum Members
Last Login: Tuesday, December 7, 2010 2:34 AM
Points: 13, Visits: 31
Hi Suresh

Very interesting article, thank you. I wanted to clarify one point, though. You talk about query performance improvement, which, of course, is the holy grail with database applications, especially when performance starts to degrade as data volumes grow. However, you are also looking at table space optimisation, and it appears to me that your article demonstrates the space saving more than the performance optimisation.

I realise you say query performance is improved due to faster joins with higher row density, but you don't show any evidence of that, and your simple row-count query actually took longer with compression. So for optimum query performance, would you recommend trying to achieve higher density without compression (by optimising datatypes etc.)?

Best wishes
John



Post #1022754
Posted Thursday, November 18, 2010 6:52 AM
SSC Rookie


Group: General Forum Members
Last Login: 2 days ago @ 9:24 AM
Points: 27, Visits: 193
Although I agree that saving space is "a good thing" (tm), I do wonder whether the full impact of this operation has been considered.

As far as changing an nvarchar to a varchar where functionally possible: hurray!

As far as PAGE compression goes... hmm
=> it's nice to see I/O come down thanks to the more compact storage, as I/O is often the bottleneck on servers (CPUs are cheap; memory and fast disks, relatively speaking, are not), but then again, the actual execution times go up!?
=> what happens to INSERT/UPDATE/DELETE operations? Won't they be affected more when using PAGE compression?
=> what happens to queries that fetch data via NON-indexed columns? I know doing so is "bad", but it happens all the time, either because some WHERE clause touches an additional (non-index-included) field, or because the WHERE defeats searchability by wrapping a function around the fields.
=> adding more (compressed) indexes to cover all fields doesn't really sound like an option either, as it will make the database bigger again and will probably cause lots of side effects when the data changes.

I guess a lot of the actual gain depends on the way the data is being used:
* functionally (e.g. access only via procedures with "restricted" (and hence known) execution plans, or more ad-hoc operations),
* technically (SELECT by day and UPDATE by night, or a mixed bag 24/7?),
* and practically (3 vs 300 concurrent users).

Simply having a slimmer database does not necessarily result in a fitter database, IMHO, but I agree it might... I'm merely trying to be cautious here.
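Many of these "it depends" concerns can be sized up before committing to anything. A minimal sketch, assuming SQL Server 2008+ and the AdventureWorks table used in the article, using the built-in estimator (it samples the table into tempdb, so be gentle with very large tables):

```sql
-- Estimate PAGE compression savings for a single table (SQL Server 2008+).
-- Returns current vs. estimated compressed sizes per index/partition.
EXEC sp_estimate_data_compression_savings
    @schema_name      = 'Sales',
    @object_name      = 'SalesOrderDetail',
    @index_id         = NULL,   -- NULL = all indexes on the table
    @partition_number = NULL,   -- NULL = all partitions
    @data_compression = 'PAGE';
```

Comparing this estimate against your CPU headroom is a cheaper first step than compressing and measuring in production.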



Post #1022863
Posted Thursday, November 18, 2010 7:34 AM
SSC-Enthusiastic


Group: General Forum Members
Last Login: Monday, November 3, 2014 10:59 PM
Points: 148, Visits: 397
My interest was piqued when I read your article about size optimization, and I very much like your approach.

As a first-time poster on this site, I am enthusiastic about sharing my case. This reply may be seen as a long one; if so, skip to "In short..."

On one contract a few years ago, I wanted a quick look at what table space usage could be optimized, but in a much more basic way than your approach.

I was curious to see how much wasted space was present in "oversized" string-type fields in some big tables, but wanted to develop a simple query for that (yeah, one could say I was a little "naive" or "over-optimistic" for my experience level at that time...).

Anyway, since I was working with someone else's design, I just wanted a quick look. I hoped to write only one query if possible, even though it was not Christmas yet... (a dream cannot hurt, right?)

As a discussion example, a string field could have been created with a 120-character length in mind at design time, when long-term usage shows the maximum length actually "consumed" across all rows is only 45 characters. Here, "long term" could be a few years or a few million rows, whatever fits your particular case.

This kind of information would then be a starting point to discuss or evaluate whether it would be beneficial to reduce the length of this "longer-than-necessary" string field from a table-size-optimization point of view. In this example, one could decide to reduce the field length from 120 to 60 characters, say.
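The shrink itself, once usage data supports it, is a one-line change. A minimal sketch with hypothetical table and column names (note that ALTER COLUMN must restate NULL/NOT NULL, and fails if indexes or constraints depend on the column):

```sql
-- [dbo].[Customer].[Notes] is a hypothetical oversized column; 60 is the new length.
-- Restate NOT NULL explicitly, or the column silently becomes nullable.
ALTER TABLE [dbo].[Customer]
    ALTER COLUMN [Notes] nvarchar(60) NOT NULL;
```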

In short, I was curious to find out the maximum field-length usage for every string-type field in at least one table, so I began searching for a tool or article someone else had created for that particular task. In the time I allowed myself, I found none. (Yeah, one could say I could have invested more time, but I am probably a little "naive" after all...)

I was motivated to develop something basic but general-purpose, small but giving useful results. I did not want "temp" tables or lots of steps; just one or a few very short queries was also among my design objectives (speaking "naively"...).

My first tries were not immediately successful, but with that desire still pending, I then came across an article by Greg Larsen showing me how to get all the required information for one table, and my motivation rose again.

If it does not hurt anyone here, that particular inspiring article appeared on www.sqlserverexamples.com. Do not worry, I am still a very enthusiastic fan of "SQL Server Central"... :)

Using that inspiration, I finally went with this approach at the time:

-----------------------------------------------------------------
-- Display the maximum usage length for each column in a table
-- (a simple tool to help optimize ANY existing table definition)
-- WRITTEN BY: BRUNO ARNOLD
-- DATE: 04/06/2008
--
-- Originally inspired by reading GREG LARSEN here:
-- http://www.sqlserverexamples.com/v2/Examples/DynamicTSQL/DisplayingSmallestRowinTable/tabid/207/Default.aspx
--
-- I mainly use this to quickly see (on big tables) whether a character field
-- is really using (or needing) its declared length, thus helping to spot
-- possible space optimization.
--
-- FOR EXAMPLE:
-- Using the [SalesOrderDetail] table from [AdventureWorks], I "could securely"
-- change the [CarrierTrackingNumber] field's definition to nvarchar(12)
-- instead of nvarchar(25), but ONLY if future usage is not meant to change.
-- The select shows 24 bytes for this field, but it is an nvarchar type
-- (2 bytes per character), so 12 characters suffice in the "nvarchar" definition.
-----------------------------------------------------------------

USE [AdventureWorks];

DECLARE @SomeSQLCode nvarchar(MAX)
       ,@AnalysedTable nvarchar(128)
       ,@AnalysedTableSchema nvarchar(128);

SET @AnalysedTable = 'SalesOrderDetail';
SET @AnalysedTableSchema = 'Sales'; -- In other cases this will be 'dbo'
SET @SomeSQLCode = '';

-- Build one MAX(DATALENGTH(...)) expression per column, in ordinal order.
SELECT @SomeSQLCode = @SomeSQLCode
    + ' ,MAX(ISNULL(DATALENGTH(' + QUOTENAME(COLUMN_NAME) + '),0)) AS ' + QUOTENAME(COLUMN_NAME) + CHAR(10)
FROM INFORMATION_SCHEMA.COLUMNS
WHERE [TABLE_NAME] = @AnalysedTable AND [TABLE_SCHEMA] = @AnalysedTableSchema
ORDER BY [ORDINAL_POSITION];

-- Strip the leading ' ,' and wrap the column list in a SELECT.
-- (SUBSTRING to LEN(...) rather than a fixed 8000, so wide tables are not truncated.)
SELECT @SomeSQLCode = 'SELECT ' + NULLIF(SUBSTRING(@SomeSQLCode, 3, LEN(@SomeSQLCode)), '')
    + ' FROM ' + QUOTENAME(@AnalysedTableSchema) + '.' + QUOTENAME(@AnalysedTable);

PRINT @SomeSQLCode;
EXEC sp_executesql @SomeSQLCode;



One could say that my "wish list" of objectives was not fully achieved, but at least I had fun doing that part.

After all, this code could serve as a basic starting point to get all field-size information for all tables, for all databases, for all servers, for all planets... (oops, dreaming again...)

Seriously, I made sure to credit Greg Larsen for the parts that inspired my own solution, because I think it is important to do that whenever we can; it sends positive encouragement for good work.

Everyone likes positive messages from time to time.
And as with what goes on at SQL Server Central, "together" we can be better than alone.

Salutations,
Bruno Arnold
Post #1022896
Posted Thursday, November 18, 2010 8:11 AM
Grasshopper


Group: General Forum Members
Last Login: Thursday, September 18, 2014 2:22 AM
Points: 19, Visits: 54
I'm trying to use the SQL script to get the table size on SQL 2005, but I get this error:
Incorrect syntax near 'physloc'.
Should I modify the script to get the table size, and if so, how?

thanks
Post #1022939
Posted Thursday, November 18, 2010 10:08 AM
Ten Centuries


Group: General Forum Members
Last Login: Yesterday @ 8:05 AM
Points: 1,070, Visits: 913
How about just using sysindexes or its modern equivalent? Sure, it might be a bit dated, but it's broadly close enough.

That way you don't load your server by running a count over all your tables (when you start hitting billion-row tables, that really stings...).

Much easier all round!



Post #1023035
Posted Thursday, November 18, 2010 11:27 AM
SSC Journeyman


Group: General Forum Members
Last Login: Wednesday, December 10, 2014 1:21 PM
Points: 91, Visits: 418
As Rich says, there are faster/less resource-intensive ways to get the data in question. The following script can be run at the database level to get row/page counts for all user tables:

SELECT 
OBJECT_NAME(a.object_id) [ObjectName],
SUM(row_count) [RowCount],
SUM(used_page_count) [PageCount]
FROM sys.dm_db_partition_stats a
INNER JOIN sys.objects b
ON a.object_id = b.object_id
WHERE a.index_id < 2
AND b.[type] = 'U'
AND b.is_ms_shipped = 0
GROUP BY OBJECT_NAME(a.object_id);

We're just interested in large tables with low row density, so this should suffice for your purposes.

As for the earlier comment about insert performance, there is definitely an overhead when loading new data into compressed tables. Bulk loading can typically be optimized by using uncompressed heaps and compressing after the fact, but this of course depends on your environment/scenario. I would suggest reading the MSDN technical article called "The Data Loading Performance Guide" (http://msdn.microsoft.com/en-us/library/dd425070%28SQL.100%29.aspx) for more details.
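To make the "compress after the fact" step concrete, a minimal sketch, assuming SQL Server 2008 Enterprise and a hypothetical staging table name:

```sql
-- Bulk load into an uncompressed heap first, then compress once the load completes.
-- [dbo].[StagingOrders] is a hypothetical table name for illustration.
ALTER TABLE [dbo].[StagingOrders]
    REBUILD WITH (DATA_COMPRESSION = PAGE);

-- Nonclustered indexes carry their own compression setting and are rebuilt separately:
ALTER INDEX ALL ON [dbo].[StagingOrders]
    REBUILD WITH (DATA_COMPRESSION = PAGE);
```

The rebuild itself is CPU- and log-intensive, so it belongs in a maintenance window rather than alongside the load.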

I personally have nothing but good things to say about data compression, but as with everything SQL Server related, "it depends." If you typically observe high levels of CPU pressure then you should definitely tread lightly. However, for I/O bound environments the gains have always outweighed the losses in my experience, which is what I believe the author was trying to demonstrate (although his example didn't do enough to support this case).
Post #1023077
Posted Thursday, November 18, 2010 3:47 PM
SSC-Enthusiastic


Group: General Forum Members
Last Login: Friday, December 5, 2014 1:50 PM
Points: 141, Visits: 115
For SQL Server 2005, the syntax would be:

-- The first two parameters of dm_db_index_physical_stats are the database_id
-- and object_id; substitute the values for your own database and table.
select record_count / page_count
from sys.dm_db_index_physical_stats(392, 1598628738, NULL, NULL, 'DETAILED')
where index_level = 0

The same article explains each of its components, and also gives details on the equivalent SQL Server 2000 option.

Usage of %%physloc%% is specific to SQL Server 2008, as mentioned in the article.
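For anyone on 2008 who wants to see what the article's script relies on, %%physloc%% is a virtual column returning each row's physical locator. A minimal sketch against the AdventureWorks table used earlier:

```sql
-- %%physloc%% (SQL Server 2008+) exposes each row's physical location
-- (file:page:slot) as varbinary; this is why the script fails on SQL Server 2005.
SELECT TOP (5)
       SalesOrderID,
       %%physloc%% AS phys_loc
FROM Sales.SalesOrderDetail;
```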



Kindest Regards,

M Suresh Kumar

Post #1023208
Posted Monday, November 22, 2010 6:29 AM


Valued Member


Group: General Forum Members
Last Login: Saturday, December 13, 2014 1:39 AM
Points: 63, Visits: 401
Hello all,

I found that the number of columns further influenced the rows/page density thinking (sometimes, for legacy-support reasons, we needed to keep an "award-winningly" de-normalised table).

A small addition to SSC Journeyman's post:

select OBJECT_NAME(so.id) as tablename,
       si.rowcnt as row_count,
       si.dpages as page_count,
       si.rowcnt / si.dpages as rows_per_page,
       COUNT(sc.name) as num_columns
from sysindexes si
left outer join sysobjects so on si.id = so.id
left outer join syscolumns sc on si.id = sc.id
where si.indid = 1
  and so.xtype = 'U'
  and si.rowcnt > 0
  and si.dpages > 0   -- guard against divide-by-zero
group by OBJECT_NAME(so.id), si.rowcnt, si.dpages, si.rowcnt / si.dpages
order by num_columns desc, rows_per_page

Thanks !

(this syntax works on 2000, 2005, and 2008)
Post #1024287
Posted Monday, December 6, 2010 3:00 PM
Forum Newbie


Group: General Forum Members
Last Login: Monday, December 6, 2010 3:02 PM
Points: 1, Visits: 4
Nice article, Suresh!
Give me a call some time,
-Steven
Post #1030870