Improving Database Design and Performance Using Compression
Suresh Kumar Maganti
SSC-Enthusiastic

Group: General Forum Members
Points: 159 Visits: 125
Comments posted to this topic are about the item Improving Database Design and Performance Using Compression


Kindest Regards,

M Suresh Kumar

Johnc
Grasshopper

Group: General Forum Members
Points: 13 Visits: 31
Hi Suresh

Very interesting article, thank you. I wanted to clarify one point, though. You talk about query performance improvement which, of course, is the holy grail with database applications, especially when performance starts to degrade as data volumes grow. However, you are also looking at table-space optimisation, and it appears to me that your article demonstrates the space saving more than the performance optimisation.

I realise you say query performance improves thanks to faster joins at higher row density, but you don't show any evidence of that, and your simple row-count query took longer with compression. So, for optimum query performance, would you recommend trying to achieve higher density without compression (by optimising datatypes etc.)?

Best wishes
John



deroby
Valued Member

Group: General Forum Members
Points: 56 Visits: 288
Although I agree that saving space is "a good thing" (tm), I do wonder whether the full impact of this operation has been considered.

As far as changing an nvarchar to a varchar where functionally possible: hurray!

As far as PAGE compression goes... hmm
=> It's nice to see I/O come down thanks to the more compact storage, since I/O is often the bottleneck on servers (CPUs are cheap; memory and fast disks, relatively speaking, are not). But then again, the actual execution times go up!?
=> What happens to INSERT/UPDATE/DELETE operations? Won't they be affected more under PAGE compression?
=> What happens to queries that fetch data via NON-indexed columns? I know doing so is "bad", but it happens all the time, either because some WHERE clause touches an additional (non-index-included) column, or because the WHERE defeats searchability by wrapping a function around the column.
=> Adding more (compressed) indexes to cover all columns doesn't really sound like an option either, as it makes the database bigger again and probably causes lots of side effects when the data changes.

I guess much of the actual gain depends on how the data is used:
* functionally (e.g. access only via procedures with "restricted", and hence known, execution plans, or more ad-hoc operations),
* technically (SELECT by day and UPDATE by night, or a mixed bag 24/7?),
* and practically (3 vs. 300 concurrent users).

Simply having a slimmer database does not necessarily result in a fitter database, IMHO, but I agree it might... I'm merely trying to be cautious here.
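The trade-offs above can at least be quantified before committing to a rebuild: SQL Server 2008 ships a procedure that estimates compression savings against a sample of the table. A minimal sketch using the article's sample table (the schema/table names are illustrative):

```sql
-- Estimate PAGE-compression savings without changing the table.
EXEC sp_estimate_data_compression_savings
    @schema_name      = 'Sales',
    @object_name      = 'SalesOrderDetail',
    @index_id         = NULL,   -- all indexes
    @partition_number = NULL,   -- all partitions
    @data_compression = 'PAGE';

-- If the estimate looks worthwhile, apply it with a rebuild:
ALTER TABLE Sales.SalesOrderDetail
    REBUILD WITH (DATA_COMPRESSION = PAGE);
```

The estimate only addresses space, not the CPU cost or DML overhead raised above; those still need testing against your own workload.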



barnold-629337
SSC-Enthusiastic

Group: General Forum Members
Points: 148 Visits: 412
My interest was piqued when I read your article about size optimization, and I really like your approach.

As a first-time poster on this site, I'm enthusiastic about sharing my case. This reply may be a long one; if so, skip to "In short..."

On one contract a few years ago, I wanted a quick look at which table space usage could be optimized, though in a much more basic way than your approach.

I was curious to see how much wasted space was present in "oversized" string-type columns in some big tables, but wanted to develop a simple query for that (yes, one could say I was a little "naive" or "over-optimistic" for my experience level at the time...).

Anyway, since I was working from someone else's design, I just wanted a quick look. I hoped to write only one query if possible, even though it wasn't Christmas yet... (a dream can't hurt, right?)

As a clarifying example: a string column might have been created with a 120-character length in mind at design time, when in reality long-term usage shows the maximum length actually "consumed" across all rows is only 45 characters. Here, "long term" could mean a few years or a few million rows, whatever fits your particular case.

That kind of information would then be a starting point for discussing or evaluating whether it would be beneficial to shrink this "longer-than-necessary" string column from a table-size-optimization point of view. In this example, one could decide to reduce the column length from 120 to, say, 60 characters.

In short, I was curious to find the maximum length actually used by every string-type column in at least one table, so I searched for a tool or article someone else had written for that particular task. In the time I allowed myself, I found none. (Yes, one could say I could have invested more time, but I am probably a little "naive" after all...)

I was motivated to develop something basic but general-purpose, small but giving useful results. I did not want temp tables or lots of steps; ideally just one or a few very short queries (speaking "naively"...).

My first attempts were not immediately successful, but with that desire still pending I came across an article by Greg Larsen showing how I could get all the required information for one table, and my motivation was raised again.

If it does not hurt anyone here, I'll just say that this particular inspiring article appeared on www.sqlserverexamples.com. Don't worry, I am still a very enthusiastic fan of SQL Server Central... Smile

Using that inspiration, I finally went with this approach at the time:


-----------------------------------------------------------------
-- Display the maximum used length for each column in a table
-- (a simple tool to help optimize ANY existing table definition)
-- WRITTEN BY: BRUNO ARNOLD
-- DATE: 04/06/2008
--
-- Originally inspired by reading GREG LARSEN here:
-- http://www.sqlserverexamples.com/v2/Examples/DynamicTSQL/DisplayingSmallestRowinTable/tabid/207/Default.aspx
--
-- I mainly use this to quickly see (on big tables) whether a character
-- column is really using (or needing) its declared length, thus helping
-- to spot possible space optimizations.
--
-- FOR EXAMPLE:
-- Using the [SalesOrderDetail] table from [AdventureWorks], I "could safely"
-- change the [CarrierTrackingNumber] column's definition to nvarchar(12)
-- instead of nvarchar(25), but ONLY if future usage is not meant to change.
-- Here the select shows 24 bytes for this column, but it is an nvarchar
-- type, so that corresponds to 12 characters in the "nvarchar" definition.
--
-----------------------------------------------------------------

USE [AdventureWorks]

DECLARE @SomeSQLCode nvarchar(MAX)
       ,@AnalysedTable nvarchar(128)
       ,@AnalysedTableSchema nvarchar(128)

SET @AnalysedTable = 'SalesOrderDetail'
SET @AnalysedTableSchema = 'Sales' -- In other cases this will be 'dbo'
SET @SomeSQLCode = ''

-- Build one MAX(DATALENGTH(...)) expression per column, in ordinal order
SELECT @SomeSQLCode = @SomeSQLCode + ' ,MAX(ISNULL(DATALENGTH([' + COLUMN_NAME + ']),0)) AS ' + QUOTENAME(COLUMN_NAME) + CHAR(10)
FROM INFORMATION_SCHEMA.COLUMNS
WHERE [TABLE_NAME] = @AnalysedTable AND [TABLE_SCHEMA] = @AnalysedTableSchema
GROUP BY [ORDINAL_POSITION], [COLUMN_NAME]
ORDER BY [ORDINAL_POSITION], [COLUMN_NAME]

-- Strip the leading ' ,' and wrap the column list in a SELECT
SELECT @SomeSQLCode = 'SELECT ' + NULLIF(SUBSTRING(@SomeSQLCode,3,8000),'') + ' FROM [' + @AnalysedTableSchema + '].[' + @AnalysedTable + ']'

PRINT @SomeSQLCode
EXEC sp_executesql @SomeSQLCode





One could say my "wish list" of objectives was not fully achieved, but at least I had fun doing this part.

After all, this code could serve as a basic starting point to get all column-size information for all tables, for all databases, for all servers, for all planets... (oops, dreaming again...)

Seriously, I made sure to credit Greg Larsen for the parts that inspired my own solution, because I think it is important to do that whenever we can; it sends positive encouragement for good work.

Everyone likes positive messages from time to time.
And as with what goes on at SQL Server Central, "together" we can be better than alone.

Salutations,
Bruno Arnold
gcarbone-717246
Grasshopper

Group: General Forum Members
Points: 19 Visits: 55
I'm trying to use the SQL script to get table size with SQL 2005, but I get this error:
Incorrect syntax near 'physloc'.
Should I modify the SQL script to get the table size? If so, how?

thanks
RichB
Ten Centuries

Group: General Forum Members
Points: 1102 Visits: 1015
How about just using sysindexes or its modern equivalent? Sure, it might be a bit dated, but it's broadly close enough.

That way you don't have to load your server by running a count over all your tables (when you start hitting billion-record tables, that really stings...).

Much easier all round!



Sean Bowden
SSC Journeyman

Group: General Forum Members
Points: 97 Visits: 447
As Rich says, there are faster/less resource-intensive ways to get the data in question. The following script can be run at the database level to get the row/page count for all user tables:

SELECT
    OBJECT_NAME(a.object_id) AS [ObjectName],
    SUM(a.row_count)         AS [RowCount],
    SUM(a.used_page_count)   AS [PageCount]
FROM sys.dm_db_partition_stats a
INNER JOIN sys.objects b
    ON a.object_id = b.object_id
WHERE a.index_id < 2          -- heap or clustered index only
  AND b.[type] = 'U'          -- user tables
  AND b.is_ms_shipped = 0
GROUP BY OBJECT_NAME(a.object_id);



We're just interested in large tables with low row density, so this should suffice for your purposes.

As for the earlier comment about insert performance, there is definitely an overhead when loading new data into compressed tables. Bulk loading can typically be optimized by using uncompressed heaps and compressing after the fact, but this of course depends on your environment/scenario. I would suggest reading the MSDN technical article called "The Data Loading Performance Guide" (http://msdn.microsoft.com/en-us/library/dd425070%28SQL.100%29.aspx) for more details.
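A minimal sketch of that load-then-compress pattern (the table name and file path below are hypothetical placeholders):

```sql
-- Load into an uncompressed heap first: the cheapest target for bulk
-- insert, per "The Data Loading Performance Guide".
CREATE TABLE dbo.StagingOrders (
    OrderID int          NOT NULL,
    Details varchar(200) NULL
);

BULK INSERT dbo.StagingOrders
FROM 'C:\loads\orders.dat'
WITH (TABLOCK);   -- enables minimal logging under the right recovery model

-- Compress once the load is complete:
ALTER TABLE dbo.StagingOrders
    REBUILD WITH (DATA_COMPRESSION = PAGE);
```

Whether the one-off rebuild cost beats compressing during the load depends, as noted, on your environment and load frequency.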

I personally have nothing but good things to say about data compression, but as with everything SQL Server related, "it depends." If you typically observe high levels of CPU pressure then you should definitely tread lightly. However, for I/O bound environments the gains have always outweighed the losses in my experience, which is what I believe the author was trying to demonstrate (although his example didn't do enough to support this case).
Suresh Kumar Maganti
SSC-Enthusiastic

Group: General Forum Members
Points: 159 Visits: 125
For SQL Server 2005, the syntax would be:

select record_count / page_count
from sys.dm_db_index_physical_stats(392, 1598628738, null, null, 'DETAILED')
where index_level = 0

(The first two arguments are the database ID and object ID; DB_ID() and OBJECT_ID() can supply them for your own tables.)

The article explains its components in detail, and also covers the equivalent SQL Server 2000 option.

Usage of "physloc" is specific to SQL Server 2008, as mentioned in the article.
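For completeness, the 2008-only construct the error message refers to looks something like the following. Both %%physloc%% and fn_PhysLocFormatter are undocumented, and the table name is just the article's sample:

```sql
-- SQL Server 2008+ only: %%physloc%% exposes each row's physical
-- location, which fn_PhysLocFormatter renders as (file:page:slot).
SELECT TOP (10)
    sys.fn_PhysLocFormatter(%%physloc%%) AS row_location,
    SalesOrderID
FROM Sales.SalesOrderDetail;
```

Counting distinct pages from that output is one way to derive rows per page, which is why the script fails outright on 2005.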


Kindest Regards,

M Suresh Kumar

paul.millar
SSC Journeyman

Group: General Forum Members
Points: 81 Visits: 473
Hello all,

I found that the number of columns further influenced the rows-per-page density picture (sometimes, for legacy-support reasons, we needed to keep an "award-winningly" de-normalised table).

Small addition to Sean's post above:

select
    OBJECT_NAME(so.id)  as tablename,
    rowcnt              as row_count,
    dpages              as page_count,
    rowcnt / dpages     as rows_per_page,
    COUNT(sc.name)      as num_columns
from sysindexes si
left outer join sysobjects so on si.id = so.id
left outer join syscolumns sc on si.id = sc.id
where si.indid = 1 and so.xtype = 'U' and rowcnt > 0
group by OBJECT_NAME(so.id), rowcnt, dpages, rowcnt/dpages
order by num_columns desc, rows_per_page

Thanks !

(this syntax is good for 2000, 2005, 2008)
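For anyone on 2005/2008 who would rather avoid the deprecated system tables, roughly the same result can be had from the newer catalog views; a sketch under that assumption (it will not run on 2000, which lacks these DMVs):

```sql
-- Rows per page plus column count per user table, via the 2005+ DMVs.
SELECT
    OBJECT_NAME(ps.object_id) AS tablename,
    SUM(ps.row_count)         AS row_count,
    SUM(ps.used_page_count)   AS page_count,
    SUM(ps.row_count) / NULLIF(SUM(ps.used_page_count), 0) AS rows_per_page,
    MAX(c.column_count)       AS num_columns
FROM sys.dm_db_partition_stats AS ps
CROSS APPLY (SELECT COUNT(*) AS column_count
             FROM sys.columns
             WHERE object_id = ps.object_id) AS c
WHERE ps.index_id < 2   -- heap or clustered index only
  AND OBJECTPROPERTY(ps.object_id, 'IsUserTable') = 1
GROUP BY ps.object_id
HAVING SUM(ps.row_count) > 0
ORDER BY num_columns DESC, rows_per_page;
```

The NULLIF guards against empty allocation edge cases that would otherwise divide by zero.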
Da Big Kahuna
Forum Newbie

Group: General Forum Members
Points: 1 Visits: 4
Nice article, Suresh!
Give me a call some time,
-Steven