
Improving Database Design and Performance Using Compression
Posted Wednesday, November 17, 2010 10:05 PM
SSC-Enthusiastic


Group: General Forum Members
Last Login: Friday, December 5, 2014 1:50 PM
Points: 141, Visits: 115
Comments posted to this topic are about the item Improving Database Design and Performance Using Compression


Kindest Regards,

M Suresh Kumar

Post #1022614
Posted Thursday, November 18, 2010 4:35 AM
Grasshopper


Group: General Forum Members
Last Login: Tuesday, December 7, 2010 2:34 AM
Points: 13, Visits: 31
Hi Suresh

Very interesting article, thank you. I wanted to clarify one point, though. You talk about query performance improvement, which, of course, is the holy grail with database applications, especially when performance starts to degrade as data volumes grow. However, you are also looking at table space optimisation, and it appears to me that your article demonstrates the space saving more than the performance optimisation.

I realise you say query performance is improved due to faster joins with higher row density, but you don't show any evidence of that, and your simple row-count query actually took longer with compression. So for optimum query performance, would you recommend trying to achieve higher density without compression (by optimising datatypes etc.)?

Best wishes
John



Post #1022754
Posted Thursday, November 18, 2010 6:52 AM
SSC Rookie


Group: General Forum Members
Last Login: 2 days ago @ 9:24 AM
Points: 27, Visits: 193
Although I agree that saving space is "a good thing" (tm), I do wonder whether the full impact of this operation has been considered.

As far as changing an nvarchar to a varchar where functionally possible: hurray!

As far as PAGE compression goes... hmm
=> it's nice to see I/O come down thanks to the more compact storage, as I/O is often the bottleneck on servers (CPUs are cheap; memory and fast disks, relatively speaking, are not), but then again, the actual execution times go up!?
=> what happens to INSERT/UPDATE/DELETE operations? Won't they be affected more when using PAGE compression?
=> what happens to queries that fetch data via NON-indexed columns? I know doing so is "bad", but it happens all the time, either because some WHERE clause touches an additional (non-index-included) field, or because the WHERE defeats searchability by wrapping a function around the fields.
=> adding more (compressed) indexes to cover all fields doesn't really sound like an option either, as it will make the database bigger again and will probably cause lots of side effects when the data changes.

I guess a lot of the actual gain depends on the way the data is being used:
* functionally (e.g. access only via procedures with "restricted" (and hence known) execution plans, or more ad-hoc operations),
* technically (SELECT by day and UPDATE by night, or a mixed bag 24/7?),
* and practically (3 vs 300 concurrent users).

Simply having a slimmer database does not necessarily result in a fitter database, IMHO, but I agree it might... I'm merely trying to be cautious here.
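Many of these "it depends" concerns can be sized up before committing to anything. A minimal sketch, assuming SQL Server 2008+ and the AdventureWorks table used in the article, using the built-in estimator (it samples the table into tempdb, so be gentle with very large tables):

```sql
-- Estimate PAGE compression savings for a single table (SQL Server 2008+).
-- Returns current vs. estimated compressed sizes per index/partition.
EXEC sp_estimate_data_compression_savings
    @schema_name      = 'Sales',
    @object_name      = 'SalesOrderDetail',
    @index_id         = NULL,   -- NULL = all indexes on the table
    @partition_number = NULL,   -- NULL = all partitions
    @data_compression = 'PAGE';
```

Comparing this estimate against your CPU headroom is a cheaper first step than compressing and measuring in production.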



Post #1022863
Posted Thursday, November 18, 2010 7:34 AM
SSC-Enthusiastic


Group: General Forum Members
Last Login: Monday, November 3, 2014 10:59 PM
Points: 148, Visits: 397
My interest was piqued when I read your article about size optimization, and I very much like your approach.

As a first-time poster on this site, I am enthusiastic about sharing my case. This reply may be seen as a long one; if so, skip to "In short..."

On one contract a few years ago, I wanted a quick look at what table space usage could be optimized, but in a much more basic way than your approach.

I was curious to see how much wasted space was present in "oversized" string-type fields in some big tables, but wanted to develop a simple query for that (yeah, one could say I was a little "naive" or "over-optimistic" for my experience level at that time...).

Anyway, since I was working with someone else's design, I just wanted a quick look. I hoped to write only one query if possible, even though it was not Christmas yet... (a dream cannot hurt, right?)

As a discussion example, a string field could have been created with a 120-character length in mind at design time, when long-term usage shows the maximum length actually "consumed" across all rows is only 45 characters. Here, "long term" could be a few years or a few million rows, whatever fits your particular case.

This kind of information would then be a starting point to discuss or evaluate whether it would be beneficial to reduce the length of this "longer-than-necessary" string field from a table-size-optimization point of view. In this example, one could decide to reduce the field length from 120 to 60 characters, say.
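The shrink itself, once usage data supports it, is a one-line change. A minimal sketch with hypothetical table and column names (note that ALTER COLUMN must restate NULL/NOT NULL, and fails if indexes or constraints depend on the column):

```sql
-- [dbo].[Customer].[Notes] is a hypothetical oversized column; 60 is the new length.
-- Restate NOT NULL explicitly, or the column silently becomes nullable.
ALTER TABLE [dbo].[Customer]
    ALTER COLUMN [Notes] nvarchar(60) NOT NULL;
```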

In short, I was curious to find out the maximum field-length usage for every string-type field in at least one table, so I began searching for a tool or article someone else had created for that particular task. In the time I allowed myself, I found none. (Yeah, one could say I could have invested more time, but I am probably a little "naive" after all...)

I was motivated to develop something basic but general-purpose, small but giving useful results. I did not want "temp" tables or lots of steps; just one or a few very short queries was also among my design objectives (speaking "naively"...).

My first tries were not immediately successful, but with that desire still pending, I then came across an article by Greg Larsen showing me how to get all the required information for one table, and my motivation rose again.

If it does not hurt anyone here, that particular inspiring article appeared on www.sqlserverexamples.com. Do not worry, I am still a very enthusiastic fan of "SQL Server Central"... :)

Using that inspiration, I finally went with this approach at the time:

-----------------------------------------------------------------
-- Display the maximum usage length for each column in a table
-- (a simple tool to help optimize ANY existing table definition)
-- WRITTEN BY: BRUNO ARNOLD
-- DATE: 04/06/2008
--
-- Originally inspired by reading GREG LARSEN here:
-- http://www.sqlserverexamples.com/v2/Examples/DynamicTSQL/DisplayingSmallestRowinTable/tabid/207/Default.aspx
--
-- I mainly use this to quickly see (on big tables) whether a character field
-- is really using (or needing) its declared length, thus helping to spot
-- possible space optimization.
--
-- FOR EXAMPLE:
-- Using the [SalesOrderDetail] table from [AdventureWorks], I "could securely"
-- change the [CarrierTrackingNumber] field's definition to nvarchar(12)
-- instead of nvarchar(25), but ONLY if future usage is not meant to change.
-- The select shows 24 bytes for this field, but it is an nvarchar type
-- (2 bytes per character), so 12 characters suffice in the "nvarchar" definition.
-----------------------------------------------------------------

USE [AdventureWorks];

DECLARE @SomeSQLCode nvarchar(MAX)
       ,@AnalysedTable nvarchar(128)
       ,@AnalysedTableSchema nvarchar(128);

SET @AnalysedTable = 'SalesOrderDetail';
SET @AnalysedTableSchema = 'Sales'; -- In other cases this will be 'dbo'
SET @SomeSQLCode = '';

-- Build one MAX(DATALENGTH(...)) expression per column, in ordinal order.
SELECT @SomeSQLCode = @SomeSQLCode
    + ' ,MAX(ISNULL(DATALENGTH(' + QUOTENAME(COLUMN_NAME) + '),0)) AS ' + QUOTENAME(COLUMN_NAME) + CHAR(10)
FROM INFORMATION_SCHEMA.COLUMNS
WHERE [TABLE_NAME] = @AnalysedTable AND [TABLE_SCHEMA] = @AnalysedTableSchema
ORDER BY [ORDINAL_POSITION];

-- Strip the leading ' ,' and wrap the column list in a SELECT.
-- (SUBSTRING to LEN(...) rather than a fixed 8000, so wide tables are not truncated.)
SELECT @SomeSQLCode = 'SELECT ' + NULLIF(SUBSTRING(@SomeSQLCode, 3, LEN(@SomeSQLCode)), '')
    + ' FROM ' + QUOTENAME(@AnalysedTableSchema) + '.' + QUOTENAME(@AnalysedTable);

PRINT @SomeSQLCode;
EXEC sp_executesql @SomeSQLCode;



One could say that my "wish list" of objectives was not fully achieved, but at least I had fun doing that part.

After all, this code could serve as a basic starting point to get all field-size information for all tables, for all databases, for all servers, for all planets... (oops, dreaming again...)

Seriously, I made sure to credit Greg Larsen for the parts that inspired my own solution, because I think it is important to do that whenever we can; it sends positive encouragement for good work.

Everyone likes positive messages from time to time.
And as with what goes on at SQL Server Central, "together" we can be better than alone.

Salutations,
Bruno Arnold
Post #1022896
Posted Thursday, November 18, 2010 8:11 AM
Grasshopper


Group: General Forum Members
Last Login: Thursday, September 18, 2014 2:22 AM
Points: 19, Visits: 54
I'm trying to use the SQL script to get the table size on SQL 2005, but I get this error:
Incorrect syntax near 'physloc'.
Should I modify the script to get the table size, and if so, how?

thanks
Post #1022939
Posted Thursday, November 18, 2010 10:08 AM
Ten Centuries


Group: General Forum Members
Last Login: Yesterday @ 8:05 AM
Points: 1,070, Visits: 913
How about just using sysindexes or its modern equivalent? Sure, it might be a bit dated, but it's broadly close enough.

That way you don't load your server by running a count over all your tables (when you start hitting billion-row tables, that really stings...).

Much easier all round!



Post #1023035
Posted Thursday, November 18, 2010 11:27 AM
SSC Journeyman


Group: General Forum Members
Last Login: Wednesday, December 10, 2014 1:21 PM
Points: 91, Visits: 418
As Rich says, there are faster/less resource-intensive ways to get the data in question. The following script can be run at the database level to get row/page counts for all user tables:

SELECT 
OBJECT_NAME(a.object_id) [ObjectName],
SUM(row_count) [RowCount],
SUM(used_page_count) [PageCount]
FROM sys.dm_db_partition_stats a
INNER JOIN sys.objects b
ON a.object_id = b.object_id
WHERE a.index_id < 2
AND b.[type] = 'U'
AND b.is_ms_shipped = 0
GROUP BY OBJECT_NAME(a.object_id);

We're just interested in large tables with low row density, so this should suffice for your purposes.

As for the earlier comment about insert performance, there is definitely an overhead when loading new data into compressed tables. Bulk loading can typically be optimized by using uncompressed heaps and compressing after the fact, but this of course depends on your environment/scenario. I would suggest reading the MSDN technical article called "The Data Loading Performance Guide" (http://msdn.microsoft.com/en-us/library/dd425070%28SQL.100%29.aspx) for more details.
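To make the "compress after the fact" step concrete, a minimal sketch, assuming SQL Server 2008 Enterprise and a hypothetical staging table name:

```sql
-- Bulk load into an uncompressed heap first, then compress once the load completes.
-- [dbo].[StagingOrders] is a hypothetical table name for illustration.
ALTER TABLE [dbo].[StagingOrders]
    REBUILD WITH (DATA_COMPRESSION = PAGE);

-- Nonclustered indexes carry their own compression setting and are rebuilt separately:
ALTER INDEX ALL ON [dbo].[StagingOrders]
    REBUILD WITH (DATA_COMPRESSION = PAGE);
```

The rebuild itself is CPU- and log-intensive, so it belongs in a maintenance window rather than alongside the load.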

I personally have nothing but good things to say about data compression, but as with everything SQL Server related, "it depends." If you typically observe high levels of CPU pressure then you should definitely tread lightly. However, for I/O bound environments the gains have always outweighed the losses in my experience, which is what I believe the author was trying to demonstrate (although his example didn't do enough to support this case).
Post #1023077
Posted Thursday, November 18, 2010 3:47 PM
SSC-Enthusiastic


Group: General Forum Members
Last Login: Friday, December 5, 2014 1:50 PM
Points: 141, Visits: 115
For SQL Server 2005, the syntax would be:

-- The first two parameters of dm_db_index_physical_stats are the database_id
-- and object_id; substitute the values for your own database and table.
select record_count / page_count
from sys.dm_db_index_physical_stats(392, 1598628738, NULL, NULL, 'DETAILED')
where index_level = 0

The same article explains each of its components, and also gives details on the equivalent SQL Server 2000 option.

Usage of %%physloc%% is specific to SQL Server 2008, as mentioned in the article.
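For anyone on 2008 who wants to see what the article's script relies on, %%physloc%% is a virtual column returning each row's physical locator. A minimal sketch against the AdventureWorks table used earlier:

```sql
-- %%physloc%% (SQL Server 2008+) exposes each row's physical location
-- (file:page:slot) as varbinary; this is why the script fails on SQL Server 2005.
SELECT TOP (5)
       SalesOrderID,
       %%physloc%% AS phys_loc
FROM Sales.SalesOrderDetail;
```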



Kindest Regards,

M Suresh Kumar

Post #1023208
Posted Monday, November 22, 2010 6:29 AM


Valued Member


Group: General Forum Members
Last Login: Saturday, December 13, 2014 1:39 AM
Points: 63, Visits: 401
Hello all,

I found that the number of columns further influenced the rows/page density thinking (sometimes, for legacy-support reasons, we needed to keep an "award-winningly" de-normalised table).

A small addition to SSC Journeyman's post:

select OBJECT_NAME(so.id) as tablename,
       si.rowcnt as row_count,
       si.dpages as page_count,
       si.rowcnt / si.dpages as rows_per_page,
       COUNT(sc.name) as num_columns
from sysindexes si
left outer join sysobjects so on si.id = so.id
left outer join syscolumns sc on si.id = sc.id
where si.indid = 1
  and so.xtype = 'U'
  and si.rowcnt > 0
  and si.dpages > 0   -- guard against divide-by-zero
group by OBJECT_NAME(so.id), si.rowcnt, si.dpages, si.rowcnt / si.dpages
order by num_columns desc, rows_per_page

Thanks !

(this syntax works on 2000, 2005, and 2008)
Post #1024287
Posted Monday, December 6, 2010 3:00 PM
Forum Newbie


Group: General Forum Members
Last Login: Monday, December 6, 2010 3:02 PM
Points: 1, Visits: 4
Nice article, Suresh!
Give me a call some time,
-Steven
Post #1030870