Data compression

  • Dear all,

    We have a BI solution.

    In this solution we have a staging area, a data mart and a cube.

    We are thinking of implementing data compression.

    We will use columnstore indexes, and I would like to ask whether we gain anything by applying data compression (page or row level) to all the columns, including the ones that will be covered by the columnstore index.

    Can someone help with this?

    Thank you,

    Pedro 

  • What do you mean by "gain"? If you compress something (in data terms), you "gain" storage space. Is that what you mean?

    Thom~

    Excuse my typos and sometimes awful grammar. My fingers work faster than my brain does.
    Larnu.uk

  • Thank you for your reply. We have a BI application that takes information from views that exist in an Oracle DB. We pull this information into our staging area using SSIS, and then we use SSIS again to move the information from staging into our data mart. Then we process our cube, which is a tabular model. Our Oracle data source will start to send us millions of rows of information. One of the steps we are taking for performance is to use columnstore indexes. My question: I believe that every table that suffers lots of inserts and deletes, and not many updates, is a great candidate to be compressed at row or page level, not only from a storage perspective but also from a performance perspective. But we will also use columnstore indexes. On tables that use a columnstore index, do we still gain by applying data compression to all their fields, even the columns that are in the columnstore index? Or should I not compress tables (at row or page level) if they have a columnstore index?

  • Thank you for your support.

  • river1 - Wednesday, July 4, 2018 12:42 PM

    Thank you for your reply. We have a BI application that takes information from views that exist in an Oracle DB. We pull this information into our staging area using SSIS, and then we use SSIS again to move the information from staging into our data mart. Then we process our cube, which is a tabular model. Our Oracle data source will start to send us millions of rows of information. One of the steps we are taking for performance is to use columnstore indexes. My question: I believe that every table that suffers lots of inserts and deletes, and not many updates, is a great candidate to be compressed at row or page level, not only from a storage perspective but also from a performance perspective. But we will also use columnstore indexes. On tables that use a columnstore index, do we still gain by applying data compression to all their fields, even the columns that are in the columnstore index? Or should I not compress tables (at row or page level) if they have a columnstore index?

    I guess my take on this would be to do the aggregations for your datamart in the native environment for the raw data and then only store the results you need in your datamart.  In this case, that would mean writing code in Oracle and then reading just those results using SQL Server.
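
    For illustration only, an aggregate of roughly this shape (the table and column names are made up, not from this thread) is the kind of query that could run on the Oracle side, so that SSIS loads only the summarized result rather than the raw rows:

        SELECT
            order_date,
            product_id,
            SUM(quantity)    AS total_quantity,
            SUM(line_amount) AS total_amount
        FROM sales_order_lines
        GROUP BY order_date, product_id;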

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • river1 - Wednesday, July 4, 2018 7:27 AM

    Dear all,

    We have a BI solution.

    In this solution we have a staging area, a data mart and a cube.

    We are thinking of implementing data compression.

    We will use columnstore indexes, and I would like to ask whether we gain anything by applying data compression (page or row level) to all the columns, including the ones that will be covered by the columnstore index.

    Can someone help with this?

    Thank you,

    Pedro 

    Data compression will most likely be beneficial as the compressed pages are also compressed in memory, which means that more data pages will fit into memory. The columnstore indexes use columnstore compression and are not affected.
    😎
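
    As a minimal sketch (the table names below are hypothetical, not from this thread), the two options being discussed look like this: row/page compression applies to rowstore tables such as the staging tables, while a clustered columnstore index brings its own compression with it.

        -- Hypothetical staging table: rowstore, so row/page compression applies
        ALTER TABLE dbo.StagingSales
            REBUILD WITH (DATA_COMPRESSION = PAGE);

        -- Hypothetical fact table: a clustered columnstore index replaces the
        -- rowstore and uses its own columnstore compression
        CREATE CLUSTERED COLUMNSTORE INDEX cci_FactSales
            ON dbo.FactSales;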

  • Thank you very much. Let me just add another point. We are using a tabular model and we expect to have millions and millions of rows in the fact tables. I don't know whether this reinforces even more the need for page compression, on top of the fact that we will have columnstore indexes.

  • Tabular and columnstore use compression by default, and you can't really change it the same way you do for normal data. For columnstore you can define either COLUMNSTORE or COLUMNSTORE_ARCHIVE compression; read the manuals to figure out what they do.

    What you may consider for these columns is to look at their cardinality and data type and possibly split them into two, three, or even more parts.

    For example, if you have a datetime column with the time portion populated, splitting it into two columns, one for the date part and the other for the time part, will significantly improve its compression in tabular/columnstore mode; see the sketch after this post.

    See https://www.sqlbi.com/articles/optimizing-high-cardinality-columns-in-vertipaq/ for further examples and explanation.

    For tabular processing there are other things you can do to improve compression. Some are in the article above; others are related to segment processing size. But this comes at a cost in processing time, so it needs to be tested to determine the best trade-off.
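
    As a small sketch of the datetime split mentioned above (the column and table names are made up), something like this in the staging-to-mart load gives each new column a much lower cardinality, which compresses better in VertiPaq/columnstore:

        SELECT
            CAST(OrderTimestamp AS date)    AS OrderDate,
            CAST(OrderTimestamp AS time(0)) AS OrderTime
        FROM dbo.StagingOrders;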

  • Sorry, I did not understand this last answer.

    I got this reply at my company:

    "On an object you can set up either columnstore or page/row compression. Therefore you cannot technically have the same table stored in a CCI and at the same time use page/row compression."

    I thought that on one object (table) I could set both columnstore and page compression...

  • Hi,

    Some options for compression are still available with columnstore tables.  See https://docs.microsoft.com/en-us/sql/relational-databases/data-compression/data-compression?view=sql-server-2017#using-columnstore-and-columnstore-archive-compression

    The stored procedure sp_estimate_data_compression_savings may also be useful for estimating what compression will do across different tables.
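
    For example (the object names here are hypothetical), the estimate procedure is called like this, and an existing columnstore index can only be switched between COLUMNSTORE and COLUMNSTORE_ARCHIVE compression:

        -- Estimate what PAGE compression would save on a rowstore table
        EXEC sys.sp_estimate_data_compression_savings
            @schema_name      = N'dbo',
            @object_name      = N'StagingSales',
            @index_id         = NULL,
            @partition_number = NULL,
            @data_compression = N'PAGE';

        -- Rebuild a columnstore index with archive compression
        ALTER INDEX cci_FactSales ON dbo.FactSales
            REBUILD WITH (DATA_COMPRESSION = COLUMNSTORE_ARCHIVE);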

  • I'm about a week from compressing a 1.4 TB database to about 400 GB and a 1 TB database to slightly less than that... and I'm not using CCI and I'm not using row or page compression.  It's called a "data cleanup", which no one seems to actually do anymore. 😉  Heh... even crap code is going to run about twice as fast.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)
