Data compression

  • Dear all,

    We have a BI solution.

    In this solution we have a staging area, a data mart and a cube.

    We are thinking of implementing data compression.

    We will use columnstore indexes, and I would like to ask whether we gain anything by applying data compression (page or row level) to all the columns, including the ones that will be covered by the columnstore index.

    Can someone help with this?

    Thank you,

    Pedro 

  • What do you mean by "gain"? If you compress something (in data terms), you "gain" storage space. Is that what you mean?

    Thom~

    Excuse my typos and sometimes awful grammar. My fingers work faster than my brain does.
    Larnu.uk

  • Thank you for your reply. We have a BI application that takes information from views that exist in an Oracle DB. We pull this information into our staging area using SSIS, and then we use SSIS again to move the information from staging into our data mart. Then we process our cube, which is a tabular model. Our Oracle data source will start to send us millions of rows of information. One of the steps we are taking for performance is to use columnstore indexes. My question: I believe that every table that suffers lots of inserts and deletes, and not many updates, is a great candidate to be compressed at row or page level, not only from a storage perspective but also from a performance perspective. But we will also use columnstore indexes. On tables that use a columnstore index, do we still gain by applying data compression to all their fields, even the columns that are in the columnstore index? Or should I not compress tables (at row or page level) if they have a columnstore index?

  • Thank you for your support.

  • river1 - Wednesday, July 4, 2018 12:42 PM

    Thank you for your reply. We have a BI application that takes information from views that exist in an Oracle DB. We pull this information into our staging area using SSIS, and then we use SSIS again to move the information from staging into our data mart. Then we process our cube, which is a tabular model. Our Oracle data source will start to send us millions of rows of information. One of the steps we are taking for performance is to use columnstore indexes. My question: I believe that every table that suffers lots of inserts and deletes, and not many updates, is a great candidate to be compressed at row or page level, not only from a storage perspective but also from a performance perspective. But we will also use columnstore indexes. On tables that use a columnstore index, do we still gain by applying data compression to all their fields, even the columns that are in the columnstore index? Or should I not compress tables (at row or page level) if they have a columnstore index?

    I guess my take on this would be to do the aggregations for your datamart in the native environment for the raw data and then only store the results you need in your datamart.  In this case, that would mean writing code in Oracle and then reading just those results using SQL Server.
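
    For illustration only, an aggregate of roughly this shape (the table and column names are made up, not from this thread) is the kind of query that could run on the Oracle side, so that SSIS loads only the summarized result rather than the raw rows:

        SELECT
            order_date,
            product_id,
            SUM(quantity)    AS total_quantity,
            SUM(line_amount) AS total_amount
        FROM sales_order_lines
        GROUP BY order_date, product_id;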

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • river1 - Wednesday, July 4, 2018 7:27 AM

    Dear all,

    We have a BI solution.

    In this solution we have a staging area, a data mart and a cube.

    We are thinking of implementing data compression.

    We will use columnstore indexes, and I would like to ask whether we gain anything by applying data compression (page or row level) to all the columns, including the ones that will be covered by the columnstore index.

    Can someone help with this?

    Thank you,

    Pedro 

    Data compression will most likely be beneficial as the compressed pages are also compressed in memory, which means that more data pages will fit into memory. The columnstore indexes use columnstore compression and are not affected.
    😎
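
    As a minimal sketch (the table names below are hypothetical, not from this thread), the two options being discussed look like this: row/page compression applies to rowstore tables such as the staging tables, while a clustered columnstore index brings its own compression with it.

        -- Hypothetical staging table: rowstore, so row/page compression applies
        ALTER TABLE dbo.StagingSales
            REBUILD WITH (DATA_COMPRESSION = PAGE);

        -- Hypothetical fact table: a clustered columnstore index replaces the
        -- rowstore and uses its own columnstore compression
        CREATE CLUSTERED COLUMNSTORE INDEX cci_FactSales
            ON dbo.FactSales;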

  • Thank you very much. Let me just add another point. We are using a tabular model and we expect to have millions and millions of rows in the fact tables. I don't know whether this reinforces even more the need for page compression, on top of the fact that we will have columnstore indexes.

  • Tabular and columnstore use compression by default, and you can't really change it the same way you do for normal data. For columnstore you can define either COLUMNSTORE or COLUMNSTORE_ARCHIVE compression; read the manuals to figure out what they do.

    What you may consider for these columns is to look at their cardinality and data type and possibly split them into two, three, or even more parts.

    For example, if you have a datetime column with the time portion populated, splitting it into two columns, one for the date part and the other for the time part, will significantly improve its compression in tabular/columnstore mode; see the sketch after this post.

    See https://www.sqlbi.com/articles/optimizing-high-cardinality-columns-in-vertipaq/ for further examples and explanation.

    For tabular processing there are other things you can do to improve compression. Some are in the article above; others are related to segment processing size. But this comes at a cost in processing time, so it needs to be tested to determine the best trade-off.
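
    As a small sketch of the datetime split mentioned above (the column and table names are made up), something like this in the staging-to-mart load gives each new column a much lower cardinality, which compresses better in VertiPaq/columnstore:

        SELECT
            CAST(OrderTimestamp AS date)    AS OrderDate,
            CAST(OrderTimestamp AS time(0)) AS OrderTime
        FROM dbo.StagingOrders;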

  • Sorry, I did not understand this last answer.

    I got this reply at my company:

    "On an object you can set up either columnstore or page/row compression. Therefore you cannot technically have the same table stored in a CCI and at the same time use page/row compression."

    I thought that on one object (table) I could set both columnstore and page compression...

  • Hi,

    Some options for compression are still available with columnstore tables.  See https://docs.microsoft.com/en-us/sql/relational-databases/data-compression/data-compression?view=sql-server-2017#using-columnstore-and-columnstore-archive-compression

    The stored procedure sp_estimate_data_compression_savings may also be useful for estimating what compression will do across different tables.
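
    For example (the object names here are hypothetical), the estimate procedure is called like this, and an existing columnstore index can only be switched between COLUMNSTORE and COLUMNSTORE_ARCHIVE compression:

        -- Estimate what PAGE compression would save on a rowstore table
        EXEC sys.sp_estimate_data_compression_savings
            @schema_name      = N'dbo',
            @object_name      = N'StagingSales',
            @index_id         = NULL,
            @partition_number = NULL,
            @data_compression = N'PAGE';

        -- Rebuild a columnstore index with archive compression
        ALTER INDEX cci_FactSales ON dbo.FactSales
            REBUILD WITH (DATA_COMPRESSION = COLUMNSTORE_ARCHIVE);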

  • I'm about a week from compressing a 1.4 TB database to about 400 GB and a 1 TB database to slightly less than that... and I'm not using CCI and I'm not using row or page compression.  It's called a "data cleanup", which no one seems to actually do anymore. 😉  Heh... even crap code is going to run about twice as fast.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)
