Column vs Row

  • Comments posted to this topic are about the item Column vs Row

  • This was removed by the editor as SPAM

  • I hope there are people out there interested in columnstore. It is a powerful indexing strategy that can give great performance. (Spoiler alert, this is a long post.)

    We make good use of columnstore where I work, both for BI tables and for OLTP. However, these two use cases tackle different problems so they will be discussed separately.

    Clustered Columnstore for BI Tables

    Our license covers Enterprise Edition with Service Assurance, so we have SQL2019 installed with PowerBI Server in place of SSRS, and we are moving down the path of replacing and enhancing traditional row/column reports with PowerBI visualisations that run on SSAS Tabular Mode. Our build has the following features:

    * Our work to date covers End of Day processing and reporting

    * We use Live Connect in our SSAS Tabular Model

    * All underlying tables are refreshed with an overnight ETL process

    * All underlying tables are created using a clustered columnstore index (CCI). A few also have a row-format index

    * Visualisation start-up is typically under 2 seconds

    * Visualisation filtering typically has sub-second response

    We see the advantages of this appropach as:

    1) Our PowerBI developers are becoming used to the techniques we will need when we move on to querying live OLTP data

    2) Refresh of the model structure typically takes under 2 seconds, no post-processing is required

    3) We eliminate all the SSAS data load time and memory usage that comes with Import mode

    4) Performance of visualisations is as good as when using Import mode

    However, there are disadvantages:

    a) Calaculated columns are not possible

    b) All table joins can only be single-column

    c) Multiple-column joins have to be pushed down into the data layer to give single column with compound values

    We fairly regularly review the Live v Import decision, but we know we will need to build visualisations looking at live OLTP data. There is a different mindset needed when using Live compared to Import when designing visualisations. So far we have not come across anything we cannot solve with a Live connection so we remain happy with this approach.

    Another important change with CCIs is the exploitation of only storing unique values once. For example we have a number of Dimension tables that hold a Status value. With Row-format it would be normal and often more efficient to use a separate look-up table to store the Status Description, but with columnstore all descriptions are held in the same table that needs the code value. This also extends to Fact tables, which are far wider than would normally be the case with a row-format model. It remains very important to do TNF analysis of the data to know how it all relates, but physical table design tries to use whatever denormalisation that is needed to avoid joins.

    We also remain aware that very frequent queries based on a given key will often perform faster with a row-format index than a columnstore scan with predicate pushdown. For most of the visualisations, the slicing and dicing has no consistent pattern, but where a significant pattern does exist we will look at the performance, etc, impact of a row-format index. There is a trade-off between the superb general-purpose CCI and a shedload of row-format indexes that may go for hours or more without being used.

    We occasionally hit problems with CCIs, and have developed workrounds. Sometimes the optimiser refuses to do predicate push down for a particular query, instead doing a table scan followed by a filter. This is a performance killer, and the normal workround is to create a suitable row-format index. However, experience to date is that using Live connect based on tables with CCIs is a very effective way to do BI.

    Non-Clustered Columnstore for OLTP

    With a row-format index, an index scan will read the entire index then filter out the rows needed. With a multi-million row table the scan will take more time than we wish to complete. It is obviously possible to keep adding row-format indexes with the aim of turning scans to seeks, but this results in an OLTP table where its performance is burdened by a large number of indexes.

    Non-Clustered Columnstore Indexes (NCCI) can cut through this, replacing a large number of low use row-format indexes with a single columnstore index. Most of our scans now take place via an NCCI that uses predicate pushdown. Multi-second (or minute) queries are now often sub-second in response. An NCCI is almost like a magic bullet, but it sometimes will head for your foot.

    The biggest issue with a NCCI on OLTP tables is volatility. Every insert/update/delete gets put in an Open Row Group, which gives a portion of the data that effectively is subject to a scan, while the unchanged data gets the magic columnstore predicate pushdown. The answer we use is regular maintenance of NCCI indexes. We have a job that runs every 2 hours and checks the stats for every NCCI we have. Anything that has gone above our thresholds gets processed to close the open row group and the action logged. If a table needs 3 or less maintenance actions per day then a NCCI is undoubtedly right for that table. If it needs maintenance just about every time the process runs then it is almost certainly too volatile for an NCCI. With over 200 tables where a NCCI has been used, less than 5 have proved too volatile and the NCCI dropped. Another issue is data types such a (max) that cannot be included in an NCCI. Our standard means these data types are virtually never included in row-format indexes, so we don't think we have lost much in this area.

    As with the BI tables, we are aware that a row-format index can perform better for very frequent queries where the predicate keys are clearly defined, so some of our tables with an NCCI have retained a few row-format indexes. There is also the very infrequent problem where the optimiser just prefers a scan followed by a filter. However, for a significant number of tables our indexing is now a clustered primary key in row-format and an NCCI.

    My recommentation is to try these out.   There will be things you need to learn, but there is also a lot of performance to gain.

    • This reply was modified 1 year, 9 months ago by  EdVassie.
    • This reply was modified 1 year, 9 months ago by  EdVassie.

    Original author: 1-click install and best practice configuration of SQL Server 2019, 2017 2016, 2014, 2012, 2008 R2, 2008 and 2005.

    When I give food to the poor they call me a saint. When I ask why they are poor they call me a communist - Archbishop Hélder Câmara

Viewing 3 posts - 1 through 2 (of 2 total)

You must be logged in to reply to this topic. Login to reply