Column Store vs Row Store Indexes

  • Comments posted to this topic are about the item Column Store vs Row Store Indexes

    Regards
    Brian Ellul
    ----------------------------------------------------------------------------------
    Computers are incredibly fast, accurate, and stupid. Human beings are incredibly slow, inaccurate, and brilliant. Together they are powerful beyond imagination.
    - Albert Einstein -

  • An interesting read, thank you very much. Hope you don't mind me mentioning a few minor typos:

    'FacTransaction_ColumnStore' – needs an extra 't'

    'Compression ration' – ratio

    'foorprint' – footprint

    'datawarhouse' – missing 'e'

    'which may be prohibited' – I think you meant 'prohibitive', but not 100% certain

    'proccessing' – too many c's

    Not trying to detract from the article, just trying to help.

    If you haven't even tried to resolve your issue, please don't expect the hard-working volunteers here to waste their time providing links to answers which you could easily have found yourself.

  • Phil Parkin (5/14/2015)


    An interesting read, thank you very much. Hope you don't mind me mentioning a few minor typos:

    'FacTransaction_ColumnStore' – needs an extra 't'

    'Compression ration' – ratio

    'foorprint' – footprint

    'datawarhouse' – missing 'e'

    'which may be prohibited' – I think you meant 'prohibitive', but not 100% certain

    'proccessing' – too many c's

    Not trying to detract from the article, just trying to help.

    Thanks Phil. I just submitted the new updates.

    Regards
    Brian Ellul
    ----------------------------------------------------------------------------------
    Computers are incredibly fast, accurate, and stupid. Human beings are incredibly slow, inaccurate, and brilliant. Together they are powerful beyond imagination.
    - Albert Einstein -

  • Phil Parkin (5/14/2015)


    An interesting read, thank you very much. Hope you don't mind me mentioning a few minor typos:

    'FacTransaction_ColumnStore' – needs an extra 't'

    'Compression ration' – ratio

    'foorprint' – footprint

    'datawarhouse' – missing 'e'

    'which may be prohibited' – I think you meant 'prohibitive', but not 100% certain

    'proccessing' – too many c's

    Not trying to detract from the article, just trying to help.

    OK since we are going that direction, just pulling from the "Conclusion" section here is what I noticed:

    datawarhouse -- should be datawarehouse or perhaps data warehouse

    would need to be rebuild every time -- rebuilt

    which may be prohibited if -- prohibitive

    a good partition startegy -- strategy

    But the article was really good.

  • One thing that is missing from your analysis is the type of Aggregation of the data in the data warehouse. Column Store indexes are mimicking the behavior of Key Value Pair (Casandra) or Column Store databases (InfiniDB/InfoBright). These are 5x faster than traditional RDBMS if and only if you are looking at aggregating a single distinct metric. If you need to have many metrics (measures) in a row, then RDBMS is faster so calculations can be performed across the row in the Kernel. Row Calculations is where this technology fails.

    Another consideration is that you need to compare the loading / Indexing time for inserting data. Yes, Column Store is useful because it is basically calculating the indexes all at once during creation using a bit mapped B-Tree. It knows how many of the distinct values will be in each extent and the ranges. Therefore, it can minimize the IO for retrieval. But the cost of building this up front can be high if you are working with streams or high number of new inserts in a short time. To get around this, the only way I have found to handle this is via Partitioning. It will drop the IO even more on retrieval.

    Otherwise, it was a good analysis.

    Regards, Rob

  • I just finished this article and learned a few things. Well done.

    "I cant stress enough the importance of switching from a sequential files mindset to set-based thinking. After you make the switch, you can spend your time tuning and optimizing your queries instead of maintaining lengthy, poor-performing code."

    -- Itzik Ben-Gan 2001

  • Thanks to all for your comments.

    Regards
    Brian Ellul
    ----------------------------------------------------------------------------------
    Computers are incredibly fast, accurate, and stupid. Human beings are incredibly slow, inaccurate, and brilliant. Together they are powerful beyond imagination.
    - Albert Einstein -

  • Hi,

    I can't find the scripts that you mentioned in the article?

    Regards

    Jari

  • lahteenj (5/15/2015)


    Hi,

    I can't find the scripts that you mentioned in the article?

    Regards

    Jari

    I uploaded them again! Once the moderator will refresh the article, they should be there. Sorry for this

    Regards
    Brian Ellul
    ----------------------------------------------------------------------------------
    Computers are incredibly fast, accurate, and stupid. Human beings are incredibly slow, inaccurate, and brilliant. Together they are powerful beyond imagination.
    - Albert Einstein -

  • Grasshopper said "f you are looking at aggregating a single distinct metric. If you need to have many metrics (measures) in a row, then RDBMS is faster so calculations can be performed across the row in the Kernel. Row Calculations is where this technology fails."

    >5 years ago we moved from SQL Server to Infobright. While there were several compromises that we had to take into consideration moving to a columnar database, calculations across rows was in no way one of them. I literally have hundreds of reports and most of them have at least 5 metrics. these reports typically run in < 4 seconds. Largest fact table approx 175 million rows.

    Compromises that we had to take into consideration was that Insert Update and Delete were horribly long. We overcame this by utilizing drop and rebuild table or partitioning (In our BI Software not in the database).

    One Huge advantage of the Columnar database was/is that we really do not have to be as selective of what we include in the table. if the table is 100 columns wide it is irrelevant because there is no IO overhead on columns not selected in the query. The other huge advantage of Columns is that I do 0 (Zero, Zilch, Nada) maintenance/performance tuning on the database. nice for a small shop!

    We are, in fact, going to move back to SQL Server 2016 if the Column Store Indexes are as improved as the Marketing folks claim.

  • In SQL 2016 both clustered and non-clustered columnstore indexes are updateable, so there is no need to disable them prior to updates to the underlying table:

    Columnstore Indexes Versioned Feature Summary

    Thanks for the article, lots of interesting info in there!

    __________________________________________________________________________________
    SQL Server 2016 Columnstore Index Enhancements - System Views for Disk-Based Tables[/url]
    Persisting SQL Server Index-Usage Statistics with MERGE[/url]
    Turbocharge Your Database Maintenance With Service Broker: Part 2[/url]

  • We have SQL 2016 and using column store indexes. You are spot on with seeks. We found that if a stored proc only wants a small range of values, a seek is preferred and much faster than scanning entire column store index. Aggregates work great for column store.

    We recently implemented Niko's technique of clustered column store, but then add index views on top. This allows for seeks on b-tree if small range, but if we have query that needs to scan will hit column store. For this to work all the time, need to take a look at OPTION(RECOMPILE). But that hit for us is not bad so far. The other thing is once you add index views, updates/inserts/deletes will take longer on the table. If the table is constantly updated, will need to test to see if index views work for you.

    http://www.nikoport.com/2016/10/10/columnstore-indexes-part-87-indexed-views/

  • Even though an old article, still a good read.

  • I just got done reading Hugo Kornelis's Stairway to Columnstore Indexes (http://www.sqlservercentral.com/stairway/121631/ ) when the email nudge for this article (i.e. Column Store vs Row Store Indexes) showed up.  It was a nice complement to Hugo's great overview (and his suggestions on optimizations).

    Thanks, Brian!

  • in the attached script of the article, we're creating a non-clustered column_store index on FactTransaction_RowStore table. Is this intended or typo? what scenario /test (out of 5) we see its application?

    adding more:

    is the script meant to be executed in one-go?

    • This reply was modified 3 years, 11 months ago by  takeiteasyn20. Reason: added a question

Viewing 15 posts - 1 through 14 (of 14 total)

You must be logged in to reply to this topic. Login to reply