Column Store vs Row Store Indexes

  • brian118

    Hall of Fame

    Points: 3506

    Comments posted to this topic are about the item Column Store vs Row Store Indexes

    Regards
    Brian Ellul
    ----------------------------------------------------------------------------------
    Computers are incredibly fast, accurate, and stupid. Human beings are incredibly slow, inaccurate, and brilliant. Together they are powerful beyond imagination.
    - Albert Einstein -

  • Phil Parkin

    SSC Guru

    Points: 244426

    An interesting read, thank you very much. Hope you don't mind me mentioning a few minor typos:

    'FacTransaction_ColumnStore' – needs an extra 't'

    'Compression ration' – ratio

    'foorprint' – footprint

    'datawarhouse' – missing 'e'

    'which may be prohibited' – I think you meant 'prohibitive', but not 100% certain

    'proccessing' – too many c's

    Not trying to detract from the article, just trying to help.

    If the answer to your question can be found with a brief Google search, please perform the search yourself, rather than expecting one of the SSC members to do it for you.
    See https://www.sqlservercentral.com/articles/forum-etiquette-how-to-post-datacode-on-a-forum-to-get-the-best-help/ for details of how to post T-SQL code-related questions.

  • brian118

    Hall of Fame

    Points: 3506

    Phil Parkin (5/14/2015)


    An interesting read, thank you very much. Hope you don't mind me mentioning a few minor typos:

    'FacTransaction_ColumnStore' – needs an extra 't'

    'Compression ration' – ratio

    'foorprint' – footprint

    'datawarhouse' – missing 'e'

    'which may be prohibited' – I think you meant 'prohibitive', but not 100% certain

    'proccessing' – too many c's

    Not trying to detract from the article, just trying to help.

    Thanks Phil. I just submitted the new updates.

    Regards
    Brian Ellul
    ----------------------------------------------------------------------------------
    Computers are incredibly fast, accurate, and stupid. Human beings are incredibly slow, inaccurate, and brilliant. Together they are powerful beyond imagination.
    - Albert Einstein -

  • akljfhnlaflkj

    SSC Guru

    Points: 76202

    Phil Parkin (5/14/2015)


    An interesting read, thank you very much. Hope you don't mind me mentioning a few minor typos:

    'FacTransaction_ColumnStore' – needs an extra 't'

    'Compression ration' – ratio

    'foorprint' – footprint

    'datawarhouse' – missing 'e'

    'which may be prohibited' – I think you meant 'prohibitive', but not 100% certain

    'proccessing' – too many c's

    Not trying to detract from the article, just trying to help.

    OK since we are going that direction, just pulling from the "Conclusion" section here is what I noticed:

    datawarhouse -- should be datawarehouse or perhaps data warehouse

    would need to be rebuild every time -- rebuilt

    which may be prohibited if -- prohibitive

    a good partition startegy -- strategy

    But the article was really good.

  • rschoenborn 38340

    SSC Enthusiast

    Points: 128

    One thing that is missing from your analysis is the type of Aggregation of the data in the data warehouse. Column Store indexes are mimicking the behavior of Key Value Pair (Casandra) or Column Store databases (InfiniDB/InfoBright). These are 5x faster than traditional RDBMS if and only if you are looking at aggregating a single distinct metric. If you need to have many metrics (measures) in a row, then RDBMS is faster so calculations can be performed across the row in the Kernel. Row Calculations is where this technology fails.

    Another consideration is that you need to compare the loading / Indexing time for inserting data. Yes, Column Store is useful because it is basically calculating the indexes all at once during creation using a bit mapped B-Tree. It knows how many of the distinct values will be in each extent and the ranges. Therefore, it can minimize the IO for retrieval. But the cost of building this up front can be high if you are working with streams or high number of new inserts in a short time. To get around this, the only way I have found to handle this is via Partitioning. It will drop the IO even more on retrieval.

    Otherwise, it was a good analysis.

    Regards, Rob

  • Alan Burstein

    SSC Guru

    Points: 61079

    I just finished this article and learned a few things. Well done.

    "I cant stress enough the importance of switching from a sequential files mindset to set-based thinking. After you make the switch, you can spend your time tuning and optimizing your queries instead of maintaining lengthy, poor-performing code."

    -- Itzik Ben-Gan 2001

  • brian118

    Hall of Fame

    Points: 3506

    Thanks to all for your comments.

    Regards
    Brian Ellul
    ----------------------------------------------------------------------------------
    Computers are incredibly fast, accurate, and stupid. Human beings are incredibly slow, inaccurate, and brilliant. Together they are powerful beyond imagination.
    - Albert Einstein -

  • lahteenj

    SSC Enthusiast

    Points: 150

    Hi,

    I can't find the scripts that you mentioned in the article?

    Regards

    Jari

  • brian118

    Hall of Fame

    Points: 3506

    lahteenj (5/15/2015)


    Hi,

    I can't find the scripts that you mentioned in the article?

    Regards

    Jari

    I uploaded them again! Once the moderator will refresh the article, they should be there. Sorry for this

    Regards
    Brian Ellul
    ----------------------------------------------------------------------------------
    Computers are incredibly fast, accurate, and stupid. Human beings are incredibly slow, inaccurate, and brilliant. Together they are powerful beyond imagination.
    - Albert Einstein -

  • jfinerfrock-949145

    SSC Rookie

    Points: 49

    Grasshopper said "f you are looking at aggregating a single distinct metric. If you need to have many metrics (measures) in a row, then RDBMS is faster so calculations can be performed across the row in the Kernel. Row Calculations is where this technology fails."

    >5 years ago we moved from SQL Server to Infobright. While there were several compromises that we had to take into consideration moving to a columnar database, calculations across rows was in no way one of them. I literally have hundreds of reports and most of them have at least 5 metrics. these reports typically run in < 4 seconds. Largest fact table approx 175 million rows.

    Compromises that we had to take into consideration was that Insert Update and Delete were horribly long. We overcame this by utilizing drop and rebuild table or partitioning (In our BI Software not in the database).

    One Huge advantage of the Columnar database was/is that we really do not have to be as selective of what we include in the table. if the table is 100 columns wide it is irrelevant because there is no IO overhead on columns not selected in the query. The other huge advantage of Columns is that I do 0 (Zero, Zilch, Nada) maintenance/performance tuning on the database. nice for a small shop!

    We are, in fact, going to move back to SQL Server 2016 if the Column Store Indexes are as improved as the Marketing folks claim.

  • Marios Philippopoulos

    SSC Guru

    Points: 57030

    In SQL 2016 both clustered and non-clustered columnstore indexes are updateable, so there is no need to disable them prior to updates to the underlying table:

    Columnstore Indexes Versioned Feature Summary

    Thanks for the article, lots of interesting info in there!

    __________________________________________________________________________________
    SQL Server 2016 Columnstore Index Enhancements - System Views for Disk-Based Tables[/url]
    Persisting SQL Server Index-Usage Statistics with MERGE[/url]
    Turbocharge Your Database Maintenance With Service Broker: Part 2[/url]

  • brad.mason5

    Hall of Fame

    Points: 3299

    We have SQL 2016 and using column store indexes. You are spot on with seeks. We found that if a stored proc only wants a small range of values, a seek is preferred and much faster than scanning entire column store index. Aggregates work great for column store.

    We recently implemented Niko's technique of clustered column store, but then add index views on top. This allows for seeks on b-tree if small range, but if we have query that needs to scan will hit column store. For this to work all the time, need to take a look at OPTION(RECOMPILE). But that hit for us is not bad so far. The other thing is once you add index views, updates/inserts/deletes will take longer on the table. If the table is constantly updated, will need to test to see if index views work for you.

    http://www.nikoport.com/2016/10/10/columnstore-indexes-part-87-indexed-views/

  • akljfhnlaflkj

    SSC Guru

    Points: 76202

    Even though an old article, still a good read.

  • DeronDilger

    SSC Rookie

    Points: 43

    I just got done reading Hugo Kornelis's Stairway to Columnstore Indexes (http://www.sqlservercentral.com/stairway/121631/ ) when the email nudge for this article (i.e. Column Store vs Row Store Indexes) showed up.  It was a nice complement to Hugo's great overview (and his suggestions on optimizations).

    Thanks, Brian!

  • takeiteasyn20

    Grasshopper

    Points: 12

    in the attached script of the article, we're creating a non-clustered column_store index on FactTransaction_RowStore table. Is this intended or typo? what scenario /test (out of 5) we see its application?

    adding more:

    is the script meant to be executed in one-go?

    • This reply was modified 5 days, 5 hours ago by  takeiteasyn20. Reason: added a question

Viewing 15 posts - 1 through 15 (of 15 total)

You must be logged in to reply to this topic. Login to reply