Column Store vs Row Store Indexes

Question

Post reply

Column Store vs Row Store Indexes

brian118

Hall of Fame

Points: 3506
More actions
May 14, 2015 at 12:07 am

#298149

Comments posted to this topic are about the item Column Store vs Row Store Indexes
Regards
Brian Ellul
----------------------------------------------------------------------------------
Computers are incredibly fast, accurate, and stupid. Human beings are incredibly slow, inaccurate, and brilliant. Together they are powerful beyond imagination.
- Albert Einstein -

Viewing 15 posts - 1 through 14 (of 14 total)

You must be logged in to reply to this topic. Login to reply

Phil Parkin SSC Guru Points: 246686 More actions · Answer 1

An interesting read, thank you very much. Hope you don't mind me mentioning a few minor typos:

'FacTransaction_ColumnStore' – needs an extra 't'

'Compression ration' – ratio

'foorprint' – footprint

'datawarhouse' – missing 'e'

'which may be prohibited' – I think you meant 'prohibitive', but not 100% certain

'proccessing' – too many c's

Not trying to detract from the article, just trying to help.

If you haven't even tried to resolve your issue, please don't expect the hard-working volunteers here to waste their time providing links to answers which you could easily have found yourself.

brian118 Hall of Fame Points: 3506 More actions · Answer 2

Phil Parkin (5/14/2015)
An interesting read, thank you very much. Hope you don't mind me mentioning a few minor typos:
'FacTransaction_ColumnStore' – needs an extra 't'
'Compression ration' – ratio
'foorprint' – footprint
'datawarhouse' – missing 'e'
'which may be prohibited' – I think you meant 'prohibitive', but not 100% certain
'proccessing' – too many c's
Not trying to detract from the article, just trying to help.

Thanks Phil. I just submitted the new updates.

Regards
Brian Ellul
----------------------------------------------------------------------------------
Computers are incredibly fast, accurate, and stupid. Human beings are incredibly slow, inaccurate, and brilliant. Together they are powerful beyond imagination.
- Albert Einstein -

akljfhnlaflkj SSC Guru Points: 76202 More actions · Answer 3

Phil Parkin (5/14/2015)
An interesting read, thank you very much. Hope you don't mind me mentioning a few minor typos:
'FacTransaction_ColumnStore' – needs an extra 't'
'Compression ration' – ratio
'foorprint' – footprint
'datawarhouse' – missing 'e'
'which may be prohibited' – I think you meant 'prohibitive', but not 100% certain
'proccessing' – too many c's
Not trying to detract from the article, just trying to help.

OK since we are going that direction, just pulling from the "Conclusion" section here is what I noticed:

datawarhouse -- should be datawarehouse or perhaps data warehouse

would need to be rebuild every time -- rebuilt

which may be prohibited if -- prohibitive

a good partition startegy -- strategy

But the article was really good.

rschoenborn 38340 SSC Enthusiast Points: 128 More actions · Answer 4

One thing that is missing from your analysis is the type of Aggregation of the data in the data warehouse. Column Store indexes are mimicking the behavior of Key Value Pair (Casandra) or Column Store databases (InfiniDB/InfoBright). These are 5x faster than traditional RDBMS if and only if you are looking at aggregating a single distinct metric. If you need to have many metrics (measures) in a row, then RDBMS is faster so calculations can be performed across the row in the Kernel. Row Calculations is where this technology fails.

Another consideration is that you need to compare the loading / Indexing time for inserting data. Yes, Column Store is useful because it is basically calculating the indexes all at once during creation using a bit mapped B-Tree. It knows how many of the distinct values will be in each extent and the ranges. Therefore, it can minimize the IO for retrieval. But the cost of building this up front can be high if you are working with streams or high number of new inserts in a short time. To get around this, the only way I have found to handle this is via Partitioning. It will drop the IO even more on retrieval.

Otherwise, it was a good analysis.

Regards, Rob

Alan Burstein SSC Guru Points: 61136 More actions · Answer 5

I just finished this article and learned a few things. Well done.

"I cant stress enough the importance of switching from a sequential files mindset to set-based thinking. After you make the switch, you can spend your time tuning and optimizing your queries instead of maintaining lengthy, poor-performing code."

-- Itzik Ben-Gan 2001

brian118 Hall of Fame Points: 3506 More actions · Answer 6

Thanks to all for your comments.

Regards
Brian Ellul
----------------------------------------------------------------------------------
Computers are incredibly fast, accurate, and stupid. Human beings are incredibly slow, inaccurate, and brilliant. Together they are powerful beyond imagination.
- Albert Einstein -

lahteenj SSC Enthusiast Points: 180 More actions · Answer 7

Hi,

I can't find the scripts that you mentioned in the article?

Regards

Jari

brian118 Hall of Fame Points: 3506 More actions · Answer 8

lahteenj (5/15/2015)
Hi,
I can't find the scripts that you mentioned in the article?
Regards
Jari

I uploaded them again! Once the moderator will refresh the article, they should be there. Sorry for this

Regards
Brian Ellul
----------------------------------------------------------------------------------
Computers are incredibly fast, accurate, and stupid. Human beings are incredibly slow, inaccurate, and brilliant. Together they are powerful beyond imagination.
- Albert Einstein -

jfinerfrock-949145 SSC Rookie Points: 49 More actions · Answer 9

Grasshopper said "f you are looking at aggregating a single distinct metric. If you need to have many metrics (measures) in a row, then RDBMS is faster so calculations can be performed across the row in the Kernel. Row Calculations is where this technology fails."

>5 years ago we moved from SQL Server to Infobright. While there were several compromises that we had to take into consideration moving to a columnar database, calculations across rows was in no way one of them. I literally have hundreds of reports and most of them have at least 5 metrics. these reports typically run in < 4 seconds. Largest fact table approx 175 million rows.

Compromises that we had to take into consideration was that Insert Update and Delete were horribly long. We overcame this by utilizing drop and rebuild table or partitioning (In our BI Software not in the database).

One Huge advantage of the Columnar database was/is that we really do not have to be as selective of what we include in the table. if the table is 100 columns wide it is irrelevant because there is no IO overhead on columns not selected in the query. The other huge advantage of Columns is that I do 0 (Zero, Zilch, Nada) maintenance/performance tuning on the database. nice for a small shop!

We are, in fact, going to move back to SQL Server 2016 if the Column Store Indexes are as improved as the Marketing folks claim.

Marios Philippopoulos SSC Guru Points: 57190 More actions · Answer 10

In SQL 2016 both clustered and non-clustered columnstore indexes are updateable, so there is no need to disable them prior to updates to the underlying table:

Columnstore Indexes Versioned Feature Summary

Thanks for the article, lots of interesting info in there!

__________________________________________________________________________________
SQL Server 2016 Columnstore Index Enhancements - System Views for Disk-Based Tables[/url]
Persisting SQL Server Index-Usage Statistics with MERGE[/url]
Turbocharge Your Database Maintenance With Service Broker: Part 2[/url]

brad.mason5 Hall of Fame Points: 3299 More actions · Answer 11

We have SQL 2016 and using column store indexes. You are spot on with seeks. We found that if a stored proc only wants a small range of values, a seek is preferred and much faster than scanning entire column store index. Aggregates work great for column store.

We recently implemented Niko's technique of clustered column store, but then add index views on top. This allows for seeks on b-tree if small range, but if we have query that needs to scan will hit column store. For this to work all the time, need to take a look at OPTION(RECOMPILE). But that hit for us is not bad so far. The other thing is once you add index views, updates/inserts/deletes will take longer on the table. If the table is constantly updated, will need to test to see if index views work for you.

http://www.nikoport.com/2016/10/10/columnstore-indexes-part-87-indexed-views/

akljfhnlaflkj SSC Guru Points: 76202 More actions · Answer 12

akljfhnlaflkj

SSC Guru

Points: 76202

January 3, 2017 at 7:24 am

#1920367

Even though an old article, still a good read.

DeronDilger Valued Member Points: 55 More actions · Answer 13

I just got done reading Hugo Kornelis's Stairway to Columnstore Indexes (http://www.sqlservercentral.com/stairway/121631/ ) when the email nudge for this article (i.e. Column Store vs Row Store Indexes) showed up. It was a nice complement to Hugo's great overview (and his suggestions on optimizations).

Thanks, Brian!

takeiteasyn20 Grasshopper Points: 12 More actions · Answer 14

in the attached script of the article, we're creating a non-clustered column_store index on FactTransaction_RowStore table. Is this intended or typo? what scenario /test (out of 5) we see its application?

adding more:

is the script meant to be executed in one-go?

This reply was modified 3 years, 11 months ago by takeiteasyn20. Reason: added a question