SQL Server 2012 - Column Store Index

  • We've been doing some performance testing for Columnstore indexing using 160 mln rows, 400 mln rows with different kind of reporting queries. The purpose of test was, if it was really fast we could use them instead of building cubes.

    In Microsoft whitepapers it was stated that performance can go up to 100x but in reality it was not so impressive. In this link an sqlserver developer did some real testing http://www.youtube.com/watch?v=vPN8_PCsJm4. Which is around 19x times faster. Pay attention that he is comparing Columnstore index vs normal index's.

    In our environment we've compared columnstore index vs row data. Result: 2x to 6x maximum. Surprisingly it gets slower as well. Or sometimes it just doesn't use colunnstore index. Initially test was done in VM, and I though it's vm performance issue. We have tested in quite good physical server (with 24 cores and 128 GB), result was same. In attachment you can find some test results that we have done.

    And by the way for Jeff question, time of recreating Columnstore index is similar to Cube processing for same data.

  • Could you share the DDL and queries you used for the benchmarks?

  • Here you go:

    Creating table:

    ---==Create table

    CREATE TABLE [dbo].[MDCS_GPS_LOG_RECV](

    [Msg_ID] [varchar](50) NULL,

    [Callsign] [varchar](50) NULL,

    [Timestamp] [varchar](50) NULL,

    [Msg_Content] [varchar](2500) NULL

    ) ON [PRIMARY]

    GO

    Sample rows:

    INSERT INTO [dbo].[MDCS_GPS_LOG_RECV]

    ([Msg_ID]

    ,[Callsign]

    ,[Timestamp]

    ,[Msg_Content])

    VALUES

    ('S7343079B79D2','S734','2010-01-30 11:50:37','GMSGS734MSG=>RPV43123+0310594+1015965600000012<00009000IP=172.221.102.159MSGID=S7343079B79D21/30/2010 12:05:31 PM'),

    ('KL1703079B52A8','KL170','2010-01-30 11:50:37','GMSGKL170MSG=>RPV43124+0316425+1016904800000012<0000D002IP=172.221.101.209MSGID=KL1703079B52A81/30/2010 12:05:21 PM'),

    ('S3183079B74F9','S318','2010-01-30 11:50:38','GMSGS318MSG=>RPV43136+0312306+1017800800000012<00001000IP=172.220.2.132MSGID=S3183079B74F91/30/2010 12:05:30 PM');

    GO

    and query that was run to get "Count of Msg_ID by Callsign":

    select Count([Msg_ID]), Callsign from[dbo].[MDCS_GPS_LOG_RECV] group by Callsign

  • I'm not sure that's going to be a representative benchmark for columnstore indexes.

    The point of them is to support many thousands of variations of different combinations of predicates, group by's and aggregate functions that are applied to fact tables to support data warehouse queries. The amount of combinations usually makes it impossible to sensibly create regular indexes to support. A count across a single group by would represent a poor scenario for columnstore indexes winning out significantly, depending on data distribution. To really benchmark, the test would have to run a significant number of variations and have a realistic data distribution for a typical data warehouse.

    You may as well just create an indexed view to support the query used in the benchmark:

    CREATE VIEW [MDCS_GPS_LOG_RECV_INDEXED_VIEW] WITH SCHEMABINDING AS

    select COUNT_BIG([Msg_ID]) AS COUNT_Msg_ID,COUNT_BIG(*) AS COUNT, Callsign from[dbo].[MDCS_GPS_LOG_RECV] group by Callsign

    go

    CREATE UNIQUE CLUSTERED INDEX IDX_MDCS_GPS_LOG_RECV_INDEXED_VIEW ON [MDCS_GPS_LOG_RECV_INDEXED_VIEW](Callsign)

Viewing 4 posts - 16 through 18 (of 18 total)

You must be logged in to reply to this topic. Login to reply