Does sparse columns really work? or at least, how does it work?

  • Hello to all

    I was doing some queries to check if i could get any advantage using sparse columns. But i am confuse with my results. Here is what i did

    1 - Determinate the % of null values per column

  • ColumnNameTableNameNullValuesNumberOfRowsDataTypeNULL%SugestedTotalNull%
    ActionLogNumberJob185567185567varchar0.61.00
    DetailOutersOverJob132086185567int0.640.71
    DetailOutersShortJob184011185567int0.640.99
    GrnNumberJob185567185567varchar0.61.00
    GrnRefusedDescJob185567185567varchar0.61.00
    GrnRefusedReasonJob185567185567varchar0.61.00
    OuterCountJob124926185567int0.640.67
    ProofOfDeliveryJob177240185567int0.640.96
    RoyaltyCodeDescJob149383185567varchar0.60.81
    TotalOutersOverJob132038185567int0.640.71
    TotalOutersShortJob125175185567int0.640.67

  • 2 - Following this link Use Sparse Columns i determinate the columns candidate to be sparse
    3 - Apply the script

    ALTER TABLE [Job] ALTER COLUMN [TotalOutersShort] int SPARSE NULL

    So at this point I was expecting fewer number of pages in the table using this query 

    SELECT SUM(used_page_count)
    FROM sys.dm_db_partition_stats 
    WHERE object_id = OBJECT_ID('job');


    but it actually went up. Furthermore , if i query the table before and after apply the sparse with "SET STATISTICS PROFILE ON" i have some strange results
    BEFORE
    EstimateIO = 7.020162; AvgRowSize = 603
    AFTER
    EstimateIO = 5.557199; AvgRowSize = 603

    Am i doing something wrong?

    Thanks in advance
    Henrry

  • Sparse columns are only good when you have a very high number of rows with a null value, those percentages are to low (that is if it is 0.64% and not 64%) to benefit from a sparse column.

    You will actually use more space storing data in sparse columns if the percentage of the NULL values is so low, as you have seen due to the sparse column overheads to manage the sparse bits etc

  • Hi Anthony

    Thanks for the replay. But lets see the case of TotalOutersShort column

    Null Values = 125175
    Total Rows = 185567

    So 125175 / 185567 = 0.67 OR 67%
    Which is over the recommendation for an int column 0.64 or 64%

    Thanks

  • I probably wouldn't consider sparse without having 95%+ being null and having a good number of the columns in the table mbeing mostly null. It's a feature with a specific use. It also means that the table can't be row/page compressed.

    The 64% for int isn't a recommendation. The page isn't saying that you should switch to sparse with that % null. Since sparse makes rows that have data large, altering the table likely caused a bunch of page splits, hence increasing the page count.

    Gail Shaw
    Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
    SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

    We walk in the dark places no others will enter
    We stand on the bridge and no one may pass
  • henrrypires - Monday, September 18, 2017 6:41 AM

    Hi Anthony

    Thanks for the replay. But lets see the case of TotalOutersShort column

    Null Values = 125175
    Total Rows = 185567

    So 125175 / 185567 = 0.67 OR 67%
    Which is over the recommendation for an int column 0.64 or 64%

    Thanks

    Will teach me to actually read the post properly before jumping in.

    When you create the sparse column it actually copies the data into a new column then removes the old column as sparse is a storage layer option not the logical table option, as such that operation requires more space to perform.  Now if you got the page count before the cleanup happened as that's a background process we don't control, you may have got some dirty pages in that result set.

    Do you have any indexes on TotalOutersShort or any indexes that covers that column?  If so that index will grow also from 4 bytes per row to 8 bytes, you will be best looking at filtered indexes and excluding the NULL rows if its a single column index, gets tricky when its a covering index to filter out NULL rows as you filter out rows which may not need to be filtered.

  • anthony.green - Monday, September 18, 2017 7:51 AM

    henrrypires - Monday, September 18, 2017 6:41 AM

    Hi Anthony

    Thanks for the replay. But lets see the case of TotalOutersShort column

    Null Values = 125175
    Total Rows = 185567

    So 125175 / 185567 = 0.67 OR 67%
    Which is over the recommendation for an int column 0.64 or 64%

    Thanks

    Will teach me to actually read the post properly before jumping in.

    When you create the sparse column it actually copies the data into a new column then removes the old column as sparse is a storage layer option not the logical table option, as such that operation requires more space to perform.  Now if you got the page count before the cleanup happened as that's a background process we don't control, you may have got some dirty pages in that result set.

    Do you have any indexes on TotalOutersShort or any indexes that covers that column?  If so that index will grow also from 4 bytes per row to 8 bytes, you will be best looking at filtered indexes and excluding the NULL rows if its a single column index, gets tricky when its a covering index to filter out NULL rows as you filter out rows which may not need to be filtered.

    Hi Anthony

    Indeed the problem is the index. I remove all index on this table and with Sparse columns now contains 230 pages less

  • Viewing 6 posts - 1 through 5 (of 5 total)

    You must be logged in to reply to this topic. Login to reply