Optimising join with a CASE statement

  • I have data that I want at multiple granularities: 5, 15, 30 and 60 minutes. To reduce repetition, I have put them all in the same table, with a key column for each of the 5, 15, 30 and 60 minute granularities and a filtered index on each of those columns that excludes the NULLs. This means that each day has 288 slots, but only 24 of the slots are filled in for the 60-minute data, while all of them are filled for the 5-minute data.
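
    Roughly, the structure looks something like this (a simplified sketch only; the real names and data types differ):

    CREATE TABLE dbo.Data  -- illustrative names only
    (
        [5_min_settlement_key]  int            NULL,
        [15_min_settlement_key] int            NULL,
        [30_min_settlement_key] int            NULL,
        [60_min_settlement_key] int            NULL,
        value                   decimal(18, 4) NOT NULL
    );

    -- One filtered index per granularity column, excluding the NULL slots
    CREATE INDEX IX_Data_5min  ON dbo.Data ([5_min_settlement_key])  WHERE [5_min_settlement_key]  IS NOT NULL;
    CREATE INDEX IX_Data_15min ON dbo.Data ([15_min_settlement_key]) WHERE [15_min_settlement_key] IS NOT NULL;
    CREATE INDEX IX_Data_30min ON dbo.Data ([30_min_settlement_key]) WHERE [30_min_settlement_key] IS NOT NULL;
    CREATE INDEX IX_Data_60min ON dbo.Data ([60_min_settlement_key]) WHERE [60_min_settlement_key] IS NOT NULL;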

    I have another column that specifies the interval granularity, and my first thought was to access my data through a join with a CASE expression, so that depending on the granularity required it looks at a different column:

    INNER JOIN Data d
        ON d.settlement_key =
            CASE st.interval_granularity
                WHEN 5  THEN [5_min_settlement_key]
                WHEN 15 THEN [15_min_settlement_key]
                WHEN 60 THEN [60_min_settlement_key]
                ELSE NULL
            END

    Despite the presence of the indexes on the columns, the process seems to be quite slow. I think this is probably because the query plan cannot know beforehand which of the columns it is going to use for any given dataset until it actually runs, so it may not be optimised.

    Can someone recommend a way to optimise this based on the given structure? Maybe there are hints that could be added to the join, or maybe I can clear the query plan each time the SQL is run? My other option for dealing with the different granularities was to use a single key column and repeat the data, once per granularity, but that makes my data, row and table sizes much higher, whereas the current design only adds a column for each additional granularity. Would this work any better in future versions of SQL Server, maybe with columnstore indexes?

  • The CASE prevents any index usage on the settlement_key columns. No hints or columnstore indexes are going to change that. If the join is the cause of it being slow (and we don't know that for sure), then you'll need to rewrite the query so that the join is <column> = <column> to fix it.
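
    A minimal sketch of one such rewrite (object names are guesses, and it assumes a given batch only ever needs one granularity): branch up front so that each branch has a plain column = column join that the matching filtered index can support.

    DECLARE @interval_granularity int = 5;  -- in practice supplied by the caller

    IF @interval_granularity = 5
    BEGIN
        SELECT d.*
        FROM dbo.SettlementTimes st            -- illustrative table name
        INNER JOIN dbo.Data d
            ON d.[5_min_settlement_key] = st.settlement_key;
    END
    ELSE IF @interval_granularity = 60
    BEGIN
        SELECT d.*
        FROM dbo.SettlementTimes st
        INNER JOIN dbo.Data d
            ON d.[60_min_settlement_key] = st.settlement_key;
    END
    -- ...and similar branches for 15 and 30 minutes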

    Gail Shaw
    Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
    SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

    We walk in the dark places no others will enter
    We stand on the bridge and no one may pass
  • Is the key column really that long?

    Why not just use a separate row for each interval, 5, 15 and 60 minutes?:

    settlement_key, interval, value
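
    Something like this, for example (names, types and the clustering choice are just placeholders):

    CREATE TABLE dbo.SettlementData
    (
        settlement_key int            NOT NULL,
        [interval]     smallint       NOT NULL,  -- 5, 15, 30 or 60
        value          decimal(18, 4) NOT NULL
    );

    -- Clustering on (interval, settlement_key) keeps each granularity's rows together
    CREATE CLUSTERED INDEX CIX_SettlementData ON dbo.SettlementData ([interval], settlement_key);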

    SQL DBA, SQL Server MVP (07, 08, 09)
    A socialist is someone who will give you the shirt off *someone else's* back.

  • While I definitely agree with Scott and Gail that the table design here does not sound optimal at all, if changing the tables is really not an option, what about running separate queries and unioning the results together for your final data set?

    Edit: misread the sample so removed my code that didn't actually apply.

  • Well, I simplified it; we do have more granularities. The advantage of this table structure is that it takes around half the space, which is a large consideration: smaller indexes mean less index to search, and the data quantity is large. It also means that we have many smaller indexes to maintain, rather than one very large one, and data at the higher granularities comes out more quickly, as those are smaller filtered indexes. It would all work quite well if it weren't for the sub-optimal use of the indexes, which is what I am trying to resolve.

    I'll look into a union-like approach; I was already playing around with that option. I was just wondering whether there were any hints on the joins or indexes that could be used, or any other more advanced areas of T-SQL that I might have been overlooking.
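
    Something along these lines, perhaps (simplified names and column lists; the 30-minute branch is omitted for brevity), so that each branch gets its own plain join and filtered index:

    SELECT st.settlement_key, d.value
    FROM dbo.SettlementTimes st
    INNER JOIN dbo.Data d
        ON d.[5_min_settlement_key] = st.settlement_key
    WHERE st.interval_granularity = 5

    UNION ALL

    SELECT st.settlement_key, d.value
    FROM dbo.SettlementTimes st
    INNER JOIN dbo.Data d
        ON d.[15_min_settlement_key] = st.settlement_key
    WHERE st.interval_granularity = 15

    UNION ALL

    SELECT st.settlement_key, d.value
    FROM dbo.SettlementTimes st
    INNER JOIN dbo.Data d
        ON d.[60_min_settlement_key] = st.settlement_key
    WHERE st.interval_granularity = 60;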

  • Why not store only the 5-minute intervals and then use GROUP BY ROLLUP syntax to derive the higher levels as needed?
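
    For example, something along these lines (a sketch with assumed table and column names, where slot_start is the start time of each stored 5-minute slot):

    -- Roll the stored 5-minute rows up to 15, 30 and 60 minute totals in one pass;
    -- GROUP BY ROLLUP returns subtotal rows at each level of the listed hierarchy.
    SELECT
        DATEADD(HOUR,   DATEDIFF(HOUR,   0, slot_start), 0)             AS hour_start,
        DATEADD(MINUTE, (DATEDIFF(MINUTE, 0, slot_start) / 30) * 30, 0) AS half_hour_start,
        DATEADD(MINUTE, (DATEDIFF(MINUTE, 0, slot_start) / 15) * 15, 0) AS quarter_hour_start,
        slot_start                                                      AS five_min_start,
        SUM(value)                                                      AS total_value
    FROM dbo.FiveMinuteData
    GROUP BY ROLLUP
    (
        DATEADD(HOUR,   DATEDIFF(HOUR,   0, slot_start), 0),
        DATEADD(MINUTE, (DATEDIFF(MINUTE, 0, slot_start) / 30) * 30, 0),
        DATEADD(MINUTE, (DATEDIFF(MINUTE, 0, slot_start) / 15) * 15, 0),
        slot_start
    );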

    Jayanth Kurup
