Querying against horizontal partitioning using a variable.

  • I'm noticing a problem when optimising my queries against a horizontal partition, and was wondering if anyone else has been able to overcome it.

    The idea is to create a view that references a bunch of tables that minimally contain an ID - DisplayName pair, so that the view is in the format of 'data type', 'id', 'display name'. So the partitioning is on the data type.

    This works well: when I query the view with a specific literal for the 'data type', the optimiser only touches the one underlying table that is required.

    However, when I replace the literal with a variable, the optimiser brings all of the tables into the query plan. Presumably this is because it doesn't know what the variable contains until runtime, even though it is effectively a constant. Is there any way around this? It seems to add a considerable amount of time to the execution.

    Here is an example of the problem:

    First setup some test tables and data

    create table dataA (id int identity not null primary key, display varchar(20) not null)
    insert dataA select 'A-1'
    insert dataA select 'A-2'
    insert dataA select 'A-3'

    create table dataB (id int identity not null primary key, display varchar(20) not null)
    insert dataB select 'B-1'
    insert dataB select 'B-2'
    insert dataB select 'B-3'

    create table dataC (id int identity not null primary key, display varchar(20) not null)
    insert dataC select 'C-1'
    insert dataC select 'C-2'
    insert dataC select 'C-3'

    create table dataD (id int identity not null primary key, display varchar(20) not null)
    insert dataD select 'D-1'
    insert dataD select 'D-2'
    insert dataD select 'D-3'
    go

    create view idDisplay
    as
    select 1 as idType, id, display
    from dataA
    union
    select 2, id, display
    from dataB
    union
    select 3, id, display
    from dataC
    union
    select 4, id, display
    from dataD
    go

    create table someData (idA int not null, idB int not null)
    insert someData select 2, 3
    insert someData select 1, 2
    insert someData select 3, 1
    go

    --drop view idDisplay
    --drop table dataA
    --drop table dataB
    --drop table dataC
    --drop table dataD
    --drop table someData

    Using this code, you can see that changing how we link to idType changes the source of the information. Looking at the query plan, it only goes to the tables that it needs (in this example dataA and dataB), so it's quick and efficient.

    select *
    from someData d
    inner join idDisplay a on d.idA = a.id and a.idType = 1
    inner join idDisplay b on d.idB = b.id and b.idType = 2

    However, the plan for the following is quite different and roughly 4x more expensive: for each join it looks up the value in every table in the view, only to select the actual row it needs afterwards.

    declare @aType int
    declare @bType int
    select @aType = 1, @bType = 2

    select *
    from someData d
    inner join idDisplay a on d.idA = a.id and a.idType = @aType
    inner join idDisplay b on d.idB = b.id and b.idType = @bType

    I've tried 'optimize for...' and 'recompile' hints, which seem to improve performance but do not exclude the unneeded tables. Nor does using sp_executesql and passing the variables in as parameters. Obviously I could use dynamic SQL to replace the variables with constants, but that's never a great solution.

    The ultimate goal would be to replace the variables with a join to another table. I figure I must be missing something, as it seems this sort of partitioning increases execution cost rather than reducing it.
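    For completeness, these are the hint variants I tried (sketches only; the OPTIMIZE FOR values are just the constants from the example above):

    -- query-level recompile hint
    select *
    from someData d
    inner join idDisplay a on d.idA = a.id and a.idType = @aType
    inner join idDisplay b on d.idB = b.id and b.idType = @bType
    option (recompile)

    -- pin the plan to specific values
    select *
    from someData d
    inner join idDisplay a on d.idA = a.id and a.idType = @aType
    inner join idDisplay b on d.idB = b.id and b.idType = @bType
    option (optimize for (@aType = 1, @bType = 2))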

  • One point to keep in mind, in most circumstances partitioning is not a performance enhancer, but rather a data management enhancer. Performance is frequently degraded somewhat by partitioning. In some circumstances, you may see a performance enhancement, but don't plan on one from partitioning.

    From what you showed, you're seeing the difference between using a specific value to search statistics and using a generic value, the variable, to search statistics. To see if passing the variable can result in the same execution plan, try using WITH RECOMPILE on the query. You should then get a variable sniffed plan where the optimizer uses the actual values in the variables to search the statistics. It's the same as parameter sniffing. It ought to result in the exact same plan as passing in the actual values instead of using variables. For that matter, have you tested this as a stored procedure with parameters? You should end up with the same plan as passing specific values that way too.
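    For instance, a rough sketch against your example schema (the procedure name here is made up), using procedure-level WITH RECOMPILE so a fresh plan is built from the sniffed parameter values on every call:

    create proc GetIdDisplay @aType int, @bType int
    with recompile
    as
    select *
    from someData d
    inner join idDisplay a on d.idA = a.id and a.idType = @aType
    inner join idDisplay b on d.idB = b.id and b.idType = @bType
    go

    exec GetIdDisplay 1, 2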

    "The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood"
    - Theodore Roosevelt

    Author of:
    SQL Server Execution Plans
    SQL Server Query Performance Tuning

  • Grant Fritchey (5/2/2012)


    One point to keep in mind, in most circumstances partitioning is not a performance enhancer, but rather a data management enhancer. Performance is frequently degraded somewhat by partitioning. In some circumstances, you may see a performance enhancement, but don't plan on one from partitioning.

    Firstly, thanks for your reply.

    Yes, in this case I'm not looking to improve performance per se; this approach doesn't give me a performance enhancement, but it does make joining various bits of data together simpler, and it should avoid a performance hit, since with a constant the execution plan is the same as referencing the tables individually.

    However, this must be a problem that is solved when people partition data for archival purposes. Take an orders table that's partitioned on year: a query for a specific customer is surely not going to visit every year's partition to fetch all of that customer's orders and only then apply the date criteria.

    Grant Fritchey (5/2/2012)


    From what you showed, you're seeing the difference between using a specific value to search statistics and using a generic value, the variable, to search statistics. To see if passing the variable can result in the same execution plan, try using WITH RECOMPILE on the query. You should then get a variable sniffed plan where the optimizer uses the actual values in the variables to search the statistics. It's the same as parameter sniffing. It ought to result in the exact same plan as passing in the actual values instead of using variables. For that matter, have you tested this as a stored procedure with parameters? You should end up with the same plan as passing specific values that way too.

    Unfortunately option (recompile) doesn't appear to force the planner into checking the variables.

    For the record, the two execution plans that I see are as follows.

    First, the plan that uses constants. (Note there is no reference to 'dataC' or 'dataD'; it is just as it would be if we queried the tables directly without the view.)

    select * from someData d inner join idDisplay a on d.idA = a.id and a.idType = 1 inner join idDisplay b on d.idB = b.id and b.idType = 2

    |--Compute Scalar(DEFINE:([Union1019]=(1), [Union1038]=(2)))

    |--Compute Scalar(DEFINE:([DataBase].[dbo].[dataB].[id]=[DataBase].[dbo].[dataB].[id], [DataBase].[dbo].[dataB].[display]=[DataBase].[dbo].[dataB].[display]))

    |--Nested Loops(Inner Join, OUTER REFERENCES:([d].[idB]))

    |--Compute Scalar(DEFINE:([DataBase].[dbo].[dataA].[id]=[DataBase].[dbo].[dataA].[id], [DataBase].[dbo].[dataA].[display]=[DataBase].[dbo].[dataA].[display]))

    | |--Nested Loops(Inner Join, OUTER REFERENCES:([d].[idA]))

    | |--Table Scan(OBJECT:([DataBase].[dbo].[someData] AS [d]))

    | |--Filter(WHERE:(STARTUP EXPR((1)=(1))))

    | |--Clustered Index Seek(OBJECT:([DataBase].[dbo].[dataA].[PK__dataA__60D24498]), SEEK:([DataBase].[dbo].[dataA].[id]=[DataBase].[dbo].[someData].[idA] as [d].[idA]) ORDERED FORWARD)

    |--Filter(WHERE:(STARTUP EXPR((2)=(2))))

    |--Clustered Index Seek(OBJECT:([DataBase].[dbo].[dataB].[PK__dataB__62BA8D0A]), SEEK:([DataBase].[dbo].[dataB].[id]=[DataBase].[dbo].[someData].[idB] as [d].[idB]) ORDERED FORWARD)

    Now introduce a variable; note the extra references to the tables.

    select * from someData d inner join idDisplay a on d.idA = a.id and a.idType = @aType inner join idDisplay b on d.idB = b.id and b.idType = @bType option (recompile)

    |--Hash Match(Inner Join, HASH:([d].[idB])=([Union1039]))

    |--Hash Match(Inner Join, HASH:([d].[idA])=([Union1020]))

    | |--Table Scan(OBJECT:([DataBase].[dbo].[someData] AS [d]))

    | |--Merge Join(Union)

    | |--Merge Join(Union)

    | | |--Merge Join(Union)

    | | | |--Compute Scalar(DEFINE:([Expr1006]=(1)))

    | | | | |--Filter(WHERE:(STARTUP EXPR((1)=[@aType])))

    | | | | |--Clustered Index Scan(OBJECT:([DataBase].[dbo].[dataA].[PK__dataA__60D24498]), ORDERED FORWARD)

    | | | |--Compute Scalar(DEFINE:([Expr1010]=(2)))

    | | | |--Filter(WHERE:(STARTUP EXPR((2)=[@aType])))

    | | | |--Clustered Index Scan(OBJECT:([DataBase].[dbo].[dataB].[PK__dataB__62BA8D0A]), ORDERED FORWARD)

    | | |--Compute Scalar(DEFINE:([Expr1014]=(3)))

    | | |--Filter(WHERE:(STARTUP EXPR((3)=[@aType])))

    | | |--Clustered Index Scan(OBJECT:([DataBase].[dbo].[dataC].[PK__dataC__64A2D57C]), ORDERED FORWARD)

    | |--Compute Scalar(DEFINE:([Expr1018]=(4)))

    | |--Filter(WHERE:(STARTUP EXPR((4)=[@aType])))

    | |--Clustered Index Scan(OBJECT:([DataBase].[dbo].[dataD].[PK__dataD__668B1DEE]), ORDERED FORWARD)

    |--Merge Join(Union)

    |--Merge Join(Union)

    | |--Merge Join(Union)

    | | |--Compute Scalar(DEFINE:([Expr1025]=(1)))

    | | | |--Filter(WHERE:(STARTUP EXPR((1)=[@bType])))

    | | | |--Clustered Index Scan(OBJECT:([DataBase].[dbo].[dataA].[PK__dataA__60D24498]), ORDERED FORWARD)

    | | |--Compute Scalar(DEFINE:([Expr1029]=(2)))

    | | |--Filter(WHERE:(STARTUP EXPR((2)=[@bType])))

    | | |--Clustered Index Scan(OBJECT:([DataBase].[dbo].[dataB].[PK__dataB__62BA8D0A]), ORDERED FORWARD)

    | |--Compute Scalar(DEFINE:([Expr1033]=(3)))

    | |--Filter(WHERE:(STARTUP EXPR((3)=[@bType])))

    | |--Clustered Index Scan(OBJECT:([DataBase].[dbo].[dataC].[PK__dataC__64A2D57C]), ORDERED FORWARD)

    |--Compute Scalar(DEFINE:([Expr1037]=(4)))

    |--Filter(WHERE:(STARTUP EXPR((4)=[@bType])))

    |--Clustered Index Scan(OBJECT:([DataBase].[dbo].[dataD].[PK__dataD__668B1DEE]), ORDERED FORWARD)

    (Hmm, can't seem to disable smilies... Sigh.)

  • It's hard to tell exactly what's happening with incomplete information. Can you post the .sqlplan execution plans for both queries?


  • Certainly can.

    It would appear the option (recompile) should work. Admittedly I'm trying it outside of a stored procedure, but that shouldn't make a difference, should it?

  • Wow that's messed up. The estimated rows are different even though the values are the same. It looks like the variables, despite the recompile and the sniffed values, are still getting a sampled run at the statistics instead of a specific run. Out of curiosity, since these are estimated plans, what are the actual values on the rows? One thinks it's getting 1.5 rows, the other thinks it's getting 5.9. Which one is closer to the truth?

    Have you tried this with a parameter inside of a procedure instead of just dynamic TSQL? I'd be interested to see if that changes things.

    Another question, how much data is currently in the tables? Maybe there's not enough there to generate a meaningful set of statistics? Also, are the stats up to date?


  • Grant Fritchey (5/2/2012)


    Wow that's messed up. The estimated rows are different even though the values are the same. It looks like the variables, despite the recompile and the sniffed values, are still getting a sampled run at the statistics instead of a specific run. Out of curiosity, since these are estimated plans, what are the actual values on the rows? One thinks it's getting 1.5 rows, the other thinks it's getting 5.9. Which one is closer to the truth?

    All of this is based on the tables defined in the original post, so there isn't much in the tables themselves. That said, they are just examples. I have the same issue in 'real world' tables where some of the 'idtypes' have thousands of rows.

    The plans I saved are with the 'include actual execution plan' on, so they shouldn't be estimates.

    Grant Fritchey (5/2/2012)


    Have you tried this with a parameter inside of a procedure instead of just dynamic TSQL? I'd be interested to see if that changes things.

    This yields the same plan as the straight SQL above.

    create proc TestQueryWithVariables @aType int, @bType int
    as
    select *
    from someData d
    inner join idDisplay a on d.idA = a.id and a.idType = @aType
    inner join idDisplay b on d.idB = b.id and b.idType = @bType
    option (recompile)
    go

    exec TestQueryWithVariables 1, 2

  • Ben-425648 (5/2/2012)


    Grant Fritchey (5/2/2012)


    Wow that's messed up. The estimated rows are different even though the values are the same. It looks like the variables, despite the recompile and the sniffed values, are still getting a sampled run at the statistics instead of a specific run. Out of curiosity, since these are estimated plans, what are the actual values on the rows? One thinks it's getting 1.5 rows, the other thinks it's getting 5.9. Which one is closer to the truth?

    All of this is based on the tables defined in the original post, so there isn't much in the tables themselves. That said, they are just examples. I have the same issue in 'real world' tables where some of the 'idtypes' have thousands of rows.

    The plans I saved are with the 'include actual execution plan' on, so they shouldn't be estimates.

    Grant Fritchey (5/2/2012)


    Have you tried this with a parameter inside of a procedure instead of just dynamic TSQL? I'd be interested to see if that changes things.

    This yields the same plan as the straight SQL above.

    create proc TestQueryWithVariables @aType int, @bType int
    as
    select *
    from someData d
    inner join idDisplay a on d.idA = a.id and a.idType = @aType
    inner join idDisplay b on d.idB = b.id and b.idType = @bType
    option (recompile)
    go

    exec TestQueryWithVariables 1, 2

    So the stored procedure is working correctly?

    It's possible that if you only have a few rows that there just aren't enough stats to get a consistent plan. However, I understand you'd be getting the same behavior on lots of rows. Can you test the recompile on the larger system?


  • The simplification you are expecting with OPTION (RECOMPILE) only works with SQL Server 2008 and later. This is known as the parameter embedding optimization - see http://blogs.msdn.com/b/grahamk/archive/2009/11/18/changed-behaviour-of-option-recompile-syntax-in-sql-server-2008-sp1-cumulative-update-5.aspx

    Without this, the best the optimizer can do is introduce filters with start-up expressions.

    If you are able to use UNION ALL instead of UNION in the view, you should get a similar simplification, though only in the actual execution plan, not the estimated.

  • SQL Kiwi (5/2/2012)


    The simplification you are expecting with OPTION (RECOMPILE) only works with SQL Server 2008 and later. This is known as the parameter embedding optimization - see http://blogs.msdn.com/b/grahamk/archive/2009/11/18/changed-behaviour-of-option-recompile-syntax-in-sql-server-2008-sp1-cumulative-update-5.aspx

    Without this, the best the optimizer can do is introduce filters with start-up expressions.

    If you are able to use UNION ALL instead of UNION in the view, you should get a similar simplification, though only in the actual execution plan, not the estimated.

    Thank you, that may explain a few things!

    Annoyingly in the 'live' system I have UNION ALLs, I just missed that when setting up the example. Unfortunately replacing the view in the example to use UNION ALLs has no effect on the outcome.

  • Ben-425648 (5/2/2012)


    Annoyingly in the 'live' system I have UNION ALLs, I just missed that when setting up the example. Unfortunately replacing the view in the example to use UNION ALLs has no effect on the outcome.

    Yes, sorry that was my error - I rushed the test and was connected to 2008 when trying that. You're out of luck on 2005, I'm afraid, even on my build 5266 (SP4 CU3) rather than your 3042 (SP2).

  • I tried several approaches with this, many of which gave slightly improved results, but not by much. I tried "union all", persisted computed columns with and without check constraints on the table, forcing join types, etc.

    Only one option that I tested yielded the results I was looking for, and that was adding a persisted computed column that is the leading key of the primary key.

    create table dataA (id int identity not null,
        display varchar(20) not null,
        idType As 1 Persisted,
        Primary Key (idType, id))
    insert dataA select 'A-1'
    insert dataA select 'A-2'
    insert dataA select 'A-3'

    create table dataB (id int identity not null,
        display varchar(20) not null,
        idType As 2 Persisted,
        Primary Key (idType, id))
    insert dataB select 'B-1'
    insert dataB select 'B-2'
    insert dataB select 'B-3'

    create table dataC (id int identity not null,
        display varchar(20) not null,
        idType As 3 Persisted,
        Primary Key (idType, id))
    insert dataC select 'C-1'
    insert dataC select 'C-2'
    insert dataC select 'C-3'

    create table dataD (id int identity not null,
        display varchar(20) not null,
        idType As 4 Persisted,
        Primary Key (idType, id))
    insert dataD select 'D-1'
    insert dataD select 'D-2'
    insert dataD select 'D-3'
    go

    Create view idDisplay
    as
    select 1 as idType, id, display
    from dataA
    union all
    select 2, id, display
    from dataB
    union all
    select 3, id, display
    from dataC
    union all
    select 4, id, display
    from dataD
    go

    This method generated a 49/51 cost percentage split for joining to the view once vs. twice. Obviously the percentages drift further apart as you join to the view more times, but even joining to it 4 times yields better results than the original two joins.


    My blog: SQL Soldier
    SQL Server Best Practices
    Twitter: @SQLSoldier
    My book: Pro SQL Server 2008 Mirroring
    Microsoft Certified Master: SQL Server, Data Platform MVP
    Database Engineer at BlueMountain Capital Management

  • I tried numerous permutations, including a CROSS APPLY, all to no avail. I also tried just a single hit on the view with an IN clause, thinking that if that gave the optimal plan we could massage the data out to the way you needed it - nope, same funky concat plan.

    So my recommendation is to use dynamic sql to craft your select statement. That will get you the actual values and the plan you seek (pun intended). 😛

    Here is a slapped-together demo. BE SURE TO GUARD AGAINST SQL INJECTION!!

    DECLARE @a int = 1, @b int = 2, @sql varchar(500)

    SELECT @sql = 'select d.*, a.*, b.*
    from someData d
    inner join idDisplay a on d.idA = a.id and a.idType = ' + CAST(@a AS varchar(10)) + '
    inner join idDisplay b on d.idB = b.id and b.idType = ' + CAST(@b AS varchar(10))

    PRINT @sql
    EXEC (@sql)

    Best,
    Kevin G. Boles
    SQL Server Consultant
    SQL MVP 2007-2012
    TheSQLGuru on googles mail service

  • Aye, dynamic SQL it is...

    Thanks very much to all who took time to look at this problem, it's much appreciated!
