Does the query optimizer make use of transitive equality?

  • Does the query optimizer make use of transitive equality? By that I mean if A = B and B = C, then A = C. Does the query optimizer take that into account when formulating plans?

    Example:

    Think "Antiques Roadshow". A customer makes a visit. During the visit, one or more objects are assessed.

    Partners

    PartnerId primary key

    Customers

    PartnerId foreign key to Partners

    CustomerId

    composite primary key PartnerId, CustomerId

    Visits

    PartnerId

    VisitId

    CustomerPartnerId

    CustomerId

    composite primary key PartnerId, VisitId

    composite foreign key CustomerPartnerId, CustomerId to Customers

    Objects

    PartnerId

    ObjectId

    CustomerPartnerId

    CustomerId

    composite primary key PartnerId, ObjectId

    composite foreign key CustomerPartnerId, CustomerId to Customers

    Assessments

    PartnerId

    AssessmentId

    VisitPartnerId

    VisitId

    ObjectPartnerId

    ObjectId

    composite primary key PartnerId, AssessmentId

    composite foreign key VisitPartnerId, VisitId to Visits

    composite foreign key ObjectPartnerId, ObjectId to Objects

    Now, there is a query that joins all these tables:

    SELECT *

    FROM Partners p

    JOIN Customers c

    ON p.PartnerId = c.PartnerId

    JOIN Visits v

    ON c.PartnerId = v.CustomerPartnerId

    AND c.CustomerId = v.CustomerId

    JOIN Objects o

    ON c.PartnerId = o.CustomerPartnerId

    AND c.CustomerId = o.CustomerId

    JOIN Assessments a

    ON v.PartnerId = a.VisitPartnerId

    AND v.VisitId = a.VisitId

    AND o.PartnerId = a.ObjectPartnerId

    AND o.ObjectId = a.ObjectId

    This structure is forced on the design due to a restriction in nHibernate. In any row in

    Assessments, PartnerId = VisitPartnerId = ObjectPartnerId. This is enforced by the

    application, NOT with a database constraint (as I write this I'm realizing that there should be).

    One question is:

    Is it sufficient to have (for example)

    WHERE p.PartnerId = 1

    and the optimizer will apply this restriction on partner id across the query, or should the where

    clause be:

    WHERE p.PartnerId = 1

    AND c.PartnerId = 1

    AND v.PartnerId = 1

    AND o.PartnerId = 1

    AND a.PartnerId = 1

    or even more restrictive:

    WHERE p.PartnerId = 1

    AND c.PartnerId = 1

    AND v.PartnerId = 1

    AND o.PartnerId = 1

    AND a.PartnerId = 1

    AND v.CustomerPartnerId = 1

    AND o.CustomerPartnerId = 1

    AND a.VisitPartnerId = 1

    AND a.ObjectPartnerId = 1

    Another question:

    Would adding the check constraint I mentioned earlier help the optimizer?

  • Tom Bakerman (2/25/2013)


    Does the query optimizer make use of transitive equality? By that I mean if A = B and B = C, then A = C. Does the query optimizer take that into account when formulating plans?

    Example:

    Think "Antiques Roadshow". A customer makes a visit. During the visit, one or more objects are assessed.

    Partners

    PartnerId primary key

    Customers

    PartnerId foreign key to Partners

    CustomerId

    composite primary key PartnerId, CustomerId

    Visits

    PartnerId

    VisitId

    CustomerPartnerId

    CustomerId

    composite primary key PartnerId, VisitId

    composite foreign key CustomerPartnerId, CustomerId to Customers

    Objects

    PartnerId

    ObjectId

    CustomerPartnerId

    CustomerId

    composite primary key PartnerId, ObjectId

    composite foreign key CustomerPartnerId, CustomerId to Customers

    Assessments

    PartnerId

    AssessmentId

    VisitPartnerId

    VisitId

    ObjectPartnerId

    ObjectId

    composite primary key PartnerId, AssessmentId

    composite foreign key VisitPartnerId, VisitId to Visits

    composite foreign key ObjectPartnerId, ObjectId to Objects

    Now, there is a query that joins all these tables:

    SELECT *

    FROM Partners p

    JOIN Customers c

    ON p.PartnerId = c.PartnerId

    JOIN Visits v

    ON c.PartnerId = v.CustomerPartnerId

    AND c.CustomerId = v.CustomerId

    JOIN Objects o

    ON c.PartnerId = o.CustomerPartnerId

    AND c.CustomerId = o.CustomerId

    JOIN Assessments a

    ON v.PartnerId = a.VisitPartnerId

    AND v.VisitId = a.VisitId

    AND o.PartnerId = a.ObjectPartnerId

    AND o.ObjectId = a.ObjectId

    This structure is forced on the design due to a restriction in nHibernate. In any row in

    Assessments, PartnerId = VisitPartnerId = ObjectPartnerId. This is enforced by the

    application, NOT with a database constraint (as I write this I'm realizing that there should be).

    One question is:

    Is it sufficient to have (for example)

    WHERE p.PartnerId = 1

    and the optimizer will apply this restriction on partner id across the query, or should the where

    clause be:

    WHERE p.PartnerId = 1

    AND c.PartnerId = 1

    AND v.PartnerId = 1

    AND o.PartnerId = 1

    AND a.PartnerId = 1

    or even more restrictive:

    WHERE p.PartnerId = 1

    AND c.PartnerId = 1

    AND v.PartnerId = 1

    AND o.PartnerId = 1

    AND a.PartnerId = 1

    AND v.CustomerPartnerId = 1

    AND o.CustomerPartnerId = 1

    AND a.VisitPartnerId = 1

    AND a.ObjectPartnerId = 1

    Another question:

    Would adding the check constraint I mentioned earlier help the optimizer?

    I didn't dig all the way through the relational structure of your tables and the logic of your query, but I can tell you that in the part of your query I put in bold above, the optimizer will know that because p.PartnerID = c.PartnerID and c.PartnerID = v.CustomerPartnerID per the JOIN conditions, it can reduce the first part of the JOIN condition for the Visits table to "p.PartnerID = v.CustomerPartnerID" and will do so IF that makes for a better plan.

    If your join conditions mean that where p.PartnerID = 1, all the other "PartnerID" values shown in the part of your query I put in italics above will also = 1, you don't need to duplicate that logic in the WHERE clause as long as you are using OUTER JOINS.

    The optimizer would be "aware" of CHECK constraints among the three "PartnerID" columns in Assessments table and probably could use that information to avoid evaluating redundant join conditions on those columns. Generally, the optimizer is pretty good at sussing out all the logically equivalent query trees based on the query syntax and metadata like CHECK constraints.

    Jason Wolfkill

  • What you are talking about is known as 'Transitive Predicate Closure' or 'Transitive Closure'.

    SQL Server knows how to do this for certain predicate operations, but not for all of them. It definitely handles equality, but you need to do some more reserarch for other operations.

    In particular, you should check the current status for greater than, less than, between, as this may be dependant on the version (SQL 2005, SQL2008, etc) of SQL Server you are using. As far as I know, SQL cannot do transitive closure where a function is used within a predicate.

    Original author: https://github.com/SQL-FineBuild/Common/wiki/ 1-click install and best practice configuration of SQL Server 2019, 2017 2016, 2014, 2012, 2008 R2, 2008 and 2005.

    When I give food to the poor they call me a saint. When I ask why they are poor they call me a communist - Archbishop Hélder Câmara

Viewing 3 posts - 1 through 2 (of 2 total)

You must be logged in to reply to this topic. Login to reply