ORMs and Indexes with Included Columns

  • I'm using LINQ to SQL, but I think this question may apply to any ORM tool.

    L2S entities are mapped from each database table one-to-one. When you retrieve an entity (i.e. Customer) you get the data from each column for the requested row and depending on your DataLoadOptions, you may also retrieve related entities (i.e. Addresses, Orders, etc.).

    So let's say I have a Customer table like this:

    TABLE Customer (

    [Id] [int] IDENTITY(1,1) NOT NULL,

    [StatusId] [int] NOT NULL,

    [CustomerNumber] [nvarchar](50) NOT NULL,

    [FirstName] [nvarchar](30) NOT NULL,

    [LastName] [nvarchar](40) NOT NULL,

    [EmailAddress] [nvarchar](100) NULL,

    )

    Now lets say I have a non-clustered index on EmailAddress with included columns: FirstName and LastName.

    When L2S gets a Customer (or Customers) by EmailAddress it looks at this index, but since the entity returned includes the data from each column (not just those in the included columns) how does it retrieve the StatusId and CustomerNumber columns. Should I just put all of the columns in the included columns? Notice that the above is just an example and most of my entities are much more complex than this, so have all of the indexes include all columns seems like a poor decision.

    Thanks for any help.

    Edit: Maybe this would be better stated as pure SQL. I'll be honest when I say that I don't understand indexes as thoroughly as I'd like to. But after re-reading my post I think it could be stated simply as, "What's the difference between the following queries (with respect to performance and the above index)"

    -- Index seems tuned for this query

    SELECT FirstName, LastName

    FROM Customer

    WHERE EmailAddress = ?

    -- L2S will actually do something like this

    SELECT Id, StatusId, CustomerNumber, FirstName, LastName, EmailAddress

    FROM Customer

    WHERE EmailAddress = ?

  • mikesigsworth (12/1/2010)


    Should I just put all of the columns in the included columns?

    No, absolutely not. By doing that you will be duplicating the table. Think of the impact on database size, backup size, backup duration, reindex time, etc.

    There might be some specialised cases where this is a good idea, but they're edge cases (and probably data warehouse type systems)

    Edit: Maybe this would be better stated as pure SQL. I'll be honest when I say that I don't understand indexes as thoroughly as I'd like to. But after re-reading my post I think it could be stated simply as, "What's the difference between the following queries (with respect to performance and the above index)"

    -- Index seems tuned for this query

    SELECT FirstName, LastName

    FROM Customer

    WHERE EmailAddress = ?

    -- L2S will actually do something like this

    SELECT Id, StatusId, CustomerNumber, FirstName, LastName, EmailAddress

    FROM Customer

    WHERE EmailAddress = ?

    For the first the index is covering, for the second it's not. Depending how many rows will be returned, SQL may or may not use the index. If email address is unique (or nearly so), the index will still be used.

    As for indexes...

    http://www.sqlservercentral.com/articles/Indexing/68439/ (3 parts)

    http://www.sqlservercentral.com/articles/Indexing/68439/

    Gail Shaw
    Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
    SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

    We walk in the dark places no others will enter
    We stand on the bridge and no one may pass
  • Thanks so much for the link. That was a great introduction to indexes, something I sorely needed!

    I do have a couple questions however. The 3rd part on non-clustered indexes and covering says:

    ...the index must be either covering or return sufficiently small number of rows that the required lookups to the clustered index/heap (to retrieve the remainder of the columns) is not considered too expensive by the optimiser. If the required lookups are considered too expensive then the index will not be used... If a nonclustered index is to be used to evaluate a query it needs to be covering or sufficiently selective that the costs of the lookups aren’t deemed to be too high.

    The problem with our datalayer is that LINQ to SQL queries typically look like this:

    public Entity GetById(int id) {

    ...

    Entity e =

    (from x in context.SomeTable

    where x.Id == id

    select x).FirstOrDefault();

    return e;

    }

    The main problem with this is the "select x" line is essentially converted to SELECT *. Our app doesn't use any sort of LINQ projections, for reasons I don't want to get into, other than it was a bad choice made too long ago. The above query would just use the clustered index though. I'm more focused on creating non-clustered indexes right now to fix performance issues. The problem is, even in "find" queries we don't use any form of projection. For example,

    public List<Entity> FindByName(string name) {

    ...

    List<Entity> listOfE =

    (from x in context.SomeTable

    where x.Name == name

    select x).ToList();

    return listOfE;

    }

    So from what I understand, if I want a covering index for this query it either has to include all of the columns in SomeTable to avoid a lookup, or I need to keep the # of included columns minimal so that the cost of the lookup isn't considered too expensive and the index isn't used at all?

    I think what's really confusing me is the included columns. In my situation, should I bother with them at all? And on a query like the "find" one above, the lookups are done per row found?

    Thanks again for all your help.

  • So from what I understand, if I want a covering index for this query it either has to include all of the columns in SomeTable to avoid a lookup, or I need to keep the # of included columns minimal so that the cost of the lookup isn't considered too expensive and the index isn't used at all?

    Including some columns is as good as including none. The number of columns is irrelevant. Fro SQL, looking up 1 column is like looking up 20. It's the number of rows that it must look up that's important

    You're said nothing about the selectivity of those queries. Quoting the portion of the article that you quoted (emphasis mine)

    ...the index must be either covering or return sufficiently small number of rows that the required lookups to the clustered index/heap (to retrieve the remainder of the columns) is not considered too expensive by the optimiser.

    So... How many rows are those returning? 1? 0.001% of the table? 50% of the table?

    And on a query like the "find" one above, the lookups are done per row found?

    Lookups are always done per row.

    Gail Shaw
    Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
    SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

    We walk in the dark places no others will enter
    We stand on the bridge and no one may pass
  • Well the find query that I am currently optimizing is paged so it returns at most 20 rows on a 2.5million row table. So I guess by using paging I can reduce the result set to the required size in order to use the index.

    I didn't really get what "selectivity" meant. I do now.

    You are a king among men. Thanks a ton!

  • mikesigsworth (12/2/2010)


    Well the find query that I am currently optimizing is paged so it returns at most 20 rows on a 2.5million row table. So I guess by using paging I can reduce the result set to the required size in order to use the index.

    That's well within the range where the lookups will be done.

    The other point is, have you tested and does SQL use the index? If you haven't tested, do so. You cannot put indexes onto tables based on what you think will happen. You have to test and make sure that what you expect to happen really will.

    You are a king among men.

    Err, I'll take that as figurative, not literal 😉

    Gail Shaw
    Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
    SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

    We walk in the dark places no others will enter
    We stand on the bridge and no one may pass
  • I agree. You should never assume that any optimization has actually optimized anything until you test it. Testing L2S is a pain in the *** because you basically need to use profiler to pull out the generated SQL and then run it manually to see the execution plan. Which I will do, it's just not fun.

    A true "King Among Men" is defined as being absolutely awesome in everything done.

    So no, I'm not about to swear fealty. But you rock!

  • I would like to use ORM with Bonfire, but the module cannot find my DM class. Any help appreciated. :w00t:

  • LightningFingers (12/2/2010)


    I agree. You should never assume that any optimization has actually optimized anything until you test it. Testing L2S is a pain in the *** because you basically need to use profiler to pull out the generated SQL and then run it manually to see the execution plan. Which I will do, it's just not fun.

    A true "King Among Men" is defined as being absolutely awesome in everything done.

    So no, I'm not about to swear fealty. But you rock!

    Heh... despite her avatar, GilaMonster is Gail Shaw. She's not only a lady, but a very smart lady that has tested out on and earned the MCM in SQL Server.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • LightningFingers (12/1/2010)


    I'm using LINQ to SQL, but I think this question may apply to any ORM tool.

    Then, you have a much more insideous problem to watch for. If you do nothing to prevent it, LINQ to SQL has been known to pass all character based parameters as NVARCHAR. If you play those against a table that has VARCHAR columns, then the entire column will be implicitly converted to NVARCHAR which also makes it impossible to do an index SEEK.

    It's also tough for auditing. It's also been known not to pass certain parameters if the value it's passing during an update is the same as the current value especially during the same session. That means you can get a NULL in your INSERTED "Modified By" columns in an audit trigger.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Please note: 2 year old thread (reactivated by some spam)

    Gail Shaw
    Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
    SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

    We walk in the dark places no others will enter
    We stand on the bridge and no one may pass

Viewing 11 posts - 1 through 10 (of 10 total)

You must be logged in to reply to this topic. Login to reply