Click here to monitor SSC
SQLServerCentral is supported by Red Gate Software Ltd.
 
Log in  ::  Register  ::  Not logged in
 
 
 
        
Home       Members    Calendar    Who's On


Add to briefcase 12»»

ORMs and Indexes with Included Columns Expand / Collapse
Author
Message
Posted Wednesday, December 1, 2010 9:12 AM
Grasshopper

GrasshopperGrasshopperGrasshopperGrasshopperGrasshopperGrasshopperGrasshopperGrasshopper

Group: General Forum Members
Last Login: Monday, June 27, 2011 10:26 AM
Points: 18, Visits: 65
I'm using LINQ to SQL, but I think this question may apply to any ORM tool.

L2S entities are mapped from each database table one-to-one. When you retrieve an entity (i.e. Customer) you get the data from each column for the requested row and depending on your DataLoadOptions, you may also retrieve related entities (i.e. Addresses, Orders, etc.).

So let's say I have a Customer table like this:

TABLE Customer (
[Id] [int] IDENTITY(1,1) NOT NULL,
[StatusId] [int] NOT NULL,
[CustomerNumber] [nvarchar](50) NOT NULL,
[FirstName] [nvarchar](30) NOT NULL,
[LastName] [nvarchar](40) NOT NULL,
[EmailAddress] [nvarchar](100) NULL,
)

Now lets say I have a non-clustered index on EmailAddress with included columns: FirstName and LastName.

When L2S gets a Customer (or Customers) by EmailAddress it looks at this index, but since the entity returned includes the data from each column (not just those in the included columns) how does it retrieve the StatusId and CustomerNumber columns. Should I just put all of the columns in the included columns? Notice that the above is just an example and most of my entities are much more complex than this, so have all of the indexes include all columns seems like a poor decision.

Thanks for any help.

Edit: Maybe this would be better stated as pure SQL. I'll be honest when I say that I don't understand indexes as thoroughly as I'd like to. But after re-reading my post I think it could be stated simply as, "What's the difference between the following queries (with respect to performance and the above index)"

-- Index seems tuned for this query
SELECT FirstName, LastName
FROM Customer
WHERE EmailAddress = ?

-- L2S will actually do something like this
SELECT Id, StatusId, CustomerNumber, FirstName, LastName, EmailAddress
FROM Customer
WHERE EmailAddress = ?

Post #1028759
Posted Wednesday, December 1, 2010 11:03 AM


SSC-Forever

SSC-ForeverSSC-ForeverSSC-ForeverSSC-ForeverSSC-ForeverSSC-ForeverSSC-ForeverSSC-ForeverSSC-ForeverSSC-ForeverSSC-ForeverSSC-ForeverSSC-ForeverSSC-ForeverSSC-Forever

Group: General Forum Members
Last Login: Today @ 3:04 PM
Points: 42,437, Visits: 35,491
mikesigsworth (12/1/2010)
Should I just put all of the columns in the included columns?


No, absolutely not. By doing that you will be duplicating the table. Think of the impact on database size, backup size, backup duration, reindex time, etc.

There might be some specialised cases where this is a good idea, but they're edge cases (and probably data warehouse type systems)

Edit: Maybe this would be better stated as pure SQL. I'll be honest when I say that I don't understand indexes as thoroughly as I'd like to. But after re-reading my post I think it could be stated simply as, "What's the difference between the following queries (with respect to performance and the above index)"

-- Index seems tuned for this query
SELECT FirstName, LastName
FROM Customer
WHERE EmailAddress = ?

-- L2S will actually do something like this
SELECT Id, StatusId, CustomerNumber, FirstName, LastName, EmailAddress
FROM Customer
WHERE EmailAddress = ?



For the first the index is covering, for the second it's not. Depending how many rows will be returned, SQL may or may not use the index. If email address is unique (or nearly so), the index will still be used.

As for indexes...

http://www.sqlservercentral.com/articles/Indexing/68439/ (3 parts)
http://www.sqlservercentral.com/articles/Indexing/68439/



Gail Shaw
Microsoft Certified Master: SQL Server 2008, MVP
SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

We walk in the dark places no others will enter
We stand on the bridge and no one may pass

Post #1028830
Posted Thursday, December 2, 2010 9:14 AM
Grasshopper

GrasshopperGrasshopperGrasshopperGrasshopperGrasshopperGrasshopperGrasshopperGrasshopper

Group: General Forum Members
Last Login: Monday, June 27, 2011 10:26 AM
Points: 18, Visits: 65
Thanks so much for the link. That was a great introduction to indexes, something I sorely needed!

I do have a couple questions however. The 3rd part on non-clustered indexes and covering says:

...the index must be either covering or return sufficiently small number of rows that the required lookups to the clustered index/heap (to retrieve the remainder of the columns) is not considered too expensive by the optimiser. If the required lookups are considered too expensive then the index will not be used... If a nonclustered index is to be used to evaluate a query it needs to be covering or sufficiently selective that the costs of the lookups aren’t deemed to be too high.


The problem with our datalayer is that LINQ to SQL queries typically look like this:
public Entity GetById(int id) {
...
Entity e =
(from x in context.SomeTable
where x.Id == id
select x).FirstOrDefault();

return e;
}

The main problem with this is the "select x" line is essentially converted to SELECT *. Our app doesn't use any sort of LINQ projections, for reasons I don't want to get into, other than it was a bad choice made too long ago. The above query would just use the clustered index though. I'm more focused on creating non-clustered indexes right now to fix performance issues. The problem is, even in "find" queries we don't use any form of projection. For example,

public List<Entity> FindByName(string name) {
...
List<Entity> listOfE =
(from x in context.SomeTable
where x.Name == name
select x).ToList();

return listOfE;
}

So from what I understand, if I want a covering index for this query it either has to include all of the columns in SomeTable to avoid a lookup, or I need to keep the # of included columns minimal so that the cost of the lookup isn't considered too expensive and the index isn't used at all?

I think what's really confusing me is the included columns. In my situation, should I bother with them at all? And on a query like the "find" one above, the lookups are done per row found?

Thanks again for all your help.
Post #1029349
Posted Thursday, December 2, 2010 9:30 AM


SSC-Forever

SSC-ForeverSSC-ForeverSSC-ForeverSSC-ForeverSSC-ForeverSSC-ForeverSSC-ForeverSSC-ForeverSSC-ForeverSSC-ForeverSSC-ForeverSSC-ForeverSSC-ForeverSSC-ForeverSSC-Forever

Group: General Forum Members
Last Login: Today @ 3:04 PM
Points: 42,437, Visits: 35,491
So from what I understand, if I want a covering index for this query it either has to include all of the columns in SomeTable to avoid a lookup, or I need to keep the # of included columns minimal so that the cost of the lookup isn't considered too expensive and the index isn't used at all?


Including some columns is as good as including none. The number of columns is irrelevant. Fro SQL, looking up 1 column is like looking up 20. It's the number of rows that it must look up that's important

You're said nothing about the selectivity of those queries. Quoting the portion of the article that you quoted (emphasis mine)
...the index must be either covering or return sufficiently small number of rows that the required lookups to the clustered index/heap (to retrieve the remainder of the columns) is not considered too expensive by the optimiser.


So... How many rows are those returning? 1? 0.001% of the table? 50% of the table?

And on a query like the "find" one above, the lookups are done per row found?


Lookups are always done per row.



Gail Shaw
Microsoft Certified Master: SQL Server 2008, MVP
SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

We walk in the dark places no others will enter
We stand on the bridge and no one may pass

Post #1029358
Posted Thursday, December 2, 2010 9:35 AM
Grasshopper

GrasshopperGrasshopperGrasshopperGrasshopperGrasshopperGrasshopperGrasshopperGrasshopper

Group: General Forum Members
Last Login: Monday, June 27, 2011 10:26 AM
Points: 18, Visits: 65
Well the find query that I am currently optimizing is paged so it returns at most 20 rows on a 2.5million row table. So I guess by using paging I can reduce the result set to the required size in order to use the index.

I didn't really get what "selectivity" meant. I do now.

You are a king among men. Thanks a ton!
Post #1029362
Posted Thursday, December 2, 2010 9:39 AM


SSC-Forever

SSC-ForeverSSC-ForeverSSC-ForeverSSC-ForeverSSC-ForeverSSC-ForeverSSC-ForeverSSC-ForeverSSC-ForeverSSC-ForeverSSC-ForeverSSC-ForeverSSC-ForeverSSC-ForeverSSC-Forever

Group: General Forum Members
Last Login: Today @ 3:04 PM
Points: 42,437, Visits: 35,491
mikesigsworth (12/2/2010)
Well the find query that I am currently optimizing is paged so it returns at most 20 rows on a 2.5million row table. So I guess by using paging I can reduce the result set to the required size in order to use the index.


That's well within the range where the lookups will be done.

The other point is, have you tested and does SQL use the index? If you haven't tested, do so. You cannot put indexes onto tables based on what you think will happen. You have to test and make sure that what you expect to happen really will.

You are a king among men.


Err, I'll take that as figurative, not literal



Gail Shaw
Microsoft Certified Master: SQL Server 2008, MVP
SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

We walk in the dark places no others will enter
We stand on the bridge and no one may pass

Post #1029368
Posted Thursday, December 2, 2010 9:44 AM
Grasshopper

GrasshopperGrasshopperGrasshopperGrasshopperGrasshopperGrasshopperGrasshopperGrasshopper

Group: General Forum Members
Last Login: Monday, June 27, 2011 10:26 AM
Points: 18, Visits: 65
I agree. You should never assume that any optimization has actually optimized anything until you test it. Testing L2S is a pain in the *** because you basically need to use profiler to pull out the generated SQL and then run it manually to see the execution plan. Which I will do, it's just not fun.

A true "King Among Men" is defined as being absolutely awesome in everything done.

So no, I'm not about to swear fealty. But you rock!
Post #1029373
Posted Saturday, December 8, 2012 10:19 AM
Forum Newbie

Forum NewbieForum NewbieForum NewbieForum NewbieForum NewbieForum NewbieForum NewbieForum Newbie

Group: General Forum Members
Last Login: Saturday, December 8, 2012 9:25 PM
Points: 8, Visits: 3
I would like to use ORM with Bonfire, but the module cannot find my DM class. Any help appreciated.

Daytona Beach
Post #1394342
Posted Saturday, December 8, 2012 10:48 PM


SSC-Dedicated

SSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-Dedicated

Group: General Forum Members
Last Login: Today @ 11:31 AM
Points: 36,714, Visits: 31,164
LightningFingers (12/2/2010)
I agree. You should never assume that any optimization has actually optimized anything until you test it. Testing L2S is a pain in the *** because you basically need to use profiler to pull out the generated SQL and then run it manually to see the execution plan. Which I will do, it's just not fun.

A true "King Among Men" is defined as being absolutely awesome in everything done.

So no, I'm not about to swear fealty. But you rock!


Heh... despite her avatar, GilaMonster is Gail Shaw. She's not only a lady, but a very smart lady that has tested out on and earned the MCM in SQL Server.


--Jeff Moden
"RBAR is pronounced "ree-bar" and is a "Modenism" for "Row-By-Agonizing-Row".

First step towards the paradigm shift of writing Set Based code:
Stop thinking about what you want to do to a row... think, instead, of what you want to do to a column."

(play on words) "Just because you CAN do something in T-SQL, doesn't mean you SHOULDN'T." --22 Aug 2013

Helpful Links:
How to post code problems
How to post performance problems
Post #1394379
Posted Saturday, December 8, 2012 10:52 PM


SSC-Dedicated

SSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-Dedicated

Group: General Forum Members
Last Login: Today @ 11:31 AM
Points: 36,714, Visits: 31,164
LightningFingers (12/1/2010)
I'm using LINQ to SQL, but I think this question may apply to any ORM tool.


Then, you have a much more insideous problem to watch for. If you do nothing to prevent it, LINQ to SQL has been known to pass all character based parameters as NVARCHAR. If you play those against a table that has VARCHAR columns, then the entire column will be implicitly converted to NVARCHAR which also makes it impossible to do an index SEEK.

It's also tough for auditing. It's also been known not to pass certain parameters if the value it's passing during an update is the same as the current value especially during the same session. That means you can get a NULL in your INSERTED "Modified By" columns in an audit trigger.


--Jeff Moden
"RBAR is pronounced "ree-bar" and is a "Modenism" for "Row-By-Agonizing-Row".

First step towards the paradigm shift of writing Set Based code:
Stop thinking about what you want to do to a row... think, instead, of what you want to do to a column."

(play on words) "Just because you CAN do something in T-SQL, doesn't mean you SHOULDN'T." --22 Aug 2013

Helpful Links:
How to post code problems
How to post performance problems
Post #1394380
« Prev Topic | Next Topic »

Add to briefcase 12»»

Permissions Expand / Collapse