Filtered Indexes - Simple Partitioning without Enterprise Edition?

Question

Post reply

Filtered Indexes - Simple Partitioning without Enterprise Edition?

maneffa

Mr or Mrs. 500

Points: 512
More actions
February 21, 2013 at 12:01 am

#274457

Comments posted to this topic are about the item Filtered Indexes - Simple Partitioning without Enterprise Edition?
Forum Etiquette: How to post Reporting Services problems

[/url]

Forum Etiquette: How to post data/code on a forum to get the best help - by Jeff Moden
[/url]

How to Post Performance Problems - by Gail Shaw
[/url]

amal7us SSC Rookie Points: 37 More actions · Answer 1

It will add the overhead of the index i.e. inserts will be slower than in partitioned tables

James A Skipwith SSCommitted Points: 1808 More actions · Answer 2

Nigel, certainly a good quick method of reducing IO in this situation with no code changes. Bear in mind that statistics are NOT automatically maintained on filtered indexes and you will have to add maintenance to do this yourself, otherwise your performance gains will eventually slip away.

James
MCM [@TheSQLPimp]

mister.magoo SSC-Forever Points: 47068 More actions · Answer 3

Thanks for the article, my only criticism being the lack of examples or links to further reading, which would have been nice.

But, I can Google, so no biggie 😛

MM

select geometry::STGeomFromWKB(0x0106000000020000000103000000010000000B0000001000000000000840000000000000003DD8CCCCCCCCCC0840000000000000003DD8CCCCCCCCCC08408014AE47E17AFC3F040000000000104000CDCCCCCCCCEC3F9C999999999913408014AE47E17AFC3F9C99999999991340000000000000003D0000000000001440000000000000003D000000000000144000000000000000400400000000001040000000000000F03F100000000000084000000000000000401000000000000840000000000000003D0103000000010000000B000000000000000000143D000000000000003D009E99999999B93F000000000000003D009E99999999B93F8014AE47E17AFC3F400000000000F03F00CDCCCCCCCCEC3FA06666666666FE3F8014AE47E17AFC3FA06666666666FE3F000000000000003D1800000000000040000000000000003D18000000000000400000000000000040400000000000F03F000000000000F03F000000000000143D0000000000000040000000000000143D000000000000003D, 0);

maneffa Mr or Mrs. 500 Points: 512 More actions · Answer 4

That is correct, and one of the limitations of this solution. In my scenario the maintenance of the index does not add a lot of overhead but I could see other examples where it could be significant.

Thanks

Nigel

rice.tx Mr or Mrs. 500 Points: 559 More actions · Answer 5

We have a very similar situation, and are looking forward to upgrading and making use of filtered indexes.

It's not clear to me... did you try using filtered indexes with just the "status" field, and still had problems with complex queries? Or were the problems you experienced in prior versions, where the "status" index was a regular (non-filtered) index.

Thanks.

davidturpin SSC Veteran Points: 216 More actions · Answer 6

In my opinion, filtered indexes are one of the unsung heroes of 2008+. They can significantly reduce IO when the column is part of an oft-used predicate.

As previously mentioned, relying on autostats to maintain the statistics will only result in threshold changes at the TABLE level, not the column level. For slowly changing, large, historical tables, statistics will only be an issue over time. More than likely, if you have a defrag plan on your indexes, the stats on the index will be updated by the index rebuild process.

Thanks for re-enforcing my belief patterns.

We use enterprise almost exclusively but the STANDARD edition point is worth the reminder. PARTITIONING is enterprise+.

Miles Neale SSChampion Points: 13147 More actions · Answer 7

Thanks for the article and talking about the problem solving as you walked through this. We have done like things and have built these to construct "covered queries" where the index contains that fields we want to use. Again it adds to overhead overall but the speed of getting the data where we read far more than we write has sped up the major functions of the system.

Keep up the good work!

Not all gray hairs are Dinosaurs!

ScottPletcher SSC Guru Points: 100946 More actions · Answer 8

Did you try just putting the Status column only in the index? SQL can still use that index to do a key lookup back to the main table. Hopefully the number of rows in Active status would be limited enough to allow SQL to do that.

Or did you try that and find that SQL would then not use the index but do a full scan instead?!

SQL DBA,SQL Server MVP(07, 08, 09) A socialist is someone who will give you the shirt off *someone else's* back.

maneffa Mr or Mrs. 500 Points: 512 More actions · Answer 9

rice.tx (2/21/2013)
We have a very similar situation, and are looking forward to upgrading and making use of filtered indexes.
It's not clear to me... did you try using filtered indexes with just the "status" field, and still had problems with complex queries? Or were the problems you experienced in prior versions, where the "status" index was a regular (non-filtered) index.
Thanks.

Hi there, originally the status field was a normal index, and then I started experimenting with a filtered index and a few covering fields, and eventually came to the conclusion that adding all the columns would be highly beneficial in this scenario. This was due to the steady increase in disk IO over the years starting to create minor performance issues. Although I show a very simple example the real world queries are actually very complex but the common theme was very frequently that I was looking to take all the ACTIVE customer orders and look at the ACTIVE order routing records for those orders so if I had 'tables' of just the active orders the query optimiser would have a much simpler time without thinking about bookmark lookups etc. I hope this helps.

maneffa Mr or Mrs. 500 Points: 512 More actions · Answer 10

ScottPletcher (2/21/2013)
Did you try just putting the Status column only in the index? SQL can still use that index to do a key lookup back to the main table. Hopefully the number of rows in Active status would be limited enough to allow SQL to do that.
Or did you try that and find that SQL would then not use the index but do a full scan instead?!

Hi there, I tried filtered indexes as you suggest and that was beneficial in it's own right. But I knew I could go further. Some of the table have very high selectivity (eg typically 7500 Active records out of 20 million rows) and some less so and I just knew that a 'full covering filtered index' including all columns would not be a great overhead and enable the query optimiser to get the right answer every time without thinking about it and that proved to be the case. It could be a very dangerous technique where selectivity is not so good and there is much higher data volume changing the status field. Sometimes you just know. The hardest thing was figuring out that a filtered index could create a virtual table by including all the other fields, even though it now seem obvious, as almost every example of such indexes only include a few fields but my index looks very strange with many fields!

shian_lineage2 SSC Enthusiast Points: 102 More actions · Answer 11

It's a good idea, but why not use included fields, which will reduce the overhead further?

James A Skipwith SSCommitted Points: 1808 More actions · Answer 12

In the long run, especially with examples of having 7,500 active out of 20M, you will want to move to partitioned views if you stick on standard edition.

Table A - Status constraint to have only a [RowStatus] of "A" (Active)

Table B - Status constraint to have only a [RowStatus] of "H" (Historic)

Table C - Status constraint to have only a [RowStatus] of "X" (Cancelled)

All default inserts go into Table A and are moved into B or C when the status changes. You then access the records via a unioned view:

CREATE VIEW dbo.vAllOrders

AS

SELECT COL1, cOL2, RowStatus

from Table A

UNION ALL

SELECT COL1, cOL2, RowStatus

from Table B

UNION ALL

SELECT COL1, cOL2, RowStatus

from Table C;

The optimizer is bright enough to know, if your constraints are in place and trusted, to only read data from the correct table. So in the example above, SELECT * FROM dbo.vAllOrders WHERE [RowStatus] = 'A' would only read data from Table A (7.5K rows) and would totally ignore Table's B and C (partition elimination).

There's no massive code change involve and this scales very well. Reading history (20M rows) will be slow but reading active records will be very fast. Your storage costs will be far lower and your indexes much more productive. Google partitioned views/partition elimination, as this would definitely be the way forward in the long run. Partition elimination does work in all versions, including standard, and is a great way of getting robust partitioning in situations like these where you cannot use enterprise table partitioning.

James
MCM [@TheSQLPimp]

Jason Wolfkill SSCrazy Eights Points: 9832 More actions · Answer 13

shian_lineage2 (2/22/2013)
It's a good idea, but why not use included fields, which will reduce the overhead further?

I'm wondering - why do you say that included columns would reduce overhead? I assume you mean the overhead of maintaining the index when there are inserts, updates, or deletes. If so, included columns actually increase the overhead because the data for the included columns is stored at the leaf level of the nonclustered index and so must be managed whenever the data in the underlying table changes. Included columns are a good idea when the increased performance of "covered" queries that use the index outweighs the costs of maintaining the index.

At any rate, the author here described a filtered index that included ALL the columns of the table. He explained pretty well that the costs of maintaining such an index were small enough and easily justified by the performance gain of queries that could use that index.

Jason Wolfkill

maneffa Mr or Mrs. 500 Points: 512 More actions · Answer 14

James A Skipwith (2/22/2013)
In the long run, especially with examples of having 7,500 active out of 20M, you will want to move to partitioned views if you stick on standard edition.
Table A - Status constraint to have only a [RowStatus] of "A" (Active)
Table B - Status constraint to have only a [RowStatus] of "H" (Historic)
Table C - Status constraint to have only a [RowStatus] of "X" (Cancelled)
All default inserts go into Table A and are moved into B or C when the status changes. You then access the records via a unioned view:
CREATE VIEW dbo.vAllOrders
AS
SELECT COL1, cOL2, RowStatus
from Table A
UNION ALL
SELECT COL1, cOL2, RowStatus
from Table B
UNION ALL
SELECT COL1, cOL2, RowStatus
from Table C;
The optimizer is bright enough to know, if your constraints are in place and trusted, to only read data from the correct table. So in the example above, SELECT * FROM dbo.vAllOrders WHERE [RowStatus] = 'A' would only read data from Table A (7.5K rows) and would totally ignore Table's B and C (partition elimination).
There's no massive code change involve and this scales very well. Reading history (20M rows) will be slow but reading active records will be very fast. Your storage costs will be far lower and your indexes much more productive. Google partitioned views/partition elimination, as this would definitely be the way forward in the long run. Partition elimination does work in all versions, including standard, and is a great way of getting robust partitioning in situations like these where you cannot use enterprise table partitioning.

Thanks, that's a good idea I will investigate. Things are slightly more complex than I painted as I use merge replication with over 100 tables (this is the largest in rowcount) over relatively low bandwidth ADSL to many sites so moving large amounts of data around is not trivial in that scenario. Once the system was set up and in steady state relatively low numbers of records would be moving round. That is one of the reasons I liked the solution suggested as it required no code changes and was replication friendly. Thanks again for the suggestion.