Click here to monitor SSC
SQLServerCentral is supported by Red Gate Software Ltd.
 
Log in  ::  Register  ::  Not logged in
 
 
 
        
Home       Members    Calendar    Who's On


Add to briefcase ««12

Filtered Indexes - Simple Partitioning without Enterprise Edition? Expand / Collapse
Author
Message
Posted Friday, February 22, 2013 3:31 AM
SSC Rookie

SSC RookieSSC RookieSSC RookieSSC RookieSSC RookieSSC RookieSSC RookieSSC Rookie

Group: General Forum Members
Last Login: Tuesday, September 23, 2014 9:21 AM
Points: 26, Visits: 361
ScottPletcher (2/21/2013)
Did you try just putting the Status column only in the index? SQL can still use that index to do a key lookup back to the main table. Hopefully the number of rows in Active status would be limited enough to allow SQL to do that.

Or did you try that and find that SQL would then not use the index but do a full scan instead?!


Hi there, I tried filtered indexes as you suggest and that was beneficial in it's own right. But I knew I could go further. Some of the table have very high selectivity (eg typically 7500 Active records out of 20 million rows) and some less so and I just knew that a 'full covering filtered index' including all columns would not be a great overhead and enable the query optimiser to get the right answer every time without thinking about it and that proved to be the case. It could be a very dangerous technique where selectivity is not so good and there is much higher data volume changing the status field. Sometimes you just know. The hardest thing was figuring out that a filtered index could create a virtual table by including all the other fields, even though it now seem obvious, as almost every example of such indexes only include a few fields but my index looks very strange with many fields!



Post #1422958
Posted Friday, February 22, 2013 7:20 AM
Grasshopper

GrasshopperGrasshopperGrasshopperGrasshopperGrasshopperGrasshopperGrasshopperGrasshopper

Group: General Forum Members
Last Login: Monday, September 15, 2014 7:54 AM
Points: 12, Visits: 59
It's a good idea, but why not use included fields, which will reduce the overhead further?
Post #1423051
Posted Friday, February 22, 2013 7:52 AM
SSC Journeyman

SSC JourneymanSSC JourneymanSSC JourneymanSSC JourneymanSSC JourneymanSSC JourneymanSSC JourneymanSSC Journeyman

Group: General Forum Members
Last Login: Yesterday @ 3:41 AM
Points: 76, Visits: 345
In the long run, especially with examples of having 7,500 active out of 20M, you will want to move to partitioned views if you stick on standard edition.

Table A - Status constraint to have only a [RowStatus] of "A" (Active)
Table B - Status constraint to have only a [RowStatus] of "H" (Historic)
Table C - Status constraint to have only a [RowStatus] of "X" (Cancelled)

All default inserts go into Table A and are moved into B or C when the status changes. You then access the records via a unioned view:

CREATE VIEW dbo.vAllOrders
AS
SELECT COL1, cOL2, RowStatus
from Table A
UNION ALL
SELECT COL1, cOL2, RowStatus
from Table B
UNION ALL
SELECT COL1, cOL2, RowStatus
from Table C;

The optimizer is bright enough to know, if your constraints are in place and trusted, to only read data from the correct table. So in the example above, SELECT * FROM dbo.vAllOrders WHERE [RowStatus] = 'A' would only read data from Table A (7.5K rows) and would totally ignore Table's B and C (partition elimination).

There's no massive code change involve and this scales very well. Reading history (20M rows) will be slow but reading active records will be very fast. Your storage costs will be far lower and your indexes much more productive. Google partitioned views/partition elimination, as this would definitely be the way forward in the long run. Partition elimination does work in all versions, including standard, and is a great way of getting robust partitioning in situations like these where you cannot use enterprise table partitioning.


James
MCM [@TheSQLPimp]
Post #1423071
Posted Friday, February 22, 2013 7:53 AM


Ten Centuries

Ten CenturiesTen CenturiesTen CenturiesTen CenturiesTen CenturiesTen CenturiesTen CenturiesTen Centuries

Group: General Forum Members
Last Login: Tuesday, October 28, 2014 12:50 PM
Points: 1,061, Visits: 2,580
shian_lineage2 (2/22/2013)
It's a good idea, but why not use included fields, which will reduce the overhead further?


I'm wondering - why do you say that included columns would reduce overhead? I assume you mean the overhead of maintaining the index when there are inserts, updates, or deletes. If so, included columns actually increase the overhead because the data for the included columns is stored at the leaf level of the nonclustered index and so must be managed whenever the data in the underlying table changes. Included columns are a good idea when the increased performance of "covered" queries that use the index outweighs the costs of maintaining the index.

At any rate, the author here described a filtered index that included ALL the columns of the table. He explained pretty well that the costs of maintaining such an index were small enough and easily justified by the performance gain of queries that could use that index.


Jason Wolfkill
Blog: SQLSouth
Twitter: @SQLSouth
Post #1423073
Posted Friday, February 22, 2013 8:20 AM
SSC Rookie

SSC RookieSSC RookieSSC RookieSSC RookieSSC RookieSSC RookieSSC RookieSSC Rookie

Group: General Forum Members
Last Login: Tuesday, September 23, 2014 9:21 AM
Points: 26, Visits: 361
James A Skipwith (2/22/2013)
In the long run, especially with examples of having 7,500 active out of 20M, you will want to move to partitioned views if you stick on standard edition.

Table A - Status constraint to have only a [RowStatus] of "A" (Active)
Table B - Status constraint to have only a [RowStatus] of "H" (Historic)
Table C - Status constraint to have only a [RowStatus] of "X" (Cancelled)

All default inserts go into Table A and are moved into B or C when the status changes. You then access the records via a unioned view:

CREATE VIEW dbo.vAllOrders
AS
SELECT COL1, cOL2, RowStatus
from Table A
UNION ALL
SELECT COL1, cOL2, RowStatus
from Table B
UNION ALL
SELECT COL1, cOL2, RowStatus
from Table C;

The optimizer is bright enough to know, if your constraints are in place and trusted, to only read data from the correct table. So in the example above, SELECT * FROM dbo.vAllOrders WHERE [RowStatus] = 'A' would only read data from Table A (7.5K rows) and would totally ignore Table's B and C (partition elimination).

There's no massive code change involve and this scales very well. Reading history (20M rows) will be slow but reading active records will be very fast. Your storage costs will be far lower and your indexes much more productive. Google partitioned views/partition elimination, as this would definitely be the way forward in the long run. Partition elimination does work in all versions, including standard, and is a great way of getting robust partitioning in situations like these where you cannot use enterprise table partitioning.


Thanks, that's a good idea I will investigate. Things are slightly more complex than I painted as I use merge replication with over 100 tables (this is the largest in rowcount) over relatively low bandwidth ADSL to many sites so moving large amounts of data around is not trivial in that scenario. Once the system was set up and in steady state relatively low numbers of records would be moving round. That is one of the reasons I liked the solution suggested as it required no code changes and was replication friendly. Thanks again for the suggestion.



Post #1423093
Posted Friday, February 22, 2013 8:25 AM
SSC Journeyman

SSC JourneymanSSC JourneymanSSC JourneymanSSC JourneymanSSC JourneymanSSC JourneymanSSC JourneymanSSC Journeyman

Group: General Forum Members
Last Login: Yesterday @ 3:41 AM
Points: 76, Visits: 345
Ah, that does complicate things. Do look into it though, as it would make for a good follow up article for you to write. And do keep writing!

James
MCM [@TheSQLPimp]
Post #1423096
« Prev Topic | Next Topic »

Add to briefcase ««12

Permissions Expand / Collapse