Can I tune this?

  • I have a data warehouse with millions of rows in my main table (Coverage) and my versioned table (CoverageVersions). The main table holds "permanent" values that never change; the versioned table adds a new record every time a mutable value changes.

    The newest record in the versioned table is always the correct one. So to find the mutable values, I do a subquery to locate the record with the highest (most recent) EffectiveDate, then join the results back to the main table to get the non-mutable values.

    Here's my code:

    SELECT fcv.CoverageKey, fcv.SourceKey, fcv.Effective AS EffectiveDt,
           fcv.CovRefundKey, fcv.TypeKey, fcv.StatusKey
    FROM (SELECT CoverageKey, MAX(Effective) AS EffectiveDt
          FROM dbo.CoverageVersions
          GROUP BY CoverageKey) fcvsq
    INNER JOIN dbo.CoverageVersions fcv
        ON fcvsq.CoverageKey = fcv.CoverageKey
       AND fcvsq.EffectiveDt = fcv.Effective
    INNER JOIN dbo.Coverage fc
        ON fcv.CoverageKey = fc.CoverageKey

    Now, the subquery always does an Index Seek, which is okay. But the JOIN always does an Index Scan. I'm trying to figure out if that's a good thing, a bad thing, or an indifferent thing. I don't have an exact running time because this query is part of a bigger proc (and SSMS keeps running out of memory when I run this), but I'm pretty sure most of the load occurs on this section.

    Maybe my query is solid and I can't enhance it. But I thought I'd throw it out to y'all and see if anyone has any suggestions.

    Here's my DDL. All dates are integer values that map back to the key (TimeKey) of a Time table, which sorts dates into FullDate, CalendarMonth, CalendarQuarter, etc.

    SET ANSI_NULLS ON
    GO
    SET QUOTED_IDENTIFIER ON
    GO
    SET ANSI_PADDING ON
    GO
    CREATE TABLE [dbo].[Coverage](
        [CoverageKey] [int] IDENTITY(1,1) NOT NULL,
        [SourceKey] [int] NOT NULL,
        [CertInfoKey] [int] NOT NULL,
        [PD2Key] [tinyint] NULL,
        [CovCD] [varchar](10) COLLATE SQL_Latin1_General_CP1_CI_AS NULL,
        [TimeKey] [int] NOT NULL,
        [Deleted] [int] NULL,
        CONSTRAINT [PK_Coverage_CoverageKey] PRIMARY KEY CLUSTERED
        (
            [CoverageKey] ASC
        ) WITH (PAD_INDEX = OFF, IGNORE_DUP_KEY = OFF) ON [PRIMARY]
    ) ON [PRIMARY]
    GO
    SET ANSI_PADDING OFF
    GO
    SET ANSI_NULLS ON
    GO
    SET QUOTED_IDENTIFIER ON
    GO
    CREATE TABLE [dbo].[CoverageVersions](
        [CoverageKey] [int] NOT NULL,
        [Effective] [int] NOT NULL,
        [SourceKey] [int] NOT NULL,
        [EntryDate] [int] NULL,
        [ExpirationDate] [int] NULL,
        [CustomerCost] [smallmoney] NULL,
        [ExportDate] [int] NULL,
        [CovRefundKey] [int] NULL,
        [BatchKey] [int] NULL,
        [StatusKey] [int] NULL,
        [TypeKey] [int] NULL,
        [ModifiedOn] [int] NULL,
        [RefundAcctYrMo] [int] NULL,
        CONSTRAINT [PK_CoverageVersions] PRIMARY KEY CLUSTERED
        (
            [CoverageKey] ASC,
            [Effective] ASC,
            [SourceKey] ASC
        ) WITH (PAD_INDEX = OFF, IGNORE_DUP_KEY = OFF) ON [PRIMARY]
    ) ON [PRIMARY]
    GO
    ALTER TABLE [dbo].[CoverageVersions] WITH CHECK ADD CONSTRAINT [FK_CoverageVersions_CoverageKey] FOREIGN KEY([CoverageKey])
    REFERENCES [dbo].[Coverage] ([CoverageKey])
    ON DELETE CASCADE
    GO
    ALTER TABLE [dbo].[CoverageVersions] CHECK CONSTRAINT [FK_CoverageVersions_CoverageKey]
    GO

    SET IDENTITY_INSERT dbo.Coverage ON;  -- CoverageKey is an IDENTITY column
    INSERT INTO dbo.Coverage (CoverageKey, SourceKey, CertInfoKey, PD2Key, CovCd, TimeKey, Deleted)
    SELECT 2, 1, 2220882, 0, 'GAP1', 40016, 0 UNION ALL
    SELECT 3, 1, 2782106, 0, 'NONE', 40016, 0 UNION ALL
    SELECT 4, 1, 2334965, 0, 'NONE', 40016, 0 UNION ALL
    SELECT 5, 1, 3171687, 0, 'NONE', 40016, 0 UNION ALL
    SELECT 6, 1, 7919390, 0, 'NONE', 40016, 0 UNION ALL
    SELECT 7, 1, 3646045, 0, 'NONE', 40016, 0 UNION ALL
    SELECT 8, 1, 3667294, 0, 'NONE', 40016, 0 UNION ALL
    SELECT 9, 1, 7905175, 0, 'NONE', 40016, 0 UNION ALL
    SELECT 10, 1, 8534004, 0, 'NONE', 40016, 0 UNION ALL
    SELECT 11, 1, 8158285, 0, 'NONE', 40016, 0 UNION ALL
    SELECT 12, 1, 6576787, 0, 'NONE', 40016, 0 UNION ALL
    SELECT 13, 1, 1413614, 0, 'NONE', 40016, 0 UNION ALL
    SELECT 14, 1, 2252017, 0, 'NONE', 40016, 0 UNION ALL
    SELECT 15, 1, 7426578, 0, 'NONE', 40016, 0 UNION ALL
    SELECT 16, 1, 4170009, 0, 'NONE', 40016, 0 UNION ALL
    SELECT 17, 1, 2101578, 0, 'NONE', 40016, 0 UNION ALL
    SELECT 18, 1, 1881784, 0, 'NONE', 40016, 0 UNION ALL
    SELECT 19, 1, 3184168, 0, 'NONE', 40016, 0 UNION ALL
    SELECT 20, 1, 87996, 0, 'NONE', 40016, 0 UNION ALL
    SELECT 21, 1, 5947111, 0, 'NONE', 40016, 0;
    SET IDENTITY_INSERT dbo.Coverage OFF;

    INSERT INTO dbo.CoverageVersions (CoverageKey, Effective, SourceKey, EntryDate, ExpirationDate,
        CustomerCost, ExportDate, CovRefundKey, BatchKey, StatusKey, TypeKey)
    SELECT 2, 40016, 1, 38257, 40259, 0.00, 38682, 0, 225330, 1, 0 UNION ALL
    SELECT 2, 40260, 1, 38257, 40259, 0.00, 38682, 0, 225330, 8, 0 UNION ALL
    SELECT 3, 40016, 1, 36493, 39049, 1.00, 0, 0, 5467, 8, 0 UNION ALL
    SELECT 4, 40016, 1, 36733, 39291, 1.00, 0, 0, 11263, 8, 0 UNION ALL
    SELECT 5, 40016, 1, 37010, 39566, 1.00, 0, 0, 3829, 8, 0 UNION ALL
    SELECT 6, 40016, 1, 36347, 38896, 1.00, 0, 0, 2021, 8, 0 UNION ALL
    SELECT 7, 40016, 1, 36502, 23922, 40.00, 0, 0, 461, 8, 0 UNION ALL
    SELECT 8, 40016, 1, 36365, 38927, 1.00, 0, 0, 2080, 8, 0 UNION ALL
    SELECT 9, 40016, 1, 36499, 23375, 40.00, 0, 0, 464, 8, 0 UNION ALL
    SELECT 10, 40016, 1, 37003, 24471, 1.00, 0, 0, 3796, 8, 0 UNION ALL
    SELECT 11, 40016, 1, 36611, 39170, 1.00, 0, 0, 34179, 8, 0 UNION ALL
    SELECT 12, 40016, 1, 36319, 38866, 1.00, 0, 0, 34695, 8, 0 UNION ALL
    SELECT 13, 40016, 1, 36667, 39230, 1.00, 0, 0, 34290, 8, 0 UNION ALL
    SELECT 14, 40016, 1, 36607, 39170, 1.00, 0, 0, 14997, 8, 0 UNION ALL
    SELECT 15, 40016, 1, 36607, 39170, 1.00, 0, 0, 34171, 8, 0 UNION ALL
    SELECT 16, 40016, 1, 36520, 39080, 1.00, 0, 0, 6365, 8, 0 UNION ALL
    SELECT 17, 40016, 1, 36578, 39139, 1.00, 0, 0, 4613, 8, 0 UNION ALL
    SELECT 18, 40016, 1, 36578, 39139, 1.00, 0, 0, 730, 8, 0 UNION ALL
    SELECT 19, 40016, 1, 36885, 39445, 1.00, 0, 0, 612, 8, 0 UNION ALL
    SELECT 20, 40016, 1, 36926, 39476, 1.00, 0, 0, 2746, 8, 0 UNION ALL
    SELECT 21, 40016, 1, 36926, 39476, 1.00, 0, 0, 4091, 8, 0;

    Brandie Tarvin, MCITP Database Administrator

  • You get a table scan because your subquery has no limiting WHERE condition. In order to find max(Effective) per CoverageKey you'd need to scan the entire table (or clustered index).

    One option would be adding a tinyint OutOfDate column with a default value (let's say zero), updated regularly (daily or weekly, depending on the load and change frequency) so it gets set to 1 whenever a row is no longer the most recent. That would limit the rows to be searched. Based on that, I would add a unique nonclustered index on OutOfDate DESC, CoverageKey, Effective. This index would be narrow and sorted so that all new rows land at the end. It wouldn't really matter if more than one row is inserted for a CoverageKey between two OutOfDate update cycles, since the majority of rows would still be excluded (assuming there are more outdated than current rows).

    Finally, include WHERE OutOfDate = 0 in your subquery.

    When applied to your sample data, I got an index seek for the subquery and clustered index seeks for the two main tables.

    If you were on 2K8 instead of 2K5, I'd recommend making the index above a filtered one on OutOfDate = 0... A rough sketch of the whole approach:
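
    Here's a minimal sketch of what that could look like (the column, constraint, and index names are just placeholders, not from the original schema; note the UNIQUE assumes (CoverageKey, Effective) is unique without SourceKey, which the current PK does not guarantee):

    -- Sketch only: object names are illustrative.
    ALTER TABLE dbo.CoverageVersions ADD OutOfDate tinyint NOT NULL
        CONSTRAINT DF_CoverageVersions_OutOfDate DEFAULT (0);
    GO
    -- Narrow supporting index; drop UNIQUE if (CoverageKey, Effective) alone
    -- isn't unique (the PK only guarantees uniqueness with SourceKey included).
    -- (On 2K8 you could instead make this a filtered index: WHERE OutOfDate = 0.)
    CREATE UNIQUE NONCLUSTERED INDEX IX_CoverageVersions_OutOfDate
        ON dbo.CoverageVersions (OutOfDate DESC, CoverageKey, Effective);
    GO
    -- Scheduled maintenance: flag every row that is no longer the latest version.
    UPDATE cv
    SET cv.OutOfDate = 1
    FROM dbo.CoverageVersions cv
    WHERE cv.OutOfDate = 0
      AND EXISTS (SELECT 1
                  FROM dbo.CoverageVersions newer
                  WHERE newer.CoverageKey = cv.CoverageKey
                    AND newer.Effective > cv.Effective);
    GO
    -- The subquery then becomes:
    -- SELECT CoverageKey, MAX(Effective) AS EffectiveDt
    -- FROM dbo.CoverageVersions
    -- WHERE OutOfDate = 0
    -- GROUP BY CoverageKey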

    Edit, as a side note: your set of insert statements for the CoverageVersions table originally failed, since 12 column names were listed but only 11 values per row were provided... So I removed the last column name (ModifiedOn) from the list; the script above reflects that fix.



    Lutz
    A pessimist is an optimist with experience.


  • A similar option would be adding another table, CoverageVersionsHistory, with the current CoverageVersions structure, and moving all outdated versions into this history table on a scheduled basis. So instead of the OutOfDate column with two possible values, you'd have two separate tables. This wouldn't require the additional index, and the clustered index scan on that table wouldn't take that long. It might still be beneficial to add an NC index on CoverageKey, Effective, though. A rough sketch of the scheduled move:
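
    (Sketch only; it assumes dbo.CoverageVersionsHistory mirrors CoverageVersions and has no enabled triggers or foreign keys, which the OUTPUT INTO clause requires of its target.)

    -- Move outdated rows into the history table in a single statement.
    DELETE cv
    OUTPUT DELETED.* INTO dbo.CoverageVersionsHistory
    FROM dbo.CoverageVersions cv
    WHERE EXISTS (SELECT 1
                  FROM dbo.CoverageVersions newer
                  WHERE newer.CoverageKey = cv.CoverageKey
                    AND newer.Effective > cv.Effective);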



    Lutz
    A pessimist is an optimist with experience.


  • LutzM (12/10/2010)


    You get a table scan because your subquery has no limiting WHERE condition. In order to find max(Effective) per CoverageKey you'd need to scan the entire table (or clustered index).

    I'm a little confused by this response. The Subquery has an Index Seek on it. Why would that (and the lack of a WHERE clause on it) cause a Clustered Index Scan on the table it is being joined to?

    EDIT: Wait. I get why joining the subquery back to the original table might cause the scan. But there's also a Clustered Index Scan on the main Coverage table. That's the part I really don't get.

    Brandie Tarvin, MCITP Database Administrator

  • LutzM (12/10/2010)


    A similar option would be adding another table, CoverageVersionsHistory, with the current CoverageVersions structure, and moving all outdated versions into this history table on a scheduled basis.

    Yeah, but unfortunately I can't do that without causing a whole heap of pain. I'm not even sure I'll be able to add a column to the one table without causing problems (there are SSIS dependencies on this that could turn any schema alteration into a major project).

    Thanks for catching the data issue. I'm editing that post right after this one.

    Brandie Tarvin, MCITP Database Administrator

  • Seems like we're both confused now... :ermm:

    Attached please find snapshots of the execution plans I get when running the queries on your sample data (I didn't attach them as .sqlplan files, since it should be easy to reproduce them by running the sample script you provided).

    I think the different plans are caused by different row counts and data distribution.

    Could you please post the actual execution plan as a .sqlplan file based on your large table join? I'd expect it to be different...



    Lutz
    A pessimist is an optimist with experience.


  • Brandie Tarvin (12/10/2010)


    LutzM (12/10/2010)


    A similar option would be adding another table, CoverageVersionsHistory, with the current CoverageVersions structure, and moving all outdated versions into this history table on a scheduled basis.

    Yeah, but unfortunately I can't do that without causing a whole heap of pain. I'm not even sure I'll be able to add a column to the one table without causing problems (there are SSIS dependencies on this that could turn any schema alteration into a major project).

    Thanks for catching the data issue. I'm editing that post right after this one.

    Errmmm.... a rather dirty option:

    Add an AFTER INSERT trigger to the CoverageVersions table and copy the inserted rows into a "pseudo-staging table". Use this table in your query instead of your current CoverageVersions, and delete the outdated rows on a scheduled basis. That way you wouldn't have to mess with your original table design, and you'd get a nice, dense table... Something like the sketch below:
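
    (A rough sketch; the staging table, trigger name, and kept columns are all placeholders. You'd also seed the staging table once with the current latest rows before relying on it.)

    -- Sketch only: object names below are illustrative.
    CREATE TABLE dbo.CoverageVersionsCurrent (
        CoverageKey int NOT NULL,
        Effective int NOT NULL,
        SourceKey int NOT NULL,
        CovRefundKey int NULL,
        TypeKey int NULL,
        StatusKey int NULL,
        CONSTRAINT PK_CoverageVersionsCurrent
            PRIMARY KEY (CoverageKey, Effective, SourceKey)
    );
    GO
    CREATE TRIGGER dbo.trg_CoverageVersions_AfterInsert
    ON dbo.CoverageVersions
    AFTER INSERT
    AS
    BEGIN
        SET NOCOUNT ON;
        -- Copy each new version row into the pseudo-staging table.
        INSERT INTO dbo.CoverageVersionsCurrent
            (CoverageKey, Effective, SourceKey, CovRefundKey, TypeKey, StatusKey)
        SELECT CoverageKey, Effective, SourceKey, CovRefundKey, TypeKey, StatusKey
        FROM inserted;
    END;
    GO
    -- Scheduled cleanup: keep only the latest version per CoverageKey.
    DELETE c
    FROM dbo.CoverageVersionsCurrent c
    WHERE EXISTS (SELECT 1
                  FROM dbo.CoverageVersionsCurrent newer
                  WHERE newer.CoverageKey = c.CoverageKey
                    AND newer.Effective > c.Effective);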



    Lutz
    A pessimist is an optimist with experience.


  • LutzM (12/10/2010)


    Brandie Tarvin (12/10/2010)


    LutzM (12/10/2010)


    A similar option would be adding another table, CoverageVersionsHistory, with the current CoverageVersions structure, and moving all outdated versions into this history table on a scheduled basis.

    Yeah, but unfortunately I can't do that without causing a whole heap of pain. I'm not even sure I'll be able to add a column to the one table without causing problems (there are SSIS dependencies on this that could turn any schema alteration into a major project).

    Thanks for catching the data issue. I'm editing that post right after this one.

    Errmmm.... a rather dirty option:

    Add an AFTER INSERT trigger to the CoverageVersions table and copy the inserted rows into a "pseudo-staging table". Use this table in your query instead of your current CoverageVersions, and delete the outdated rows on a scheduled basis. That way you wouldn't have to mess with your original table design, and you'd get a nice, dense table...

    Or, off the same trigger, remove the outdated ones... I think I see a few things, Brandie, but I've got to go work at work, sadly. Hopefully I can swing in and really dig at it later. I'd mention them but I'm not entirely sure I'm not barking up the wrong tree. 🙂


    - Craig Farrell

    Never stop learning, even if it hurts. Ego bruises are practically mandatory as you learn unless you've never risked enough to make a mistake.


  • Craig Farrell (12/10/2010)


    ... I'd mention them but I'm not entirely sure I'm not barking up the wrong tree. 🙂

    Unlikely. The tree is already occupied and I can't spot you. 😀 😛



    Lutz
    A pessimist is an optimist with experience.


  • LutzM (12/10/2010)


    Craig Farrell (12/10/2010)


    ... I'd mention them but I'm not entirely sure I'm not barking up the wrong tree. 🙂

    Unlikely. The tree is already occupied and I can't spot you. 😀 😛

    *snicker*

    Okay, it must be too early in the morning, because I'm getting different execution plan results this morning from the ones I know I got last week. For one, I don't have any branches in my plan this morning. So I'm going to take a little time to go over this again and see what changed between my T-SQL today and my T-SQL on Friday.

    On the other hand, Lutz, you did give me a good idea for an NC index on SourceKey that doesn't fix the execution plan but speeds up the query considerably. So thanks for that.

    Brandie Tarvin, MCITP Database Administrator

  • Brandie,

    I'm coming into this rather late (traveling the last couple of days). You could try something like this with a CROSS APPLY:

    SELECT FCV.CoverageKey, FCV.SourceKey, FCV.EffectiveDt, FCV.CovRefundKey,
           FCV.TypeKey, FCV.StatusKey
    FROM dbo.Coverage C
    CROSS APPLY (SELECT TOP 1
                        CV.CoverageKey, CV.SourceKey, CV.Effective AS EffectiveDt,
                        CV.CovRefundKey, CV.TypeKey, CV.StatusKey
                 FROM dbo.CoverageVersions CV
                 WHERE CV.CoverageKey = C.CoverageKey
                 ORDER BY CV.Effective DESC) AS FCV

    I ended up trading a clustered index scan on the CoverageVersions table for a clustered index scan on the Coverage table. The Coverage table becomes the anchor in the query. If there is some way to limit the number of rows from the Coverage table with some sort of index, it may end up performing better; something like the sketch below.
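
    (Purely illustrative, and only valid if the query can exclude deleted rows; the index name and the Deleted = 0 convention are guesses, not from the original post.)

    -- Narrow the anchor set on a flag column; the clustered key (CoverageKey)
    -- rides along in every nonclustered index automatically.
    CREATE NONCLUSTERED INDEX IX_Coverage_Deleted
        ON dbo.Coverage (Deleted);
    GO
    -- ...then anchor the CROSS APPLY on the narrowed set:
    -- FROM dbo.Coverage C
    -- CROSS APPLY (...) AS FCV
    -- WHERE C.Deleted = 0;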

    Todd Fifield

  • Brandie Tarvin (12/13/2010)


    Okay, it must be too early in the morning, because I'm getting different execution plan results this morning from the ones I know I got last week. For one, I don't have any branches in my plan this morning.

    If by 'no branches' you mean no joins, the answer is that the optimizer is darned clever and turns your whole query into this:

    This is a very similar plan to the first one Lutz posted. His had an extra seek because he had forgotten to add the foreign key between the two tables, or omitted to run the CHECK CONSTRAINT statement straight after the definition.

    The optimizer can see that you aren't returning any columns from the Coverage table, and it also knows that the join condition is on the primary key of Coverage, so it all boils down to a check that a matching row in Coverage exists. The foreign key (when checked and enabled) provides that guarantee, so there's no need to do a join at all.
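
    If you want to confirm the foreign key qualifies on your system, the sys.foreign_keys catalog view shows both flags (the constraint name comes from the DDL earlier in the thread):

    -- Both flags must be 0 for the optimizer to rely on the FK.
    SELECT name, is_disabled, is_not_trusted
    FROM sys.foreign_keys
    WHERE name = 'FK_CoverageVersions_CoverageKey';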

    The second clever optimization is transforming the whole of the remaining query into a simple scan of the CoverageVersions table, followed by a Segment, and a Top. This is known as a Segment Top construction, and you can find full details of it here:

    http://sqlblog.com/blogs/paul_white/archive/2010/07/28/the-segment-top-query-optimisation.aspx

    Paul

  • The optimal index for the query given is:

    CREATE INDEX [IX dbo.CoverageVersions CoverageKey,Effective (SourceKey,CovRefundKey,TypeKey,StatusKey)]
    ON dbo.CoverageVersions (CoverageKey DESC, Effective DESC)
    INCLUDE (SourceKey, CovRefundKey, TypeKey, StatusKey)
    WITH (FILLFACTOR = 100, SORT_IN_TEMPDB = ON, MAXDOP = 1);

    That presents groups (by CoverageKey) in Effective order to the Segment iterator, and the Top just takes the first row for each group created by the Segment. The included columns are just there to ensure the index is covering. The leading columns are descending to allow a forward scan of the index.

  • As a regular Top-N-Per-Group problem, we can also write it using a ranking function:

    SELECT fcvsq.CoverageKey,
           fcvsq.SourceKey,
           fcvsq.Effective AS EffectiveDt,
           fcvsq.CovRefundKey,
           fcvsq.TypeKey,
           fcvsq.StatusKey
    FROM (SELECT *,
                 ROW_NUMBER() OVER (PARTITION BY CoverageKey ORDER BY Effective DESC) AS Sequence
          FROM dbo.coverageversions) AS fcvsq
    WHERE fcvsq.Sequence = 1;

    The optimal index is the same as previously, except CoverageKey is ASC:

    CREATE INDEX [IX dbo.CoverageVersions CoverageKey,Effective (SourceKey,CovRefundKey,TypeKey,StatusKey)]
    ON dbo.CoverageVersions (CoverageKey ASC, Effective DESC)
    INCLUDE (SourceKey, CovRefundKey, TypeKey, StatusKey)
    WITH (FILLFACTOR = 100, SORT_IN_TEMPDB = ON, MAXDOP = 1, DROP_EXISTING = ON);

    This gives a similar plan to Segment Top:

    It's a very similar idea, with the Sequence Project numbering the rows, and the Filter returning just row #1 per group.

  • Good stuff as always, Paul! The bummer is that the optimal index is essentially another copy of the table, which could be large given that it's a DW fact table, and it could also gum up the data load, index maintenance, backups, etc. But if the query needs to run fastest, sometimes you gotta do what you gotta do!

    Best,
    Kevin G. Boles
    SQL Server Consultant
    SQL MVP 2007-2012
    TheSQLGuru on googles mail service
