Newbie:Why is select doing a table scan?

  • IX_sysValidationList is a unique, non-clustered index on the column, ValidationPartyID. Is this index not appropriate for these joins?

    Also, when I try to use this query as the basis of an indexed view, I get a pop-up that states an indexed view cannot contain any outer joins. This surprised me, as I would expect that one reason for using an indexed view is to present a denormalized view of the data for reporting that all users can access without building their own copy. In this case, an employee has many relations to additional details, some of which may not be present for all employees, but the view should have a row for each employee, regardless of the presence or absence of one or more types of additional details. If only inner joins can be used, I expect many of the employees will not appear in the view. I hope I'm missing something fundamental, because I would expect this to be a common scenario that people need to address.

  • Ed (1/26/2009)


    IX_sysValidationList is a unique, non-clustered index on the column, ValidationPartyID. Is this index not appropriate for these joins?

    You should really have a clustered index on the Primary key, unless you have a good reason not too,.

  • From my brief reading on clustered index, they're useful for returning a range of values that have some meaningful sort order. Since this table uses a GUID for uniqueness, and there are no meaningful ranges for these values, I don't see the benefit of a clustered index in this case. Are there other benefits to a clustered index?

  • Ed (1/26/2009)


    IX_sysValidationList is a unique, non-clustered index on the column, ValidationPartyID. Is this index not appropriate for these joins?

    Apparently not. Probably because it's not covering. I haven't looked at the plan in detail.

    Is there any particular reason why there's no clustered index?

    Gail Shaw
    Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
    SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

    We walk in the dark places no others will enter
    We stand on the bridge and no one may pass
  • Ed (1/26/2009)


    From my brief reading on clustered index, they're useful for returning a range of values that have some meaningful sort order. Since this table uses a GUID for uniqueness, and there are no meaningful ranges for these values, I don't see the benefit of a clustered index in this case. Are there other benefits to a clustered index?

    Clustered indexes are useful as they are the order in which the data is stored, this is why you can only have one per table. Without a clustered index you simply have a heap table with no look-up ability , so every query will have to scan through every row to get the data required.

  • Thanks. The lightbulb is starting to go on.

    Next question: The sysValidationList table (I did not design this), has a column, ValidationListID, of type int that is marked as key, not null and identity. It also has a column, ValidationPartyID, of type uniqueidentifier that is marked as not null, and has a default value of (newid()). As far as I can see, all other tables use the ValidationPartyID value to relate to this table.

    Does it matter which column is defined in the clustered index? select count(*), select count(distinct ValidationListID) and select count(distinct ValidationPartyId) all return the same count.

  • Ed (1/26/2009)


    Does it matter which column is defined in the clustered index?

    Yes! Very much so. A bad choice of a cluster can do terrible things to performance. Uniqueindentifier (especially default newid) is a reall bad choice for a cluster, because of how rapidly it will cause fragmentation when insert are done.

    What's the full definition of the table?

    Gail Shaw
    Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
    SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

    We walk in the dark places no others will enter
    We stand on the bridge and no one may pass
  • Ed (1/26/2009)


    Thanks. The lightbulb is starting to go on.

    Next question: The sysValidationList table (I did not design this), has a column, ValidationListID, of type int that is marked as key, not null and identity. It also has a column, ValidationPartyID, of type uniqueidentifier that is marked as not null, and has a default value of (newid()). As far as I can see, all other tables use the ValidationPartyID value to relate to this table.

    Does it matter which column is defined in the clustered index? select count(*), select count(distinct ValidationListID) and select count(ValidationPartyId) all return the same count.

    It does matter what column is defined as the clustered index, not for counts ,but for generally accesing your data. Sometimes the best candiadate is the primary key other times it is the date-created. It depends on how you are using your table, how many insert/reads are happening and a few other factors. But in general the primary key is a good start.

    for the Count(*) the query optimizer will pick the best column to do the count on.

    and the other two counts you are doing are both on unique columns, so these should return the same value even when using the Distinct keyword (since you cannot have dups in a unique column)

  • Here is the definition of the table:

    /****** Object: Table [dbo].[sysValidationList] Script Date: 1/26/2009 12:27:20 PM ******/

    USE [DEV2-HRMS-036];

    GO

    SET ANSI_NULLS ON;

    GO

    SET QUOTED_IDENTIFIER ON;

    GO

    IF EXISTS (SELECT * FROM sys.objects WHERE object_id = OBJECT_ID(N'[dbo].[sysValidationList]') AND type in (N'U'))

    BEGIN

    DROP TABLE [DEV2-HRMS-036].[dbo].[sysValidationList];

    END

    GO

    CREATE TABLE [dbo].[sysValidationList] (

    [ValidationListID] int IDENTITY(1, 1) NOT NULL,

    [ValidationPartyID] uniqueidentifier NOT NULL DEFAULT (newid()),

    [ValidationTypePartyID] uniqueidentifier NOT NULL,

    [ValidationListCode] varchar(20) NULL,

    [ValidationListDesc] varchar(200) NOT NULL,

    [ParentValidationPartyID] uniqueidentifier NULL,

    [Hide] bit NOT NULL DEFAULT ((0)),

    [SortOrder] int NOT NULL DEFAULT ((100)),

    [SystemFlag] bit NOT NULL DEFAULT ((0)),

    [EarnieCode] varchar(10) NULL,

    [LastModifiedDate] datetime NOT NULL,

    [LastModifiedUser] varchar(50) NOT NULL,

    [UpdateCount] int NOT NULL DEFAULT ((0)),

    CONSTRAINT [IX_sysValidationList]

    UNIQUE NONCLUSTERED ([ValidationPartyID] ASC)

    WITH ( PAD_INDEX = OFF,

    FILLFACTOR = 90,

    IGNORE_DUP_KEY = OFF,

    STATISTICS_NORECOMPUTE = OFF,

    ALLOW_ROW_LOCKS = ON,

    ALLOW_PAGE_LOCKS = ON )

    ON [PRIMARY],

    CONSTRAINT [PK_sysValidationList]

    PRIMARY KEY CLUSTERED ([ValidationListID] ASC)

    WITH ( PAD_INDEX = OFF,

    FILLFACTOR = 90,

    IGNORE_DUP_KEY = OFF,

    STATISTICS_NORECOMPUTE = OFF,

    ALLOW_ROW_LOCKS = ON,

    ALLOW_PAGE_LOCKS = ON )

    ON [PRIMARY]

    )

    ON [PRIMARY];

    GO

    /****** Object: Index [dbo].[IX_sysValidationList_1] Script Date: 1/26/2009 12:27:20 PM ******/

    IF EXISTS (SELECT * FROM sys.indexes WHERE object_id = OBJECT_ID(N'[dbo].[sysValidationList]') AND name = N'IX_sysValidationList_1')

    BEGIN

    DROP INDEX [IX_sysValidationList_1] ON [dbo].[sysValidationList];

    END

    GO

    CREATE NONCLUSTERED INDEX [IX_sysValidationList_1]

    ON [dbo].[sysValidationList]

    ([ValidationTypePartyID])

    WITH

    (

    FILLFACTOR = 90,

    IGNORE_DUP_KEY = OFF,

    STATISTICS_NORECOMPUTE = OFF,

    ALLOW_ROW_LOCKS = ON,

    ALLOW_PAGE_LOCKS = ON,

    MAXDOP = 0

    )

    ON [PRIMARY];

    GO

  • how often is table used?

    is it mainly used for inserting data or retrieving data?

    If was fairly static then i would recommened putting a clustered index on ValidationPartyID, but as this is a unique identifier and has been mentioned this would cause a lot of fragmentation.

    What is the purpose of [ValidationListID]? is this used anywhere ?

  • If that table is frequently inserted into, then I'd suggest put the cluster on the identity, even if it's never used. That way the cluster will organise the table and help with the lookups (if necessary), then you can use nonclustered indexes to support queries or joins.

    For the current query that you're looking for, I'd suggest (even if you don't add a cluster), an index on ValidationPartyID INCLUDE ValidationListCode.

    You can't add an include to the unique constraint, but if you change it to a unique index, you can.

    Gail Shaw
    Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
    SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

    We walk in the dark places no others will enter
    We stand on the bridge and no one may pass
  • GilaMonster (1/26/2009)


    If that table is frequently inserted into, then I'd suggest put the cluster on the identity, even if it's never used. That way the cluster will organise the table and help with the lookups (if necessary), then you can use nonclustered indexes to support queries or joins.

    For the current query that you're looking for, I'd suggest (even if you don't add a cluster), an index on ValidationPartyID INCLUDE ValidationListCode.

    You can't add an include to the unique constraint, but if you change it to a unique index, you can.

    Ditto.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • It looks to me like SQL is using the non-clus index on the lookup table. It scans it once for each LOJ, building a hash table to match to the corresponding column. The index prevents SQL from having to sort the values.

    But, why doesn't it just scan the lookup table and build the hash table once, and then use that one generated hash table for all (20 of) the lookups? Good q, but it's not doing it.

    Make sure that you keep that index properly defragmented / rebuilt and that you use the optimum amount of freespace on it (naturally that will require trial and error to get right).

    SQL DBA,SQL Server MVP(07, 08, 09) A socialist is someone who will give you the shirt off *someone else's* back.

  • ScottPletcher (1/27/2009)


    It looks to me like SQL is using the non-clus index on the lookup table.

    If you look at the exec plan, you'll notice it's a table scan on the lookup, not an index seek.

    Gail Shaw
    Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
    SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

    We walk in the dark places no others will enter
    We stand on the bridge and no one may pass
  • If you look at the exec plan, you'll notice it's a table scan on the lookup, not an index seek.

    I never said it was doing a seek, I said it was using the index. It looked to me like SQL was scanning the non-clus index to get the key values to hash, rather than scanning the full table.

    SQL DBA,SQL Server MVP(07, 08, 09) A socialist is someone who will give you the shirt off *someone else's* back.

Viewing 15 posts - 16 through 30 (of 31 total)

You must be logged in to reply to this topic. Login to reply