• Kim Crosser (10/8/2015)


    Kristen-173977 (9/29/2015)


    rchantler (9/28/2015)


    CREATE CLUSTERED INDEX IDX_hidvals ON #hidvals(hid)

    That is not declared as UNIQUE, so not quite the same thing as a PRIMARY KEY's Clustered Index (SQL will add a tie-break value which will make the key size wider).

    Also, Primary Key will not allow NULLs which might effect how the optimiser treats the index - although you have declared [hid] as NOT NULL so maybe that is as broad as it is long ...

    Maybe I am missing something, but I think Kristen hit the real issue here. A Clustered Index does NOT imply uniqueness, while a Primary Key does. So when you are trying to update the table with a Primary Key definition, SQL Server must test each new record for non-existence in the table.

    Simply defining an index as Clustered just causes efficient organization of the key/data. SQL Server doesn't need to test each new value to see if it is already in the table.

    Unless I missed a test in there somewhere, did you try defining that index as a UNIQUE CLUSTERED index? I would expect that to have the same performance as the Primary Key.

    The script posted by rchantler (thanks, very handy) demonstrates the effect nicely. I've made a number of changes to it to eliminate some chaff and get a better idea of the problem domain. Both queries now use nonclustered indexes except of course for the index on the temp table, and the problem remains. The index on the temp table is now just a clustered index. If you comment out either one of the two left joins, or change either of them to inner joins, the problem goes. If you add [start] or [finish] (or both) to the key list of the clustered index on the temp table as the second key, the problem changes but it's not fixed.

    Note that #hidvals.start has the same value for every row, as does #hidvals.finish.

    Creating statistics on [start] / [finish] doesn't change anything, the problem remains. Here's the modified script:

    -- Note: You will need a tally table to create the test data

    -- Early data is monthly; later data is daily

    -- Define a mapping table [DateMap] that maps early monthly data to daily dates

    -- and later daily data to daily dates so data can be pulled for daily

    -- dates over the entire date range

    -- create tally table

    IF OBJECT_ID('TempDB..#tally2') IS NOT NULL drop table #tally2

    ;WITH Ten AS (SELECT * FROM (VALUES (0),(0),(0),(0),(0),(0),(0),(0),(0),(0)) d (n))

    SELECT n = ROW_NUMBER() OVER(ORDER BY (SELECT NULL)) INTO #tally2 FROM Ten a, Ten b, Ten c, Ten d, Ten e

    CREATE UNIQUE CLUSTERED INDEX cx_tally ON #tally2 (n)

    GO

    IF OBJECT_ID('dbo.DateMap') IS NOT NULL drop table dbo.DateMap

    CREATE TABLE [dbo].[DateMap]

    (

    [adate] [smalldatetime] NOT NULL,

    [calcdate] [smalldatetime] NULL,

    Guid1 UNIQUEIDENTIFIER,

    Guid2 UNIQUEIDENTIFIER,

    Guid3 UNIQUEIDENTIFIER

    )

    INSERT INTO dbo.DateMap(adate, Guid1, Guid2, Guid3 )

    SELECT CAST(N AS SMALLDATETIME), NEWID(), NEWID(), NEWID() -- make table wider to encourage use of index CJM

    FROM #tally2 -- First use of Tally Table

    WHERE N BETWEEN 29249 AND 42275

    UPDATE dbo.DateMap

    SET calcdate = adate

    WHERE adate >= '31-aug-2008'

    -- Define two CTEs

    ;WITH datecalc AS(

    SELECT *, y= Year(adate), m= Month(adate), rn = ROW_NUMBER()OVER(ORDER BY adate)

    FROM dbo.DateMap)

    , monthends AS(

    SELECT DC1.*

    FROM datecalc DC1

    JOIN datecalc DC2

    ON DC1.rn = DC2.rn-1

    WHERE DC1.m <> DC2.m)

    UPDATE DM

    SET calcDate = X.adate

    FROM dbo.dateMap DM

    OUTER APPLY( SELECT TOP 1 adate

    FROM monthends M

    WHERE DM.adate >= M.adate

    ORDER BY M.adate DESC

    ) X

    WHERE DM.calcDate IS NULL

    --ALTER TABLE dbo.DateMap ADD CONSTRAINT PK_DateMap PRIMARY KEY CLUSTERED(Adate ASC)

    CREATE UNIQUE CLUSTERED INDEX UCX_DateMap ON dbo.DateMap (Guid1)

    CREATE UNIQUE NONCLUSTERED INDEX IX_DateMap_calcdate_adate ON dbo.DateMap (CalcDate ASC, Adate ASC)

    CREATE UNIQUE NONCLUSTERED INDEX IX_DateMap_adate ON dbo.DateMap (Adate ASC) INCLUDE (CalcDate)

    GO

    -- Create the test data

    IF OBJECT_ID('dbo.TestCalcs') IS NOT NULL drop table dbo.TestCalcs

    CREATE TABLE dbo.TestCalcs(adate SMALLDATETIME NOT NULL,

    hid INT NOT NULL,

    value FLOAT NULL)

    INSERT INTO dbo.testCalcs

    SELECT X.adate,Y.hid, RAND(Y.hid)

    FROM (select distinct calcdate as adate from dbo.dateMap) X

    CROSS JOIN (select top 4000 N as hid from #tally2 order by N) Y -- second use of tally table

    CREATE UNIQUE CLUSTERED INDEX UCX_testCalcs_adate_hid ON dbo.testCalcs (adate, hid)

    CREATE UNIQUE NONCLUSTERED INDEX IX_testCalcs_hid_adate ON dbo.testCalcs(hid ASC, adate ASC) INCLUDE (value)

    CREATE UNIQUE NONCLUSTERED INDEX IX_testCalcs_adate_value ON dbo.testCalcs(adate ASC, value) INCLUDE (hid)

    -- Create the view that joins the test data to the mapping table

    IF OBJECT_ID('dbo.TestCalcsView') IS NOT NULL drop view dbo.TestCalcsView

    GO

    CREATE VIEW dbo.testCalcsView AS

    SELECT

    DM.adate,

    CL.hid,

    CL.value

    FROM dbo.DateMap DM

    JOIN dbo.testCalcs CL

    ON DM.CalcDate = CL.Adate

    GO

    SELECT COUNT(*) FROM dbo.testCalcsView -- 52,108,000 rows

    --=============================================================================================================================================

    -- Temp table

    IF OBJECT_ID('tempdb..#hidvals') IS NOT NULL DROP TABLE #hidvals

    CREATE TABLE #hidvals(hid INT NOT NULL,

    extel2code CHAR(2),

    start SMALLDATETIME,

    finish SMALLDATETIME,

    value1 FLOAT,

    value2 FLOAT)

    INSERT INTO #hidvals(hid, start, finish)

    SELECT hid ,'17-sep-2015', '18-sep-2015'

    FROM testCalcs

    WHERE adate = '18-sep-2015'

    CREATE UNIQUE CLUSTERED INDEX ucx_hidvals ON #hidvals(hid)

    CREATE STATISTICS stx_finish ON #hidvals (finish)

    CREATE STATISTICS stx_start ON #hidvals (start)

    CREATE STATISTICS stx_startfinish ON #hidvals (start,finish)

    UPDATE STATISTICS testCalcs

    UPDATE STATISTICS DateMap

    --=======================================================================================================

    DROP INDEX ucx_hidvals ON #hidvals

    SELECT -- Q1, no index

    value1 = X1.[value],

    value2 = X2.[value]

    FROM #hidvals H

    LEFT JOIN dbo.testCalcsView X1

    ON H.hid = X1.hid AND H.start = X1.adate

    left JOIN dbo.testCalcsView X2

    ON H.hid = X2.hid AND H.finish = X2.adate

    --CREATE UNIQUE CLUSTERED INDEX ucx_hidvals ON #hidvals(hid)

    CREATE UNIQUE CLUSTERED INDEX ucx_hidvals ON #hidvals(hid,finish)

    SELECT -- Q2, unique clustered index

    value1 = X1.[value],

    value2 = X2.[value]

    FROM #hidvals H

    LEFT JOIN dbo.testCalcsView X1

    ON H.hid = X1.hid AND H.start = X1.adate

    left JOIN dbo.testCalcsView X2

    ON H.hid = X2.hid AND H.finish = X2.adate

    --=======================================================================================================

    “Write the query the simplest way. If through testing it becomes clear that the performance is inadequate, consider alternative query forms.” - Gail Shaw

    For fast, accurate and documented assistance in answering your questions, please read this article.
    Understanding and using APPLY, (I) and (II) Paul White
    Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden