Kim Crosser (10/8/2015)
Kristen-173977 (9/29/2015)
rchantler (9/28/2015)
CREATE CLUSTERED INDEX IDX_hidvals ON #hidvals(hid)That is not declared as UNIQUE, so not quite the same thing as a PRIMARY KEY's Clustered Index (SQL will add a tie-break value which will make the key size wider).
Also, Primary Key will not allow NULLs which might effect how the optimiser treats the index - although you have declared [hid] as NOT NULL so maybe that is as broad as it is long ...
Maybe I am missing something, but I think Kristen hit the real issue here. A Clustered Index does NOT imply uniqueness, while a Primary Key does. So when you are trying to update the table with a Primary Key definition, SQL Server must test each new record for non-existence in the table.
Simply defining an index as Clustered just causes efficient organization of the key/data. SQL Server doesn't need to test each new value to see if it is already in the table.
Unless I missed a test in there somewhere, did you try defining that index as a UNIQUE CLUSTERED index? I would expect that to have the same performance as the Primary Key.
The script posted by rchantler (thanks, very handy) demonstrates the effect nicely. I've made a number of changes to it to eliminate some chaff and get a better idea of the problem domain. Both queries now use nonclustered indexes except of course for the index on the temp table, and the problem remains. The index on the temp table is now just a clustered index. If you comment out either one of the two left joins, or change either of them to inner joins, the problem goes. If you add [start] or [finish] (or both) to the key list of the clustered index on the temp table as the second key, the problem changes but it's not fixed.
Note that #hidvals.start has the same value for every row, as does #hidvals.finish.
Creating statistics on [start] / [finish] doesn't change anything, the problem remains. Here's the modified script:
-- Note: You will need a tally table to create the test data
-- Early data is monthly; later data is daily
-- Define a mapping table [DateMap] that maps early monthly data to daily dates
-- and later daily data to daily dates so data can be pulled for daily
-- dates over the entire date range
-- create tally table
IF OBJECT_ID('TempDB..#tally2') IS NOT NULL drop table #tally2
;WITH Ten AS (SELECT * FROM (VALUES (0),(0),(0),(0),(0),(0),(0),(0),(0),(0)) d (n))
SELECT n = ROW_NUMBER() OVER(ORDER BY (SELECT NULL)) INTO #tally2 FROM Ten a, Ten b, Ten c, Ten d, Ten e
CREATE UNIQUE CLUSTERED INDEX cx_tally ON #tally2 (n)
GO
IF OBJECT_ID('dbo.DateMap') IS NOT NULL drop table dbo.DateMap
CREATE TABLE [dbo].[DateMap]
(
[adate] [smalldatetime] NOT NULL,
[calcdate] [smalldatetime] NULL,
Guid1 UNIQUEIDENTIFIER,
Guid2 UNIQUEIDENTIFIER,
Guid3 UNIQUEIDENTIFIER
)
INSERT INTO dbo.DateMap(adate, Guid1, Guid2, Guid3 )
SELECT CAST(N AS SMALLDATETIME), NEWID(), NEWID(), NEWID() -- make table wider to encourage use of index CJM
FROM #tally2 -- First use of Tally Table
WHERE N BETWEEN 29249 AND 42275
UPDATE dbo.DateMap
SET calcdate = adate
WHERE adate >= '31-aug-2008'
-- Define two CTEs
;WITH datecalc AS(
SELECT *, y= Year(adate), m= Month(adate), rn = ROW_NUMBER()OVER(ORDER BY adate)
FROM dbo.DateMap)
, monthends AS(
SELECT DC1.*
FROM datecalc DC1
JOIN datecalc DC2
ON DC1.rn = DC2.rn-1
WHERE DC1.m <> DC2.m)
UPDATE DM
SET calcDate = X.adate
FROM dbo.dateMap DM
OUTER APPLY( SELECT TOP 1 adate
FROM monthends M
WHERE DM.adate >= M.adate
ORDER BY M.adate DESC
) X
WHERE DM.calcDate IS NULL
--ALTER TABLE dbo.DateMap ADD CONSTRAINT PK_DateMap PRIMARY KEY CLUSTERED(Adate ASC)
CREATE UNIQUE CLUSTERED INDEX UCX_DateMap ON dbo.DateMap (Guid1)
CREATE UNIQUE NONCLUSTERED INDEX IX_DateMap_calcdate_adate ON dbo.DateMap (CalcDate ASC, Adate ASC)
CREATE UNIQUE NONCLUSTERED INDEX IX_DateMap_adate ON dbo.DateMap (Adate ASC) INCLUDE (CalcDate)
GO
-- Create the test data
IF OBJECT_ID('dbo.TestCalcs') IS NOT NULL drop table dbo.TestCalcs
CREATE TABLE dbo.TestCalcs(adate SMALLDATETIME NOT NULL,
hid INT NOT NULL,
value FLOAT NULL)
INSERT INTO dbo.testCalcs
SELECT X.adate,Y.hid, RAND(Y.hid)
FROM (select distinct calcdate as adate from dbo.dateMap) X
CROSS JOIN (select top 4000 N as hid from #tally2 order by N) Y -- second use of tally table
CREATE UNIQUE CLUSTERED INDEX UCX_testCalcs_adate_hid ON dbo.testCalcs (adate, hid)
CREATE UNIQUE NONCLUSTERED INDEX IX_testCalcs_hid_adate ON dbo.testCalcs(hid ASC, adate ASC) INCLUDE (value)
CREATE UNIQUE NONCLUSTERED INDEX IX_testCalcs_adate_value ON dbo.testCalcs(adate ASC, value) INCLUDE (hid)
-- Create the view that joins the test data to the mapping table
IF OBJECT_ID('dbo.TestCalcsView') IS NOT NULL drop view dbo.TestCalcsView
GO
CREATE VIEW dbo.testCalcsView AS
SELECT
DM.adate,
CL.hid,
CL.value
FROM dbo.DateMap DM
JOIN dbo.testCalcs CL
ON DM.CalcDate = CL.Adate
GO
SELECT COUNT(*) FROM dbo.testCalcsView -- 52,108,000 rows
--=============================================================================================================================================
-- Temp table
IF OBJECT_ID('tempdb..#hidvals') IS NOT NULL DROP TABLE #hidvals
CREATE TABLE #hidvals(hid INT NOT NULL,
extel2code CHAR(2),
start SMALLDATETIME,
finish SMALLDATETIME,
value1 FLOAT,
value2 FLOAT)
INSERT INTO #hidvals(hid, start, finish)
SELECT hid ,'17-sep-2015', '18-sep-2015'
FROM testCalcs
WHERE adate = '18-sep-2015'
CREATE UNIQUE CLUSTERED INDEX ucx_hidvals ON #hidvals(hid)
CREATE STATISTICS stx_finish ON #hidvals (finish)
CREATE STATISTICS stx_start ON #hidvals (start)
CREATE STATISTICS stx_startfinish ON #hidvals (start,finish)
UPDATE STATISTICS testCalcs
UPDATE STATISTICS DateMap
--=======================================================================================================
DROP INDEX ucx_hidvals ON #hidvals
SELECT -- Q1, no index
value1 = X1.[value],
value2 = X2.[value]
FROM #hidvals H
LEFT JOIN dbo.testCalcsView X1
ON H.hid = X1.hid AND H.start = X1.adate
left JOIN dbo.testCalcsView X2
ON H.hid = X2.hid AND H.finish = X2.adate
--CREATE UNIQUE CLUSTERED INDEX ucx_hidvals ON #hidvals(hid)
CREATE UNIQUE CLUSTERED INDEX ucx_hidvals ON #hidvals(hid,finish)
SELECT -- Q2, unique clustered index
value1 = X1.[value],
value2 = X2.[value]
FROM #hidvals H
LEFT JOIN dbo.testCalcsView X1
ON H.hid = X1.hid AND H.start = X1.adate
left JOIN dbo.testCalcsView X2
ON H.hid = X2.hid AND H.finish = X2.adate
--=======================================================================================================
For fast, accurate and documented assistance in answering your questions, please read this article.
Understanding and using APPLY, (I) and (II) Paul White
Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden