Thanks for all the replies...nah, just kidding. Damn, I must ask the really hard questions 😉
The first view is now:
CREATE VIEW [dwh].[vwDIAGNOSIS]
AS
-- We only care about current rows from the target table
WITH cteCurrentRow AS(
SELECT FACILITY_STAY_EPISODE_ID, diagnosis_type, CHECKSUM
FROM dwh.DIAGNOSIS tgt WITH (NOLOCK)
WHERE tgt.CURRENT_ROW='Y'
)
SELECT *
FROM (
SELECT CASE
WHEN (src.FACILITY_STAY_EPISODE_ID IS NOT NULL AND tgt.FACILITY_STAY_EPISODE_ID IS NULL) THEN 'INSERT'
WHEN (src.FACILITY_STAY_EPISODE_ID IS NOT NULL AND tgt.FACILITY_STAY_EPISODE_ID IS NOT NULL) THEN
CASE
WHEN (src.replica_valid_to_date < (SELECT DataExtract_EndTime FROM meta.vwDataExtract_EndTime)) THEN 'DELETE'
ELSE
CASE
WHEN (src.CHECKSUM = tgt.CHECKSUM) THEN 'NOCHANGE'
ELSE 'CHANGE'
END
END
END AS ACTION
,src.FACILITY_STAY_EPISODE_ID
,src.FACILITY_STAY_ID
,src.FACILITY_ID
,src.diagnosis_type
,src.diagnosis_code
,src.clinical_codeset
,src.clinical_codeset_curr
,src.diagnosis_code_curr
,src.diagnosis_code_ccl
,src.condition_onset_flag
,src.replica_valid_from_date
,src.replica_valid_to_date
,src.replica_current_record
,src.replica_uuid
,src.CHECKSUM
FROM trn.DIAGNOSIS src
LEFT JOIN cteCurrentRow tgt
ON src.FACILITY_STAY_EPISODE_ID=tgt.FACILITY_STAY_EPISODE_ID
AND src.diagnosis_type=tgt.diagnosis_type
) scd
WHERE scd.ACTION IN ('INSERT','CHANGE','DELETE')
GO
Again, this view is used in SSDT as an OLEDB data source, writing to an OLEDB data target, and that target is dwh.DIAGNOSIS, which is read by the view. The properties of the target table are Keep identity; Keep nulls; Table lock; and Check constraints are all unchecked, and the Maximum insert commit size 2147483647
The WITH (NOLOCK) hint means it's a dirty read. The processing only inserts rows, either new inserts or change rows.
So, is the dirty read and/or the Table lock (unticked) a Really Bad Idea?
If so, can anyone enlighten me on 1) database properties Allow Snapshot Isolation & Is Read Committed Snapshot On, and 2) SET TRANSACTION ISOLATION LEVEL SNAPSHOT???
The post-processing is now:
CREATE VIEW [scd].[vwDIAGNOSIS]
AS
-- Use the LEAD() windowing function over the ENTIRE TABLE to set VALID_TO and CURRENT_ROW
WITH cteUpdateSCD2Columns
AS (
SELECT FACILITY_STAY_EPISODE_ID
,diagnosis_type
,LEAD(VALID_FROM,1,{D '9999-12-31'}) OVER (
PARTITION BY FACILITY_STAY_EPISODE_ID, diagnosis_type
ORDER BY VALID_FROM
) AS VALID_TO
,CASE
WHEN LEAD(VALID_FROM,1,NULL) OVER (
PARTITION BY FACILITY_STAY_EPISODE_ID, diagnosis_type
ORDER BY VALID_FROM
) IS NULL THEN 'Y' ELSE 'N'
END AS CURRENT_ROW
FROM dwh.DIAGNOSIS WITH (NOLOCK)
)
-- Join with view to only update changed or deleted rows
SELECT src.ACTION
,src.FACILITY_STAY_EPISODE_ID
,src.diagnosis_type
,scd.VALID_TO
,scd.CURRENT_ROW
FROM cteUpdateSCD2Columns scd
INNER JOIN dwh.vwDIAGNOSIS src WITH (NOLOCK)
ON src.FACILITY_STAY_EPISODE_ID=scd.FACILITY_STAY_EPISODE_ID
AND src.diagnosis_type=scd.diagnosis_type
WHERE src.ACTION IN ('CHANGE','DELETE')
GO
And this stored procedure:
ALTER PROCEDURE [scd].[sp_MoH_PostProcess_DIAGNOSIS]
AS
BEGIN
SET NOCOUNT ON
-- Only process current rows - once a row is closed out it must never be changed again
BEGIN TRANSACTION
-- Execute this as two separate batches
-- Post-processing for changes
PRINT 'Processing CHANGES'
UPDATE dwh.DIAGNOSIS
SET VALID_TO=scd.VALID_TO
,CURRENT_ROW=scd.CURRENT_ROW
FROM scd.vwDIAGNOSIS scd WITH (NOLOCK)
WHERE DIAGNOSIS.FACILITY_STAY_EPISODE_ID=scd.FACILITY_STAY_EPISODE_ID
AND DIAGNOSIS.diagnosis_type=scd.diagnosis_type
AND DIAGNOSIS.CURRENT_ROW='Y'
AND scd.ACTION='CHANGE'
PRINT CONCAT(@@rowcount,' rows affected
')
-- Post-processing for deletes
PRINT 'Processing DELETES'
UPDATE dwh.DIAGNOSIS
SET VALID_TO=GETDATE()
,CURRENT_ROW='N'
FROM scd.vwDIAGNOSIS scd WITH (NOLOCK)
WHERE DIAGNOSIS.FACILITY_STAY_EPISODE_ID=scd.FACILITY_STAY_EPISODE_ID
AND DIAGNOSIS.diagnosis_type=scd.diagnosis_type
AND DIAGNOSIS.CURRENT_ROW='Y'
AND scd.ACTION='DELETE'
PRINT CONCAT(@@rowcount,' rows affected
')
COMMIT
END
Again, any enlightenment on why WITH (NOLOCK) is a bad idea - and suitable alternative approaches - would be much appreciated.
I could change the stored procedure to read from the scd.vwDIAGNOSIS view and insert the SK's into a temp table, if that would be a better design???
Thanks...