I am new to Database technology, I have come up with the below SQL query with the help of this Fourm Members.
This SQL query finds duplicate records present in dbo.Changes_DupCk table by comparing it with dbo.Changes and Changes_Frze tables. The three tables are as below.
The work involves is every week, data from Changes_DupCk table has to be loaded into dbo.Changes and Changes_Frze table. But before loading we have to make sure that dbo.Changes_DupCk table has no dupicate record which is already present in one of the other two tables.
A Record in Changes_DupCk becomes duplicate if the requestnumber,suffix,AmountTypeCode, EffDate,ExpDate,Amount,AdjustmentType,AdjustmentStatus,AdjustmentUserId matches with a record in dbo.Changes or dbo.Changes_Frze table
All three have same table structure mentioned below
CREATE TABLE [dbo].[Changes_Frze](
[requestNumber] [char](12) NOT NULL,
[suffix] [char](3) NOT NULL,
[AmountTypeCode] [char](2) NOT NULL,
[EffDate] [datetime2](3) NOT NULL,
[CreationDateTime] [datetime2](3) NOT NULL,
[ExpDate] [datetime2](3) NULL,
[Amount] [money] NULL,
[AdjustmentUserId] [varchar](10) NULL,
[AdjustmentReason] [varchar](255) NULL,
[AdjustmentManagerId] [varchar](10) NULL,
[AdjustmentStatus] [char](1) NOT NULL,
[AdjustmentType] [char](1) NOT NULL,
[AdjustmentReasonId] [smallint] NULL
) ON [PRIMARY]
The SQL query I came up is
( --=== Combine all the rows as if they were from one table using UNION ALL, marking each row with a table #.
SELECT *,TableNo = 1 FROM dbo.Changes_DupCk UNION ALL
SELECT *,TableNo = 2 FROM dbo.Changes UNION ALL
SELECT *,TableNo = 3 FROM dbo.Changes_Frze
( --=== Number all duplicate rows starting the count over for each row according to the PARTITION BY
SELECT DupeCheck = ROW_NUMBER() OVER (PARTITION BY [requestNumber],AmountTypeCode,TransactionEffDate,TransactionExpDate,Amount,AdjustmentType,AdjustmentStatus,AdjustmentUserId ORDER BY TransactionEffDate)
) --=== Return only the duplicates not including the original of each duplicate.
WHERE DupeCheck > 1
This query assumes that all three tables might have duplicates. But in reality, only the Changes_DupCk table may contain duplicate record. Duplicate record in this case means that record is already present in one of other two tables. And dbo.Changes has millions of Records which all are unique. So making an UNION ALL will make my database slow, please suggest me how to optimise this query for better performance.