April 5, 2020 at 7:07 pm
Hi sql gurus,
I am currently doing some R&D on MERGE statement and as you know it's little complex one, I am trying to get one solution for which I am not able to think how to implement it.
I have to maintain a SCD type 2 dimension. My source contain data with start date and end date for say an employee and we have an employee id as identifier. For some reasons (plz don't ask as it's legacy system), our target table has a separate integer identifier along with employee id. Target table also contains start date, end date and is current flag.
Now while i was trying to implement merge, problem is how to assign new identifier value for new record (it should be max value +1 , and this column is of course not identity column).
Please help me with some example. I am scratching my head.
April 5, 2020 at 11:36 pm
You know the drill. Can you provide some sample consumable DDL, data and desired results, please?
April 6, 2020 at 10:55 am
Off course. Here it goes.
Source table contains data like this:
EmpId attr1 attr2 attr3 attr4 effStartDate effEndDate
E1001 abc xyz knop sts '2015-06-01 00:00:00.000' '9999-12-31 00:00:00.000'
E1002 vah sje jshu kjd '2015-01-01 00:00:00.000' '2016-12-31 00:00:00.000'
E1002 vah sst jshu kjd '2017-01-01 00:00:00.000' '9999-12-31 00:00:00.000'
And target dataset (SCD type 2 table) is like:
EmpAppId EmpId attr1 attr2 attr3 attr4 effStartDate effEndDate IsActive IsCurrent
1000001 E1001 abc xyz knop sts '2015-06-01 00:00:00.000' '9999-12-31 00:00:00.000' 'Y' 'Y'
1000002 E1002 vah sje jshu kjd '2015-01-01 00:00:00.000' '2016-12-31 00:00:00.000' 'Y' 'N'
1000002 E1002 vah sst jshu kjd '2017-01-01 00:00:00.000' '9999-12-31 00:00:00.000' 'Y' 'Y'
So, while next SCD update, we have to take care of following:
1. From source only take records in to consideration with efdEndDate = '9999-12-31' for SCD update .
2. From target only take records with IsCurrent ='Y' for SCD update
3. Based on just EmpId, if for a record in source its not present in target, then add new one with max(EmpAppId)+1
4. Based on just EmpId, if for a record in target its not present in source, then mark that record as inactive by updating isActive flag = 'N'.
5. If for an EmpId in source, a record is present in Target then check for effStartDate. If its not same then mark that record in target as isCurrent = 'N', effEndDate = (effStartDate of new record in source). Also make new entry of that source record in target with same EmpAppId as allocated for other records in target.
Hope I am able to clarify my problem statement in simple way. If any point not clear, please let me know.
April 6, 2020 at 1:30 pm
Now while i was trying to implement merge, problem is how to assign new identifier value for new record (it should be max value +1 , and this column is of course not identity column).
Please help me with some example. I am scratching my head.
This is exactly what I'm talking about. The lofty theorizers should come up with this code.
Could the EmpAppId column in the target table be dropped? From the way it seemingly appears here the primary key of the target table could be EmpId, effStartDate, EffEndDate. Is it so? The EmpAppId is not unique
Aus dem Paradies, das Cantor uns geschaffen, soll uns niemand vertreiben können
April 6, 2020 at 1:33 pm
Ah, my apologies. I'd assumed that, with >2,700 points, you would understand exactly what I meant.
What we need here is code which other developers can run in SSMS.
So
1)
DROP TABLE IF EXISTS #Source
CREATE TABLE #Source(EmpId VARCHAR(x) NOT NULL, etc)
Same for #Target
2)
INSERT #Source
VALUES ()
Same for #Target
3)
What should #Target look like after the query has run.
Ideally represented as
SELECT *
FROM
VALUES (),()
April 6, 2020 at 3:58 pm
This is a beginning and it might get the OP past the id assignment issue.
drop table if exists #src_data;
go
create table #src_data(
EmpId varchar(12),
attr1 char(3),
attr2 char(3),
attr3 char(4),
attr4 char(3),
effStartDate datetime,
effEndDate datetime);
drop table if exists #tgt_data;
create table #tgt_data(
EmpAppId int,
EmpId varchar(12),
attr1 char(3),
attr2 char(3),
attr3 char(4),
attr4 char(3),
effStartDate datetime,
effEndDate datetime,
IsActive char(1),
IsCurrent char(1));
insert #src_data(EmpId, attr1, attr2, attr3, attr4, effStartDate, effEndDate) values
('E1001', 'abc', 'xyz', 'knop', 'sts', '2015-06-01 00:00:00.000', '9999-12-31 00:00:00.000'),
('E1002', 'vah', 'sje', 'jshu', 'kjd', '2015-01-01 00:00:00.000', '2016-12-31 00:00:00.000'),
('E1002', 'vah', 'sst', 'jshu', 'kjd', '2017-01-01 00:00:00.000', '9999-12-31 00:00:00.000'),
/* new records */
('E9998', 'vah', 'sje', 'jshu', 'kjd', '2015-01-01 00:00:00.000', '9999-12-31 00:00:00.000'),
('E9999', 'vah', 'sje', 'jshu', 'kjd', '2015-01-01 00:00:00.000', '9999-12-31 00:00:00.000');
insert #tgt_data(EmpAppId, EmpId, attr1, attr2, attr3, attr4, effStartDate, effEndDate, IsActive, IsCurrent) values
(1000001, 'E1001', 'abc', 'xyz', 'knop', 'sts', '2015-06-01 00:00:00.000', '9999-12-31 00:00:00.000', 'Y', 'Y'),
(1000002, 'E1002', 'vah', 'sje', 'jshu', 'kjd', '2015-01-01 00:00:00.000', '2016-12-31 00:00:00.000', 'Y', 'N'),
(1000002, 'E1002', 'vah', 'sst', 'jshu', 'kjd', '2017-01-01 00:00:00.000', '9999-12-31 00:00:00.000', 'Y', 'Y'),
/* IsActive set to 'No' */
(9000001, 'E9002', 'vah', 'sst', 'jshu', 'kjd', '2017-01-01 00:00:00.000', '9999-12-31 00:00:00.000', 'Y', 'Y');
set nocount on;
set xact_abort on;
begin transaction
begin try
--3)
with
max_key_cte(max_id) as (
select max(EmpAppId) max_id from #tgt_data)
insert #tgt_data(EmpAppId, EmpId, attr1, attr2, attr3, attr4, effStartDate, effEndDate, IsActive, IsCurrent)
select
mkc.max_id+row_number() over (order by (select null)),
sd.*,
'Y',
'Y'
from
max_key_cte mkc
cross join
#src_data sd
where
sd.effEndDate='9999-12-31'
and not exists(select 1 from #tgt_data td where sd.EmpId=td.EmpId);
--4)
update td
set
IsActive='N'
from
#tgt_data td
where not exists
(select 1 from #src_data sd where td.EmpId=sd.EmpId);
--5) ???
commit transaction;
end try
begin catch
print ('No bueno');
rollback transaction;
end catch
select * from #tgt_data;
Aus dem Paradies, das Cantor uns geschaffen, soll uns niemand vertreiben können
April 6, 2020 at 9:35 pm
Here is the script to have data at initial stage. Here table Target acts as SCD type 2 table
IF OBJECT_ID('tempdb.dbo.#Source', 'U') IS NOT NULL
  DROP TABLE dbo.#Source; 
CREATE TABLE dbo.#Source
(EmpId varchar(10), attr1 varchar(10), attr2 varchar(10), attr3 varchar(10), effStartDate datetime, effEndDate datetime)
insert into dbo.#Source
select 'E1001', 'Robert', 'Kreigik', 'WA', '2015-06-01 00:00:00.000', '9999-12-31 00:00:00.000' union all
select 'E1002', 'Samuel', 'Vaas', 'WA', '2015-01-01 00:00:00.000', '2016-12-31 00:00:00.000' union all
select 'E1002', 'Samuel', 'D Vaas', 'WA', '2017-01-01 00:00:00.000', '9999-12-31 00:00:00.000' union all
select 'E1003', 'Ned', 'Menderick', 'NY', '2016-09-01 00:00:00.000', '9999-12-31 00:00:00.000' union all
select 'E1004', 'Damien', 'Martin', 'NY', '2017-01-01 00:00:00.000', '9999-12-31 00:00:00.000'
IF OBJECT_ID('tempdb.dbo.#Target', 'U') IS NOT NULL
  DROP TABLE dbo.#Target; 
CREATE TABLE dbo.#Target
(EmpAppId int, EmpId varchar(10), attr1 varchar(10), attr2 varchar(10), attr3 varchar(10), effStartDate datetime, effEndDate datetime, 
IsActive char(1), IsCurrent char(1))
Insert into dbo.#Target
select 1000001, 'E1001', 'Robert', 'Kreigik', 'WA', '2015-06-01 00:00:00.000', '9999-12-31 00:00:00.000', 'Y', 'Y' UNION ALL
select 1000002, 'E1002', 'Samuel', 'Vaas', 'WA', '2015-01-01 00:00:00.000', '2016-12-31 00:00:00.000', 'Y', 'N' UNION ALL
select 1000002, 'E1002', 'Samuel', 'D Vaas', 'WA', '2017-01-01 00:00:00.000', '9999-12-31 00:00:00.000', 'Y', 'Y' UNION ALL
select 1000003, 'E1003', 'Ned', 'Menderick', 'NY', '2016-09-01 00:00:00.000', '9999-12-31 00:00:00.000', 'Y', 'Y' UNION ALL
select 1000004, 'E1004', 'Damien', 'Martin', 'NY', '2017-01-01 00:00:00.000', '9999-12-31 00:00:00.000', 'Y', 'Y'Once you ran the above scripts, you will get output like this:

Now, after second run we get the new feed in source again (as truncate and load) and it looks like this:
truncate table dbo.#Source
insert into dbo.#Source
select 'E1001', 'Robert', 'Kreigik', 'WA', '2015-06-01 00:00:00.000', '9999-12-31 00:00:00.000' union all
select 'E1002', 'Samuel', 'Vaas', 'WA', '2015-01-01 00:00:00.000', '2016-12-31 00:00:00.000' union all
select 'E1002', 'Samuel', 'D Vaas', 'WA', '2017-01-01 00:00:00.000', '2017-05-31 00:00:00.000' union all
select 'E1002', 'Samuel', 'D Vaas', 'NY', '2017-06-01 00:00:00.000', '9999-12-31 00:00:00.000' union all
select 'E1003', 'Ned', 'Menderick', 'NY', '2016-09-01 00:00:00.000', '9999-12-31 00:00:00.000' union all
select 'E1005', 'Ruth', 'Gardener', 'OH', '2017-06-01 00:00:00.000', '9999-12-31 00:00:00.000'
The output looks like this:

Now i need a MERGE statement only so that i can update Target table in a way that:
Hope it clarifies the requirement. I want to implement it as Merge only because actually these are big tables and i dont want to scan them again and again and also mainly want to learn Merge with this example.
April 8, 2020 at 3:17 am
Can anyone please help on this please ?
Viewing 8 posts - 1 through 8 (of 8 total)
You must be logged in to reply to this topic. Login to reply