SQLServerCentral Article

Incremental Data Loading Using ADF and Change Tracking

,

Introduction

In the article, named Incremental Data loading through ADF using Change Tracking – SQLServerCentral, I discussed the step-by-step process for incremental data loading using Change Tracking. Incremental data loading was shown for a table, named Student, from an on-premise SQL Server to an Azure SQL database. In this article, I will perform the incremental data loading for three related tables using Azure Data Factory and Change Tracking.

Implementation

I will create three tables, named Faculty, Department, and Student, in an on-premise SQL database. I will create three tables, named AFaculty, ADepartment, and AStudent in Azure SQL database with a similar structure to the on-premise tables. Then, I will do the incremental data loading from the on-premise SQL tables to the Azure SQL tables.

Configuration and Table Creation in SQL Server

I start SSMS and connect to the existing on-premise SQL Server. I open a SQL script in the existing database, named ResearchWork. First, I alter the database to set change tracking. Then I drop the tables Student, Department, and Faculty in order, in case the tables already exist.

The Student table has a foreign key, which references the Department table. The Department has a foreign key, which references Faculty table. So, we need to maintain the order of the tables while using DROP, CREATE, INSERT, UPDATE, DELETE statements on the tables.

After dropping the tables, I create the tables Faculty, Department, and Student in order. Next, I alter the three tables to enable Change Tracking.

--1. change tracking set to ON for the database
ALTER DATABASE ResearchWork  
SET CHANGE_TRACKING = ON  
(CHANGE_RETENTION = 2 DAYS, AUTO_CLEANUP = ON)
--2. drop the table Student
IF OBJECT_ID('dbo.Student','U') IS NOT NULL
DROP TABLE dbo.Student
--3. drop the table Department
IF OBJECT_ID('dbo.Department','U') IS NOT NULL
DROP TABLE dbo.Department
--4. drop the table Faculty
IF OBJECT_ID('dbo.Faculty','U') IS NOT NULL
DROP TABLE dbo.Faculty
--5. create the table Faculty
CREATE TABLE dbo.Faculty (
facultyId INT IDENTITY(1,1) NOT NULL PRIMARY KEY,
facultyName VARCHAR(50),
DOJ Date,
createDate datetime2 NULL,
updateDate datetime2 NULL
)
--6. create the table Department
CREATE TABLE dbo.Department (
deptId INT IDENTITY(1,1) NOT NULL PRIMARY KEY,
deptName VARCHAR(50),
deptHeadId INT NOT NULL FOREIGN KEY REFERENCES dbo.Faculty (facultyId),
createDate datetime2 NULL,
updateDate datetime2 NULL
)
--7. create the table Student
CREATE TABLE dbo.Student (
studentId int IDENTITY(1,1) NOT NULL PRIMARY KEY,
studentName varchar (100) NULL,
deptId INT NOT NULL FOREIGN KEY REFERENCES dbo.Department (deptId),
marks int NULL,
createDate datetime2 NULL,
updateDate datetime2 NULL
) 
GO
--8. alter the Student table to set change tracking
ALTER TABLE dbo.Student 
ENABLE CHANGE_TRACKING  
WITH (TRACK_COLUMNS_UPDATED = ON)
--9. alter the Department table to set change tracking
ALTER TABLE dbo.Department 
ENABLE CHANGE_TRACKING  
WITH (TRACK_COLUMNS_UPDATED = ON)
--10. alter the Faculty table to set change tracking
ALTER TABLE dbo.Faculty 
ENABLE CHANGE_TRACKING  
WITH (TRACK_COLUMNS_UPDATED = ON)

Create and Populate the ChangeTrackingStore Table

I create a table, named ChangeTrackingStore, to maintain the change tracking version information for the three tables.

I retrieve the change tracking version associated with the last committed transaction of the database and insert this value for all three tables into ChangeTrackingStore. This version information will be used as a reference for reflecting the changes from source to target tables.

--11. drop the table ChangeTrackingStore
IF OBJECT_ID('dbo.ChangeTrackingStore','U') IS NOT NULL
DROP TABLE dbo.ChangeTrackingStore
--12. create the ChangeTrackingStore table
create table dbo.ChangeTrackingStore
(
    tableName varchar(255),
    SYS_CHANGE_VERSION BIGINT,
);
--13. retrieve the latest value from CHANGE_TRACKING_CURRENT_VERSION() and populate in ChangeTrackingStore for the three tables
DECLARE @changeTrackingVersion BIGINT
SET @changeTrackingVersion = CHANGE_TRACKING_CURRENT_VERSION();  
SELECT @changeTrackingVersion
INSERT INTO dbo.ChangeTrackingStore
VALUES 
('Student', @changeTrackingVersion),
('Department', @changeTrackingVersion),
('Faculty', @changeTrackingVersion)
--14. select data from ChangeTrackingStore table
SELECT tableName,SYS_CHANGE_VERSION
FROM dbo.ChangeTrackingStore

Populate Data in the Three Source Tables

First, I insert four records in the Faculty table and verify the inserted data. Then, I insert two records in Department table. The deptHeadId column is a foreign key and references the facultyId column of Faculty table. So, the value inserted in the deptHeadId column must already be present in the facultyId column of Faculty table.

Next, I insert six records in Student table. The deptId column is a foreign key and references the deptId column of Department table. So, the value inserted in the deptId column must already be present in the deptId column of Department table.

--15. populate data in Faculty table
INSERT INTO dbo.Faculty (facultyName,DOJ,createDate,updateDate)
VALUES
('fname1','01-01-2021',GETDATE(), GETDATE()),
('fname2','01-02-2021',GETDATE(), GETDATE()),
('fname3','01-03-2021',GETDATE(), GETDATE()),
('fname4','01-04-2021',GETDATE(), GETDATE())
SELECT facultyId,facultyName,DOJ,createDate,updateDate
FROM dbo.Faculty
--16. populate data in Department table
INSERT INTO dbo.Department (deptName,deptHeadId,createDate,updateDate)
VALUES
('dept1',1,GETDATE(), GETDATE()),
('dept2',2,GETDATE(), GETDATE())

SELECT deptId,deptName,deptHeadId,createDate,updateDate
FROM dbo.Department
--17. populate data in student table
INSERT INTO dbo.Student (studentName,deptId,marks,createDate,updateDate)
VALUES
('st1',1,94,GETDATE(), GETDATE()),
('st2',1,95,GETDATE(), GETDATE()),
('st3',1,92,GETDATE(), GETDATE()),
('st4',1,93,GETDATE(), GETDATE()),
('st5',1,90,GETDATE(), GETDATE()),
('st6',2,89,GETDATE(), GETDATE())
SELECT studentId,studentName,deptId,marks,createDate,updateDate
FROM dbo.Student

Create the Staging and Final Tables in Azure SQL database

I create a staging table named stgAFaculty. This table has three extra columns other than the Faculty table columns. These three columns will be returned from the CHANGETABLE function. Similarly, the staging tables named stgADepartment and stgAStudent are created for storing Department and Student table data respectively.

I drop the tables AStudent, ADepartment and AFaculty in order, in case the tables already exist. AStudent, ADepartment, and AFaculty tables have the foreign keys similar with the on-premise tables named Student, Department, and Faculty respectively. After dropping the tables, I create the tables, AFaculty, ADepartment, and AStudent, in order.

--18. drop and create the Faculty staging table
IF OBJECT_ID('dbo.stgAFaculty','U') IS NOT NULL
DROP TABLE dbo.stgAFaculty
CREATE TABLE dbo.stgAFaculty (
facultyId INT NOT NULL,
facultyName VARCHAR(50),
DOJ Date,
createDate datetime2 NULL,
updateDate datetime2 NULL,
SYS_CHANGE_VERSION BIGINT,
SYS_CHANGE_OPERATION CHAR(5),
SYS_CHANGE_COLUMNS VARBINARY(50)
)
--19. drop and create the Department staging table
IF OBJECT_ID('dbo.stgADepartment','U') IS NOT NULL
DROP TABLE dbo.stgADepartment
CREATE TABLE dbo.stgADepartment (
deptId INT NOT NULL PRIMARY KEY,
deptName VARCHAR(50),
deptHeadId INT NULL, --FOREIGN KEY REFERENCES dbo.AFaculty (facultyId),
createDate datetime2 NULL,
updateDate datetime2 NULL,
SYS_CHANGE_VERSION BIGINT,
SYS_CHANGE_OPERATION CHAR(5),
SYS_CHANGE_COLUMNS VARBINARY(50)
)
--20. drop and create the Student staging table
IF OBJECT_ID('dbo.stgAStudent','U') IS NOT NULL
DROP TABLE dbo.stgAStudent
CREATE TABLE dbo.stgAStudent (
studentId int NOT NULL PRIMARY KEY,
studentName varchar (100) NULL,
deptId INT NULL, --FOREIGN KEY REFERENCES dbo.ADepartment (deptId),
marks int NULL,
createDate datetime2 NULL,
updateDate datetime2 NULL,
SYS_CHANGE_VERSION BIGINT,
SYS_CHANGE_OPERATION CHAR(5),
SYS_CHANGE_COLUMNS VARBINARY(50)
) 
--21. drop the Student table
IF OBJECT_ID('dbo.AStudent','U') IS NOT NULL
DROP TABLE dbo.AStudent
--22. drop the Department table
IF OBJECT_ID('dbo.ADepartment','U') IS NOT NULL
DROP TABLE dbo.ADepartment
--23. drop the Faculty table
IF OBJECT_ID('dbo.AFaculty','U') IS NOT NULL
DROP TABLE dbo.AFaculty
--24. create the Faculty table
CREATE TABLE dbo.AFaculty (
facultyId INT NOT NULL PRIMARY KEY,
facultyName VARCHAR(50),
DOJ Date,
createDate datetime2 NULL,
updateDate datetime2 NULL
)
--25. create the Department table
CREATE TABLE dbo.ADepartment (
deptId INT NOT NULL PRIMARY KEY,
deptName VARCHAR(50),
deptHeadId INT NOT NULL FOREIGN KEY REFERENCES dbo.AFaculty (facultyId),
createDate datetime2 NULL,
updateDate datetime2 NULL
)
--26. create the Student table
CREATE TABLE dbo.AStudent (
studentId int NOT NULL PRIMARY KEY,
studentName varchar (100) NULL,
deptId INT NOT NULL FOREIGN KEY REFERENCES dbo.ADepartment (deptId),
marks int NULL,
createDate datetime2 NULL,
updateDate datetime2 NULL
)

Integration Runtime

In the Azure portal, I create a Data Factory named 'adf-multi-table'. I go to the Manage tab and create a self-hosted Integration Runtime, named selfhostedIR1-sd. This Integration Runtime is required for movement of data from an on-premise SQL Server to an Azure SQL Database.

The Azure Integration Runtime, named AutoResolveIntegrationRuntime, was already available and is required to copy data between cloud data stores.

Linked Services

I create two linked services as detailed below:

  • AzureSQLDatabase1 for Azure SQL Database
  • SQLServer1 for SQL Server (on-premise)

Datasets 

I create three datasets for the Azure SQL database tables and four other datasets for the on-premise SQL database tables. The dataset details are as given below:

  • AzureSqlTable1 is created for dbo.stgAFaculty
  • AzureSqlTable2 is created for dbo.stgADepartment
  • AzureSqlTable3 is created for dbo.stgAStudent
  • SqlServerTable1 is created for dbo.ChangeTrackingStore
  • SqlServerTable2 is created for dbo.Department
  • SqlServerTable3 is created for dbo.Faculty
  • SqlServerTable4 is created for dbo.Student

Create a Pipeline

I go to the Author tab of the ADF resource and create a new pipeline. I name it pipeline_incrload_change_tracking.

Lookup Activities

I create the first Lookup activity, named lkupLastChngTrackVer. The source dataset is set as SqlServerTable1, pointing to the dbo.ChangeTrackingStore table. This activity returns the three rows of the table that contain the change tracking version value for the three tables. All the changes made in the three source tables after this version will be considered for loading (insert, update, delete) in the three target tables in Azure SQL database.

I assume here that before this version, the three source tables were empty.

I create the second Lookup activity, named lkupCurChngTrackVer.

I write the following query to retrieve  the change tracking version associated with the last committed transaction of the on-premise SQL database. In this task, the  'First Row Only' checkbox is selected.

SELECT CHANGE_TRACKING_CURRENT_VERSION() as CurrentChangeTrackingVersion

I will consider tracking the changes in the source tables up to this version. Output of the two lookup activities helps to decide from which change tracking version to which version the changes to be considered for data loading at each iteration of the ADF execution.

Copy Activity for Faculty Staging table

I create the Copy data activity named CopyToStgAFaculty and add the output links from the two lookup activities as input to the Copy data activity. In the source tab, the source dataset is set to SqlServerTable3.

I write the following query to retrieve all the records from the Faculty table with the change operation and the change version details after performing a right outer join with CHANGETABLE. Extra records from CHANGETABLE are also retrieved for deleted records from the Faculty table.

SELECT 
c.facultyId,
f.facultyName,
f.DOJ,
f.createDate,
f.updateDate,
c.SYS_CHANGE_VERSION, 
c.SYS_CHANGE_OPERATION,  
c.SYS_CHANGE_COLUMNS
FROM Faculty AS f
RIGHT OUTER JOIN CHANGETABLE(
CHANGES Faculty, @{activity('lkupLastChngTrackVer').output.value[2].SYS_CHANGE_VERSION}) as c 
ON f.[facultyId] = c.[facultyId] 
where c.SYS_CHANGE_VERSION <= 
@{activity('lkupCurChngTrackVer').output.firstRow.CurrentChangeTrackingVersion}

In the sink tab, I select AzureSqlTable1 as the sink dataset. I write the pre copy script to truncate the staging table stgAFaculty every time before data loading.

TRUNCATE TABLE [dbo].[stgAFaculty]

Copy Activity for Department Staging table

I create the Copy data activity named CopyToStgADepartment next to the CopyToStgAFaculty activity. In the source tab, the source dataset is set to SqlServerTable2.

I write the following query to retrieve all the records from the Department table with the change operation and the change version details after performing a right outer join with CHANGETABLE. Extra records from CHANGETABLE are also retrieved for deleted records from the Department table.

SELECT 
c.deptId,
d.deptName,
d.deptHeadId,
d.createDate,
d.updateDate,
c.SYS_CHANGE_VERSION, 
c.SYS_CHANGE_OPERATION,  
c.SYS_CHANGE_COLUMNS
FROM Department AS d
RIGHT OUTER JOIN CHANGETABLE(
CHANGES Department, @{activity('lkupLastChngTrackVer').output.value[1].SYS_CHANGE_VERSION}) as c 
ON d.[deptId] = c.[deptId] 
where c.SYS_CHANGE_VERSION <= 
@{activity('lkupCurChngTrackVer').output.firstRow.CurrentChangeTrackingVersion}

In the sink tab, I select AzureSqlTable2 as the sink dataset. I write the pre copy script to truncate the staging table stgADepartment every time before data loading.

TRUNCATE TABLE [dbo].[stgADepartment]

Copy Activity for Student Staging table

I create the Copy data activity named CopyToStgAStudent next to the CopyToStgADepartment activity. In the source tab, the source dataset is set to SqlServerTable4.

I write the following query to retrieve all the records from the Student table with the change operation and the change version details after performing a right outer join with CHANGETABLE. Extra records from CHANGETABLE are also retrieved for deleted records from the Student table.

SELECT 
c.studentId,
s.studentName,
s.deptId, 
s.marks,
s.createDate,
s.updateDate,
c.SYS_CHANGE_VERSION, 
c.SYS_CHANGE_OPERATION,  
c.SYS_CHANGE_COLUMNS
FROM Student AS s 
RIGHT OUTER JOIN CHANGETABLE(
CHANGES Student, @{activity('lkupLastChngTrackVer').output.value[0].SYS_CHANGE_VERSION}) as c 
ON s.[studentId] = c.[studentId] 
where c.SYS_CHANGE_VERSION <= 
@{activity('lkupCurChngTrackVer').output.firstRow.CurrentChangeTrackingVersion}

In the sink tab, I select AzureSqlTable3 as the sink dataset. I write the pre copy script to truncate the staging table stgAStudent every time before data loading.

TRUNCATE TABLE [dbo].[stgAStudent]

Stored Procedure Activity

I create a Stored Procedure activity next to the Copy Data activity named CopyToStgAStudent. I select the linked service, AzureSqlDatabase1, and the stored procedure, usp_Modify_Tables. The purpose of this stored procedure is to delete the records from the three Azure SQL tables that are already deleted from the source tables after the last data load. Update and insertion of records are also done on the Azure SQL target tables from the staging tables, based on the SYS_CHANGE_OPERATION column value.

The stored procedure code is given below. Records are first deleted from the AStudent, ADepartment, and AFaculty tables in order. Then, records are updated in AFaculty, ADepartment, and AStudent tables in order. Next, new records are inserted in AFaculty, ADepartment, and AStudent tables in order.

IF OBJECT_ID('dbo.usp_Modify_Tables','P') IS NOT NULL
DROP PROCEDURE dbo.usp_Modify_Tables
GO
CREATE PROCEDURE dbo.usp_Modify_Tables
AS
BEGIN
BEGIN TRY
BEGIN TRANSACTION
  
  --delete records from the three tables in order
  DELETE FROM dbo.AStudent
  FROM dbo.AStudent AS S
  INNER JOIN dbo.stgAStudent AS Ct ON S.studentId = Ct.studentId
  WHERE Ct.SYS_CHANGE_OPERATION = 'D'
  DELETE FROM dbo.ADepartment
  FROM dbo.ADepartment AS D
  INNER JOIN dbo.stgADepartment AS Ct ON D.deptId = Ct.deptId
  WHERE Ct.SYS_CHANGE_OPERATION = 'D'
  DELETE FROM dbo.AFaculty
  FROM dbo.AFaculty AS F
  INNER JOIN dbo.stgAFaculty AS Ct ON F.facultyId = Ct.facultyId
  WHERE Ct.SYS_CHANGE_OPERATION = 'D'
  --update records in the three tables in order
  UPDATE dbo.AFaculty
  SET facultyName = Ct.facultyName,
  DOJ = Ct.DOJ,
  createDate = Ct.createDate,
  updateDate = Ct.updateDate
  FROM dbo.AFaculty AS F
  INNER JOIN dbo.stgAFaculty AS Ct ON F.facultyId = Ct.facultyId
  WHERE Ct.SYS_CHANGE_OPERATION = 'U'
  UPDATE dbo.ADepartment
  SET deptName = Ct.deptName,
  deptHeadId = Ct.deptHeadId,
  createDate = Ct.createDate,
  updateDate = Ct.updateDate
  FROM dbo.ADepartment AS D
  INNER JOIN dbo.stgADepartment AS Ct ON D.deptId = Ct.deptId
  WHERE Ct.SYS_CHANGE_OPERATION = 'U'
  UPDATE dbo.AStudent
  SET studentName = Ct.studentName,
  deptId = Ct.deptId,
  marks = Ct.marks,
  createDate = Ct.createDate,
  updateDate = Ct.updateDate
  FROM dbo.AStudent AS S
  INNER JOIN dbo.stgAStudent AS Ct ON S.studentId = Ct.studentId
  WHERE Ct.SYS_CHANGE_OPERATION = 'U'
  --insert records in the three tables in order
  INSERT INTO dbo.AFaculty (facultyId,facultyName,DOJ,createDate,updateDate)
  SELECT facultyId,facultyName,DOJ,createDate,updateDate
  FROM dbo.stgAFaculty
  WHERE SYS_CHANGE_OPERATION = 'I'
  
  INSERT INTO dbo.ADepartment (deptId,deptName,deptHeadId,createDate,updateDate)
  SELECT deptId,deptName,deptHeadId,createDate,updateDate
  FROM dbo.stgADepartment
  WHERE SYS_CHANGE_OPERATION = 'I'
  INSERT INTO dbo.AStudent (studentId, studentName,deptId,marks,createDate,updateDate)
  SELECT studentId, studentName,deptId,marks,createDate,updateDate
  FROM dbo.stgAStudent
  WHERE SYS_CHANGE_OPERATION = 'I'
 
COMMIT TRANSACTION
END TRY
BEGIN CATCH
SELECT 'Error'
ROLLBACK TRANSACTION
RETURN -1
END CATCH
RETURN 0
END
GO

Second Stored Procedure Activity

I create the second Stored Procedure activity, named uspUpdateChangeTrackingStore. I select the linked service, SQLServer1, and stored procedure, uspUpdateChangeTrackingStore. The purpose of this stored procedure is to update the SYS_CHANGE_VERSION column value of  ChangeTrackingStore table with the current value of change tracking version for the three records in the table.

This procedure takes four parameters: changeTrackingVersion, t1, t2, and t3.

IF OBJECT_ID('dbo.uspUpdateChangeTrackingStore','P') IS NOT NULL
DROP PROCEDURE dbo.uspUpdateChangeTrackingStore
GO
CREATE PROCEDURE dbo.uspUpdateChangeTrackingStore @changeTrackingVersion BIGINT, @t1 varchar(255), @t2 varchar(255), @t3 varchar(255)
AS
BEGIN
    UPDATE dbo.ChangeTrackingStore
    SET SYS_CHANGE_VERSION = @changeTrackingVersion
WHERE tableName IN (@t1,@t2, @t3)
END

The parameter details supplied in the Stored Procedure activity are as given below:

changeTrackingVersion int64
-- @{activity('lkupCurChngTrackVer').output.firstRow.CurrentChangeTrackingVersion}
t1 string
-- @{activity('lkupLastChngTrackVer').output.value[0].tableName}
t2 string
-- @{activity('lkupLastChngTrackVer').output.value[1].tableName}
t3 string
-- @{activity('lkupLastChngTrackVer').output.value[2].tableName}

Debug the Pipeline for the Initial Data Load

I publish all the changes and execute the pipeline by pressing the Debug button. I follow the debug progress and the execution of all the activities completed successfully.

Check data in the Target tables

I select data from the Azure SQL tables. These tables now contain all the data as available in the source SQL database tables.

SELECT facultyId,facultyName,DOJ,createDate,updateDate FROM dbo.AFaculty
/*
facultyIdfacultyNameDOJcreateDateupdateDate
1fname12021-01-012021-09-13 12:34:59.82333332021-09-13 12:34:59.8233333
2fname22021-01-022021-09-13 12:34:59.82333332021-09-13 12:34:59.8233333
3fname32021-01-032021-09-13 12:34:59.82333332021-09-13 12:34:59.8233333
4fname42021-01-042021-09-13 12:34:59.82333332021-09-13 12:34:59.8233333
*/SELECT deptId,deptName,deptHeadId,createDate,updateDate FROM dbo.ADepartment
/*
deptIddeptNamedeptHeadIdcreateDateupdateDate
1dept112021-09-13 12:35:05.04666672021-09-13 12:35:05.0466667
2dept222021-09-13 12:35:05.04666672021-09-13 12:35:05.0466667
*/SELECT studentId,studentName,deptId,marks,createDate,updateDate FROM dbo.AStudent
/*
studentIdstudentNamedeptIdmarkscreateDateupdateDate
1st11942021-09-13 12:35:10.35000002021-09-13 12:35:10.3500000
2st22952021-09-13 12:35:10.35000002021-09-13 12:35:10.3500000
3st31922021-09-13 12:35:10.35000002021-09-13 12:35:10.3500000
4st42932021-09-13 12:35:10.35000002021-09-13 12:35:10.3500000
5st51902021-09-13 12:35:10.35000002021-09-13 12:35:10.3500000
6st62892021-09-13 12:35:10.35000002021-09-13 12:35:10.3500000
*/

Modify Data in On-Premise source SQL database

Now, I connect to the source SQL database. I delete one record from Student, one record from Department, and one record from Faculty. Before deletion, I need to make sure that the Department record to be deleted should not be referred to in the Student table. I also ensure the Faculty record to be deleted should not be referred to in the Department table.

Next, I update one record in each of the three tables. Then, I insert two new records in each of the three tables.

--delete existing record
DELETE FROM dbo.Student
WHERE studentId = 6
DELETE FROM dbo.Department
WHERE deptId = 2
DELETE FROM dbo.Faculty
WHERE facultyId = 4
--update existing record
UPDATE dbo.Faculty
SET facultyName = 'fname3New'
WHERE facultyId = 3
UPDATE dbo.Department
SET deptName = 'dept1New'
WHERE deptId = 1
UPDATE dbo.Student
SET marks = 70
WHERE studentId = 5

--new records inserted
INSERT INTO dbo.Faculty (facultyName,DOJ,createDate,updateDate)
VALUES
('fname5','01-05-2021',GETDATE(), GETDATE()),
('fname6','01-06-2021',GETDATE(), GETDATE())

INSERT INTO dbo.Department (deptName,deptHeadId,createDate,updateDate)
VALUES
('dept3',5,GETDATE(), GETDATE()),
('dept4',6,GETDATE(), GETDATE())
INSERT INTO dbo.Student (studentName,deptId,marks,createDate,updateDate)
VALUES
('st7',3,60,GETDATE(), GETDATE()),
('st8',4,61,GETDATE(), GETDATE())

Debug the pipeline again for incremental data load

I execute the pipeline again by pressing the Debug button. I follow the debug progress and the execution of all the activities completed successfully.

Check Data the Azure SQL Database

Now, I check data in both the staging tables and final tables in the Azure SQL database. Each staging table has entries for the inserted, updated and deleted records. In case of deleted record, only the Id column has value and rest of the table columns contain NULL. Because, in case of deletion, record does not exist in the table in on-premise SQL Server database, But, the primary key of the deleted record is returned from CHANGETABLE.

The final tables in Azure SQL Database reflect all the changes done in the source tables. So, Azure SQL tables now contain the same set of data as present in the source SQL tables.

SELECT facultyId,facultyName,DOJ,createDate,updateDate,SYS_CHANGE_VERSION,SYS_CHANGE_OPERATION,SYS_CHANGE_COLUMNS 
FROM dbo.stgAFaculty
/*
facultyIdfacultyNameDOJcreateDateupdateDateSYS_CHANGE_VERSIONSYS_CHANGE_OPERATIONSYS_CHANGE_COLUMNS
3fname3New2021-01-032021-09-13 12:34:59.82333332021-09-13 12:34:59.823333331U    0x0000000002000000
4NULLNULLNULLNULL30D    NULL
5fname52021-01-052021-09-13 13:41:04.62000002021-09-13 13:41:04.620000034I    NULL
6fname62021-01-062021-09-13 13:41:04.62000002021-09-13 13:41:04.620000034I    NULL
*/SELECT deptId,deptName,deptHeadId,createDate,updateDate,SYS_CHANGE_VERSION,SYS_CHANGE_OPERATION,SYS_CHANGE_COLUMNS 
FROM dbo.stgADepartment
/*
deptIddeptNamedeptHeadIdcreateDateupdateDateSYS_CHANGE_VERSIONSYS_CHANGE_OPERATIONSYS_CHANGE_COLUMNS
1dept1New12021-09-13 12:35:05.04666672021-09-13 12:35:05.046666732U    0x0000000002000000
2NULLNULLNULLNULL29D    NULL
3dept352021-09-13 13:41:04.62000002021-09-13 13:41:04.620000035I    NULL
4dept462021-09-13 13:41:04.62000002021-09-13 13:41:04.620000035I    NULL
*/SELECT studentId,studentName,deptId,marks,createDate,updateDate,SYS_CHANGE_VERSION,SYS_CHANGE_OPERATION,SYS_CHANGE_COLUMNS 
FROM dbo.stgAStudent
/*
studentIdstudentNamedeptIdmarkscreateDateupdateDateSYS_CHANGE_VERSIONSYS_CHANGE_OPERATIONSYS_CHANGE_COLUMNS
5st51702021-09-13 12:35:10.35000002021-09-13 12:35:10.350000033U    0x0000000004000000
6NULLNULLNULLNULLNULL28D    NULL
7st73602021-09-13 13:41:04.62000002021-09-13 13:41:04.620000036I    NULL
8st84612021-09-13 13:41:04.62000002021-09-13 13:41:04.620000036I    NULL
*/SELECT facultyId,facultyName,DOJ,createDate,updateDate FROM dbo.AFaculty
/*
facultyIdfacultyNameDOJcreateDateupdateDate
1fname12021-01-012021-09-13 12:34:59.82333332021-09-13 12:34:59.8233333
2fname22021-01-022021-09-13 12:34:59.82333332021-09-13 12:34:59.8233333
3fname3New2021-01-032021-09-13 12:34:59.82333332021-09-13 12:34:59.8233333
5fname52021-01-052021-09-13 13:41:04.62000002021-09-13 13:41:04.6200000
6fname62021-01-062021-09-13 13:41:04.62000002021-09-13 13:41:04.6200000
*/SELECT deptId,deptName,deptHeadId,createDate,updateDate FROM dbo.ADepartment
/*
deptIddeptNamedeptHeadIdcreateDateupdateDate
1dept1New12021-09-13 12:35:05.04666672021-09-13 12:35:05.0466667
3dept352021-09-13 13:41:04.62000002021-09-13 13:41:04.6200000
4dept462021-09-13 13:41:04.62000002021-09-13 13:41:04.6200000
*/SELECT studentId,studentName,deptId,marks,createDate,updateDate FROM dbo.AStudent
/*
studentIdstudentNamedeptIdmarkscreateDateupdateDate
1st11942021-09-13 12:35:10.35000002021-09-13 12:35:10.3500000
2st22952021-09-13 12:35:10.35000002021-09-13 12:35:10.3500000
3st31922021-09-13 12:35:10.35000002021-09-13 12:35:10.3500000
4st42932021-09-13 12:35:10.35000002021-09-13 12:35:10.3500000
5st51702021-09-13 12:35:10.35000002021-09-13 12:35:10.3500000
7st73602021-09-13 13:41:04.62000002021-09-13 13:41:04.6200000
8st84612021-09-13 13:41:04.62000002021-09-13 13:41:04.6200000
*/

Conclusion

Initial full load and Incremental data load is now complete for three related tables from the on-premise source SQL database to the target Azure SQL database. More tables can be added in this process, if required. As the tables Student, Department and Faculty are related to each other with Referential Integrity, the incremental data loading together makes the process effective and useful. Unrelated tables also can be modified together using a single Data Factory pipeline.

Rate

4.5 (2)

You rated this post out of 5. Change rating

Share

Share

Rate

4.5 (2)

You rated this post out of 5. Change rating