It has been quite a long time since I wrote an article on SQL Server due to other commitments with different clients and other technologies. Now I am back with an article on SQL Server Integration Services.
I had recently completed a data warehousing project for a client using SQL Server 2005. I have decided to share the knowledge "as is" with the SQL Server community. The article is about SQL Server Integration Services (SSIS) automation.
The user requirement is described as follows:
1. Client gets extracts from five different source systems (.txt, .csv and .xls file) every month on different weekdays.
2. Client would save the files in a folder on a SQL Server box
3. SQL Server should perform the Extract, Transfer and Load (ETL) process in an automated fashion for the five different source systems using the Extract -> Staging -> Fact table methodology
4. The process should be a one time setup in SQL Server and should execute every month without manual intervention.
The client message is strong and clear. The ETL process using SSIS should be automated without manual intervention. There is more than one way to accomplish the client requirement, however I have selected the following tools/components to accomplish the task:
- Microsoft SQL Server 2005
- Microsoft Data Access Components
- Microsoft SQL Server Integration Services (SSIS)
Please make sure you have installed Microsoft SQL Server 2005 & Service Pack 2.0, the Microsoft Data Access Components 2.8, the Microsoft .NET Framework 2.0 and the "Microsoft.Office.Interop.Excel.dll" file in a GAC folder before you read the article further.
SQL Server Integration Services
Integration Services is a platform for building high performance data integration and workflow solutions, including extraction, transformation, and loading (ETL) operations for data warehousing.
SQL Server 2000 Data Transformation Services (DTS) have been
replaced with Integration Services in SQL Server 2005. VB.Net is only
scripting language supported in integration services. You can always read more
information on Integration Services at SQL Server 2005 books online.
I have used above mention dll for reading excel file data such as cell value in a worksheet. The Microsoft.Office.Interop.Excel.dll file will be installed along with .NET Framework 2.0 installation otherwise you could download the file in your machine and the follow the instruction to install the DLL in GAC folder.
I decided to create an SSIS package for each source system. I knew I could schedule the SSIS packages using SQL Server Agent. The interesting process would be to execute the primary job, calculate & set the schedule start date for the subsequent month, rename & disable the job on success and rename & enable the next job for execution. The cycle spins for the other jobs until the final job for a month (Figure 1). Let us read further to find out how the "Thought Process" has been executed in SQL Server 2005 using integration services.
I enclosed the "automation" zip folder below that contains dbscript.sql, SSIS package and extract files. You could download and unzip the automation zip folder in your local machine. I had used trusted connections for SSIS package execution. You could use user name & password connection strings for SSIS package execution, but you must change the connection string property in .dtsConfig files.
After downloading and extracting the dbscript.sql file, SSIS package and extract files and making the required changes before you execute the dbscript.sql file in your local SQL Server machine. The steps are explained below:
The code is available at the bottom in the "Resources" section.
1. Create an "automation" folder in your C:\ drive. Save the dbscript.sql file, SSIS package and .config file in the same folder. You must change the SSIS package path in dbscript.sql file and config file path for SSIS package if you save the SSIS package and config files in a different location
2. Save the Extract Files (.txt, .csv and .xls file) in the same folder (automation)
3. Change database primary, secondary and log file path in dbscript.sql file.
4. Change the data source value (Data Source=<SQLServerName>;) to your SQL Server name for the connection string property in .dtsConfig files.
You can execute the dbscript.sql file in your local SQL Server machine after you complete the prerequisites points.
The dbscript.sql script file would create a database, user defined data types, tables, indexes, constraints, foreign keys, base data, stored procedures and jobs. You should have an "FMR" database in your database drop down list box after you execute dbscript.sql successfully. Please make sure your SQL Server Agent is started and running for job execution.
I have created five SSIS package for five source systems. I have scheduled the SSIS package as a job using SQL Server Agent. The jobs have been scheduled to run on every hour on weekdays. You could change the frequency interval at your convenience. The primary (Control Report) job has been enabled by default and rest of jobs is not enabled & rename with suffix as "-- Disabled".
The enabled job would start execute, calculate & set the schedule start date for the subsequent month, rename (suffix as "-- Disabled") & disable the job on success. Rename (remove suffix as "-- Disabled") & enable the next job for execution. The cycle spins for other jobs until the final job for a month.
Per user requirements, I disable the job running on weekends and holidays. The system calculates subsequent month's scheduled dates and excludes weekend & holidays. The job would execute on the next business day if the subsequent month's scheduled date falls on a holiday or a weekend. I had provided the explanation how I had accomplish those challenges below.
Weekends dates have been calculated & identified using DATEADD & DATEPART SQL Server system functions. I used a Holiday look up table to find the list of client holidays for the current year. The OthCalculateNextDate & ConCheckHoliday stored procedure has the weekend & holiday calculation and business rules.
The stored procedure would return 0/1 value for a provided date parameter (@Hol_Date). The row might not exist or does not match a value in the holiday master table for the provided date and the procedure returns a 0 value. Otherwise the stored procedure returns a 1 value for the provided date row if it exists in the holiday master table.
----------------------------------------------------------------------------------- -- Drop Procedure ----------------------------------------------------------------------------------- if exists (select * from sysobjects where id = object_id(N'[dbo].[ConCheckHoliday]') and OBJECTPROPERTY(id, N'IsProcedure') = 1) drop procedure [dbo].[ConCheckHoliday] GO CREATE PROCEDURE ConCheckHoliday ( @SessionId [udtNumber], @FunctionName [udtName], @UserName [udtName], @Hol_Date [udtDateTime], @IsHoliday [udtNumber] output ) AS declare @ret int set nocount on set @IsHoliday = 0 set @ret = 0 if exists(select CMH_PK_ID from CNFG_MSTR_HOL where CMH_DT = @Hol_Date) begin Set @IsHoliday = 1 end IF @@error != 0 BEGIN exec OthInsErrorMessage @SessionId, @FunctionName, 52203 , 11, 'CNFG_MSTR_HOL: select Statement failed' return 52203 END exec @ret = OthInsUserMessage @SessionId, @FunctionName = @FunctionName, @Task_Name = 'Select Statement: CNFG_MSTR_HOL ', @Status = 'Complete', @UserName = @UserName IF @ret != 0 BEGIN exec OthInsErrorMessage @SessionId, @FunctionName ,50200, 11, 'Whilst writing select statement CNFG_MSTR_HOL complete statement' return 50200 END set nocount off return 0 GO
The stored procedure adds a month for a provided date parameter (@FromDate) using DATEADD functions and extracts the weekday using the DATEPART function. The extracted weekday should be between M-F and the weekday should not be a holiday. Otherwise the loop continues until the criteria matches. The holiday business rule and stored procedure have been explained earlier in the article.
------------------------------------------------------------------------------- -- Drop Procedure ------------------------------------------------------------------------------- if exists (select * from sysobjects where id = object_id(N'[dbo].[OthCalculateNextDate]') and OBJECTPROPERTY(id, N'IsProcedure') = 1) drop procedure [dbo].[OthCalculateNextDate] GO CREATE PROCEDURE OthCalculateNextDate ( @SessionId [udtNumber], @FunctionName [udtName], @UserName udtName, @FromDate [udtDateTime] Output ) AS declare @ret int, @DateFirst int, @intDay int, @found bit, @IsHoliday int, @NextDate [udtDateTime] set nocount on -- take the definition of the first day of the week set @DateFirst = @@DATEFIRST set @found = 0 set @IsHoliday = 0 set @ret = 0 -- set the first day of the week to Sunday set DATEFIRST 7 begin set @NextDate = dateadd(month, 1,@FromDate) --While loop make sure "NextDate" should be on a week date instead of weekend. --SSIS Package should not run on weekends and holidays. while (@found = 0) begin set @intDay = datepart(weekday, @NextDate) if @intDay in (2, 3, 4, 5,6) begin exec @ret = ConCheckHoliday @SessionId, @FunctionName = @FunctionName, @UserName = @UserName, @Hol_Date = @NextDate, @IsHoliday = @IsHoliday Output IF @ret != 0 BEGIN exec OthInsErrorMessage @SessionId, @FunctionName, 52203, 11 , 'Holiday table: select statement did not succeed' return 52203 END IF @IsHoliday = 0 break end set @NextDate = dateadd(day, 1, @NextDate) end set @FromDate = @NextDate end set nocount off set DATEFIRST @DateFirst return 0 GO
You can read the code to understand the stored procedures. I provided comments in those store procedures to help make it self explanatory.
Figure 1 FMR Automation
SQL Server stores SSIS package and SQL Agent job information in the msdb database. I had used couple of msdb database system stored procedures to accomplish the automation process. I provided the procedure names (msdb.dbo.sp_update_job & msdb.dbo.sp_update_schedule) and parametesr below for your information. Both stored procedure have many parameters, however I had used only a few parameters relevant to accomplish the task. SQL Server Books Online explains the syntax & use of msdb.dbo.sp_update_job & msdb.dbo.sp_update_schedule procedure in detail.
I used the enclosed stored procedures, OthSetCurrentAndNextSSIS, to rename & disable the current job and rename & enable the next job. I had explained the parameters that I had used for those system stored procedures.
msdb.dbo.sp_update_job changes the attributes of a job and changes only those settings for which parameter values are supplied. If a parameter is omitted, the current setting is retained.
The parameters are designed a follows for msdb.dbo.sp_update_job
- @job_name: The name of the job.
- @new_name: The new name for the job.
- @enabled: Specifies whether the job is enabled (1) or not enabled (0).
msdb.dbo.sp_update_schedule changes the settings for a SQL Server Agent schedule and changes only those settings for which parameter values are supplied. If a parameter is omitted, the current setting is retained.
The parameters are designed a follows for msdb.dbo.sp_update_schedule
- @name - The name of the schedule to modify. schedule_name is sysname, with no default. Either schedule_id or schedule_name must be specified.
- @new_name - The new name for the schedule. new_name is sysname, with a default of NULL. When new_name is NULL, the name of the schedule is unchanged.
- @enabled - Indicates the current status of the schedule. enabled is tinyint, with a default of 1 (enabled). If 0, the schedule is not enabled. When the schedule is not enabled, no jobs will run on this schedule.
- @active_end_date - The date on which execution of a job can stop. active_end_date is int, with a default of 99991231, which indicates December 31, 9999. Formatted as YYYYMMDD.
You can also verify the job next month schedule date by selecting & right click the job properties on sql server agent job folder. The start date on duration section on job schedule properties on schedules page would show the selected job subsequent month schedule date.
The customer taste changes as life moves on. The client wants to spend & use their time to enrich their work & life instead of allocating more than one dedicated resource to run this process every month. It is nice to provide what customer needs using current technologies available on the market. The information technology field is growing fast with innovative methods and a technical expert might enhance this automation in a much better way in later years to accomplish the customer might need at that time.