SQLServerCentral Article

SSIS Package documentor

,

Creating an SSIS Package is easy, with a lot of references available to help you. However, when it comes to reading, or let me say documenting, the contents of the package, it is a nightmare. This is especially true when you have many packages to review. I have to deal with a similar requirement and created a utility to read the package code. This article will help to use the script created and add any other enhancements as per requirement.

An SSIS package is made up of XML code structured from a specific XSD. We can read this code using SQL Server XQuery. Let me show an example here. We can fetch all the task name details used in the package by querying the executables path before the object name.

value('declare namespace p1="www.microsoft.com/SqlServer/Dts";//DTS:Executable/DTS:Executables/@p1:ObjectName[1]', 'nvarchar(max)')

To query the XML code of the package, we have to load the complete package XML into a table, which can be done using cmdshell command. First of all load all the package names with their path into a temporary table named "pkgStats".

SELECT @CommandLine = LEFT('dir "' + @Path + '" /A-D /B /S ', 8000);
INSERT INTO @MyFiles (FullPath)
   EXECUTE xp_cmdshell @CommandLine;
DECLARE @sql NVARCHAR(max);
SET @sql = '
INSERT  INTO pkgStats (PackagePath,PackageXML)
               select  ''@FullPath'' as PackagePath
               ,cast(BulkColumn as XML) as PackageXML
               from    openrowset(bulk ''@FullPath'',single_blob) as pkgColumn';
SELECT @sql = REPLACE(@sql, '@FullPath', @FullPath)
EXEC sp_executesql @sql;

An SSIS package is having different tasks and elements categorized into major parts. The script will divide the different objects into five major parts:

  1. Connection objects
  2. Variables
  3. Parameters
  4. Control flow tasks
  5. Data flow tasks

The XSD representation of an SSIS package has different nodes for most of the tasks. The script will use XQuery to query all the nodes of different tasks which will fall in one of the major part and store in one temporary table named "PackageAnalysis". In addition to the detail of the tasks

The connections are fetched using the following query:

PRINT '---------------Get all connections-----------------------------------';
WITH CTE_PkgLevel
AS (
SELECT pkg.RowID
,pkg.PackagePath
,cfnodes.x.value('declare namespace p1="www.microsoft.com/SqlServer/Dts";./@p1:ObjectName[1]', 'varchar(max)') AS ConnectionManagerName
,cfnodes.x.value('declare namespace p1="www.microsoft.com/SqlServer/Dts";./@p1:CreationName[1]', 'varchar(max)') AS ConnectionManagerType
--, cfnodes1.y.value('declare namespace p1="www.microsoft.com/SqlServer/Dts";./@p1:ConnectionString[1]', 'varchar(max)') 
,cfnodes.x.query('.').query('declare namespace p1="www.microsoft.com/SqlServer/Dts";./p1:ConnectionManager/p1:ObjectData/p1:ConnectionManager') AS ConnectionStringQry
,cfnodes.x.query('.').query('declare namespace p1="www.microsoft.com/SqlServer/Dts";./p1:ConnectionManager/p1:PropertyExpression') AS ExpressionQry
,cfnodes.x.value('declare namespace p1="www.microsoft.com/SqlServer/Dts";./@p1:DTSID[1]', 'varchar(100)') AS ConnectionManagerID
,pkg.PackageCreatorName

FROM pkgStats pkg
CROSS APPLY pkg.PackageXML.nodes('declare namespace DTS="www.microsoft.com/SqlServer/Dts";(//DTS:Executable/DTS:ConnectionManagers/*)') AS cfnodes(x)
)
,CTE_PkgConLevel
AS (
SELECT pkg.RowID
,pkg.PackagePath
,pkg.ConnectionManagerName
,pkg.ConnectionManagerType
,ConnectionManagerID
--,pkg.ConnectionStringQry
,(
SELECT TOP 1 cfnodes.x.value('declare namespace p1="www.microsoft.com/SqlServer/Dts";@p1:ConnectionString[1]', 'nvarchar(max)') AS ConnectionString
FROM pkg.ConnectionStringQry.nodes('./*') AS cfnodes(x)
) AS ConnectionString
,(
SELECT TOP 1 cfnodes.x.value('.', 'nvarchar(max)') AS ExpressionValue
FROM pkg.ExpressionQry.nodes('./*') AS cfnodes(x)
) AS ExpressionValue
,PackageCreatorName
,(
SELECT TOP 1 cfnodes.x.value('declare namespace p1="www.microsoft.com/SqlServer/Dts";@p1:Retain[1]', 'nvarchar(max)') AS RetainSameConnectionProperty
FROM pkg.ConnectionStringQry.nodes('./*') AS cfnodes(x)
) AS RetainSameConnectionProperty
FROM CTE_PkgLevel pkg
)
INSERT INTO PackageAnalysis (
RowID
,PackagePath
,PackageName
,Category
,PackageCreatorName
,ObjectName
,ObjectType
,DelayValidationPropertyValue
,ObjectValue
,ExpressionValue
,ConnectionManagerID
,RetainSameConnectionProperty
)
SELECT DISTINCT pkg.RowID
,pkg.PackagePath
,REPLACE(pkg.PackagePath, REPLACE(@Path, '*.dtsx', ''), '') PackageName
,'Connection' AS CategoryC
,pkg.PackageCreatorName
,cast(pkg.ConnectionManagerName AS VARCHAR(max)) AS ObjectName
,cast(pkg.ConnectionManagerType AS VARCHAR(max)) AS ObjectType
,'NA' AS DelayValidationPropertyValue
,cast(pkg.ConnectionString AS VARCHAR(max)) AS ObjectValue
,cast(ExpressionValue AS VARCHAR(max)) AS ExpressionValue
,ConnectionManagerID
,RetainSameConnectionProperty
FROM CTE_PkgConLevel pkg;

The variables are fetched using the following query:

PRINT '------------------------Get all variable details-------------------------------------';
WITH CTE_PkgLevel
AS (
SELECT pkg.RowID
,pkg.PackagePath
,cfnodes.x.value('declare namespace p1="www.microsoft.com/SqlServer/Dts";./@p1:ObjectName[1]', 'varchar(max)') AS VariableName
,cfnodes.x.value('declare namespace p1="www.microsoft.com/SqlServer/Dts";./@p1:DTSID[1]', 'varchar(max)') AS VariableHexValue
,cfnodes.x.value('declare namespace p1="www.microsoft.com/SqlServer/Dts";./@p1:Expression[1]', 'varchar(max)') AS ExpressionValue
,cfnodes.x.query('.').query('declare namespace p1="www.microsoft.com/SqlServer/Dts";./p1:Variable/p1:VariableValue') AS VariableQry
,PackageCreatorName
FROM pkgStats pkg
CROSS APPLY pkg.PackageXML.nodes('declare namespace DTS="www.microsoft.com/SqlServer/Dts";(//DTS:Executable/DTS:Variables/*)') AS cfnodes(x)
)
--select * from CTE_PkgLevel
,CTE_PkgVarLevel
AS (
SELECT pkg.RowID
,pkg.PackagePath
,pkg.VariableName
,pkg.ExpressionValue
,VariableHexValue
,(
SELECT TOP 1 cfnodes.x.value('.', 'nvarchar(max)') AS VariableValue
FROM pkg.VariableQry.nodes('./*') AS cfnodes(x)
) AS VariableValue
--,pkg.ConnectionStringQry
,(
SELECT TOP 1 cfnodes.x.value('declare namespace p1="www.microsoft.com/SqlServer/Dts";@p1:DataType[1]', 'nvarchar(max)') AS DataType
FROM pkg.VariableQry.nodes('./*') AS cfnodes(x)
) AS VariableDataType
,PackageCreatorName
FROM CTE_PkgLevel pkg
)
INSERT INTO TblVariableDetails (
RowID
,PackagePath
,PackageName
,Category
,PackageCreatorName
,ObjectName
,ObjectType
,ObjectValue
,ExpressionValue
,VariableHexValue
)
SELECT DISTINCT Tblvar.RowID
,Tblvar.PackagePath
,REPLACE(Tblvar.PackagePath, REPLACE(@Path, '*.dtsx', ''), '') PackageName
,'Variable' AS Category
,Tblvar.PackageCreatorName
,cast(Tblvar.VariableName AS VARCHAR(max)) AS TaskName
,cast(Tblvar.VariableDataType AS VARCHAR(max)) AS TaskType
,isnull(cast(Tblvar.VariableValue AS VARCHAR(max)), '') AS VariableValue
,isnull(cast(tblvar.ExpressionValue AS VARCHAR(max)), '') ExpressionValue
,VariableHexValue
FROM CTE_PkgVarLevel Tblvar;

The parameters are fetched using the following query:

PRINT '------------------------Get all parameter details-------------------------------------';
WITH CTE_PkgLevel
AS (
SELECT pkg.RowID
,pkg.PackagePath
,cfnodes.x.value('declare namespace p1="www.microsoft.com/SqlServer/Dts";./@p1:ObjectName[1]', 'varchar(max)') AS ParameterName
,cfnodes.x.value('declare namespace p1="www.microsoft.com/SqlServer/Dts";./@p1:DTSID[1]', 'varchar(max)') AS ParameterHexValue
,cfnodes.x.value('declare namespace p1="www.microsoft.com/SqlServer/Dts";./@p1:Expression[1]', 'varchar(max)') AS ExpressionValue
,cfnodes.x.query('.').query('declare namespace p1="www.microsoft.com/SqlServer/Dts";./p1:PackageParameter/p1:Property') AS ParameterQry
,PackageCreatorName
FROM pkgStats pkg
CROSS APPLY pkg.PackageXML.nodes('declare namespace DTS="www.microsoft.com/SqlServer/Dts";(//DTS:Executable/DTS:PackageParameters/*)') AS cfnodes(x)
)
--select * from CTE_PkgLevel
,CTE_PkgVarLevel
AS (
SELECT pkg.RowID
,pkg.PackagePath
,pkg.ParameterName
,pkg.ExpressionValue
,ParameterHexValue
,(
SELECT TOP 1 cfnodes.x.value('.', 'nvarchar(max)') AS ParameterValue
FROM pkg.ParameterQry.nodes('./*') AS cfnodes(x)
) AS ParameterValue
--,pkg.ConnectionStringQry
,(
SELECT TOP 1 cfnodes.x.value('declare namespace p1="www.microsoft.com/SqlServer/Dts";@p1:DataType[1]', 'nvarchar(max)') AS DataType
FROM pkg.ParameterQry.nodes('./*') AS cfnodes(x)
) AS ParameterDataType
,PackageCreatorName
FROM CTE_PkgLevel pkg
)
INSERT INTO tblParameterDetails(
RowID
,PackagePath
,PackageName
,Category
,PackageCreatorName
,ObjectName
,ObjectType
,ObjectValue
,ExpressionValue
,ParameterHexValue
)
SELECT DISTINCT Tblvar.RowID
,Tblvar.PackagePath
,REPLACE(Tblvar.PackagePath, REPLACE(@Path, '*.dtsx', ''), '') PackageName
,'Variable' AS Category
,Tblvar.PackageCreatorName
,cast(Tblvar.ParameterName AS VARCHAR(max)) AS TaskName
,cast(Tblvar.ParameterDataType AS VARCHAR(max)) AS TaskType
,isnull(cast(Tblvar.ParameterValue AS VARCHAR(max)), '') AS ParameterValue
,isnull(cast(tblvar.ExpressionValue AS VARCHAR(max)), '') ExpressionValue
,ParameterHexValue
FROM CTE_PkgVarLevel Tblvar;

The following XQueries are used to fetch Control Flow and Data flow task details:

WITH CTE_CFLevel
AS (
SELECT pkg.RowID
,cfnodes.x.query('.').query('declare namespace DTS="www.microsoft.com/SqlServer/Dts";
                    ./DTS:Executable[@DTS:ExecutableType=''SSIS.Pipeline.3'']/DTS:ObjectData/pipeline/components/component') DFTQuery
,cfnodes.x.value('declare namespace p1="www.microsoft.com/SqlServer/Dts";./@p1:refId[1]', 'varchar(max)') AS TaskPath
,cfnodes.x.value('declare namespace p1="www.microsoft.com/SqlServer/Dts";./@p1:ObjectName[1]', 'nvarchar(max)') AS TaskName
,cfnodes.x.value('declare namespace p1="www.microsoft.com/SqlServer/Dts";./@p1:Description', 'nvarchar(max)') AS TaskTypeDescription
,cfnodes.x.value('declare namespace p1="www.microsoft.com/SqlServer/Dts";./@p1:ExecutableType', 'nvarchar(max)') AS TaskType
,cfnodes.x.value('declare namespace p1="www.microsoft.com/SqlServer/Dts";./@p1:DelayValidation', 'nvarchar(max)') AS DelayValidationPropertyValue
,ISNULL(cfnodes.x.value('declare namespace p1="www.microsoft.com/SqlServer/Dts";./@p1:Disabled', 'nvarchar(max)'), 'False') AS IsDisabled
,'EvalExpression = ' + cfnodes.x.value('declare namespace p1="www.microsoft.com/SqlServer/Dts";./@p1:EvalExpression', 'nvarchar(max)') AS ForloopEvalExpression
,cfnodes.x.query('.').query('declare namespace DTS="www.microsoft.com/SqlServer/Dts";
                    ./DTS:Executable[@DTS:ExecutableType=''Microsoft.SqlServer.Dts.Tasks.ExecuteSQLTask.ExecuteSQLTask, Microsoft.SqlServer.SQLTask, Version=11.0.0.0, Culture=neutral, PublicKeyToken=89845dcd8080cc91'']/DTS:ObjectData/*') AS SqlTaskQry
,cfnodes.x.query('.').query('declare namespace DTS="www.microsoft.com/SqlServer/Dts";
                    ./DTS:Executable[@DTS:ExecutableType=''SSIS.ExecutePackageTask.3'']/*') ExecPkgTaskQry
,cfnodes.x.query('.').query('declare namespace DTS="www.microsoft.com/SqlServer/Dts";
                    ./DTS:Executable[@DTS:ExecutableType=''Microsoft.SqlServer.Dts.Tasks.ScriptTask.ScriptTask, Microsoft.SqlServer.ScriptTask, Version=11.0.0.0, Culture=neutral, PublicKeyToken=89845dcd8080cc91'']/DTS:ObjectData/ScriptProject/*') ScriptTaskQry
FROM pkgStats pkg
CROSS APPLY pkg.PackageXML.nodes('declare namespace DTS="www.microsoft.com/SqlServer/Dts";//DTS:Executable/DTS:Executables/*') AS cfnodes(x)
)
INSERT INTO TblControlFlowDetails (
RowID
,TaskPath
,TaskName
,TaskTypeDescription
,TaskType
,DelayValidationPropertyValue
,DFTQuery
,SqlTaskQry
,ExecPkgTaskQry
,ScriptTaskQry
,IsDisabled
,ExpressionValue
)
SELECT RowID
,TaskPath
,TaskName
,TaskTypeDescription
,TaskType
,DelayValidationPropertyValue
,DFTQuery
,SqlTaskQry
,ExecPkgTaskQry
,ScriptTaskQry
,IsDisabled
,ForloopEvalExpression
FROM CTE_CFLevel;
PRINT '---------- Insert Script task details-----------------------';
INSERT INTO TblScriptTaskdetails (
RowID
,ControlFlowDetailsRowID
,Script
)
SELECT RowID
,CF.ControlFlowDetailsRowID
,cfnodes1.x.value('./ProjectItem[@Name=''ScriptMain.cs''][1]', 'varchar(max)') Script
FROM TblControlFlowDetails cf
CROSS APPLY Cf.ScriptTaskQry.nodes('.') AS cfnodes1(x);
PRINT '---------- Insert Execute package task details-----------------------';
INSERT INTO TblExecutePackageTaskDetails (
RowID
,ControlFlowDetailsRowID
,ExecutePackageExpression
,ExecutedPackageName
,ExecutePackageConnection
)
SELECT RowID
,cf.ControlFlowDetailsRowID
,cfnodes1.x.value('declare namespace DTS="www.microsoft.com/SqlServer/Dts";
                    ./DTS:PropertyExpression[1]', 'varchar(1000)') ExecutePackageExpression
,cfnodes1.x.value('declare namespace DTS="www.microsoft.com/SqlServer/Dts";
            (./DTS:ObjectData/ExecutePackageTask/PackageName)[1]', 'varchar(1000)') ExecutedPackageName
,cfnodes1.x.value('declare namespace DTS="www.microsoft.com/SqlServer/Dts";
            (./DTS:ObjectData/ExecutePackageTask/Connection)[1]', 'varchar(1000)') ExecutePackageConnection
FROM TblControlFlowDetails cf
CROSS APPLY Cf.ExecPkgTaskQry.nodes('.') AS cfnodes1(x);
PRINT '---------- Insert DFT details-----------------------';
INSERT INTO TblDFTTaskDetails (
RowID
,DFTTasksPath
,DFTTaskName
,DFTTaskType
,DFTTaskType1
,DFTRowSet
,ParameterBindingParameterName
,DFTSQLCommand
,DFTConnectionManager
,Variable
,IsSortedProperty
,InputQry
,OutputQry
,MultihashcolumnSortPosition
)
SELECT RowID
,dftnodes.x.value('@refId[1]', 'varchar(max)') AS DFTTasksPath
,dftnodes.x.value('@name[1]', 'varchar(max)') AS DFTTaskName
,dftnodes.x.value('@description[1]', 'varchar(max)') AS DFTTaskType
,dftnodes.x.value('@contactInfo[1]', 'varchar(max)') AS DFTTaskType1
,dftnodes.x.value('data(./properties/property[@name=''OpenRowset''])[1]', 'varchar(max)') DFTRowSet
,dftnodes.x.value('data(./properties/property[@name=''ParameterMapping''])[1]', 'varchar(max)') ParameterBindingParameterName
,CASE 
WHEN isnull(dftnodes.x.value('data(./properties/property[@name=''SqlCommand''])[1]', 'varchar(max)'), '') = ''
THEN isnull(dftnodes.x.value('data(./properties/property[@name=''SqlCommandVariable''])[1]', 'varchar(max)'), '')
ELSE isnull(dftnodes.x.value('data(./properties/property[@name=''SqlCommand''])[1]', 'varchar(max)'), '')
END DFTSQLCommand
,dftnodes.x.value('data(./connections/connection/@connectionManagerID)[1]', 'varchar(max)') DFTConnectionManager
,dftnodes.x.value('data(./properties/property[@name=''VariableName''])[1]', 'varchar(max)') Variable
,dftnodes.x.value('data(./outputs/output/@isSorted)[1]', 'varchar(10)') IsSortedProperty
,lineage.x.query('.') InputQry
,outputvalue.x.query('.') OutputQry
,outputvalue.x.value('data(./properties/property[@name=''InputColumnLineageIDs''])[1]', 'varchar(max)')  as MultihashcolumnSortPosition
FROM TblControlFlowDetails pkglvl
CROSS APPLY pkglvl.DFTQuery.nodes('./*') AS dftnodes(x)
OUTER APPLY dftnodes.x.nodes('./inputs/input/inputColumns/*') AS lineage(x)
OUTER APPLY dftnodes.x.nodes('./outputs/output/outputColumns/*') AS outputvalue(x);
PRINT '---------- Insert DFT details-----------------------';
PRINT '---------- Insert DFT Source and destination details-----------------------';
INSERT INTO TblSrcDestDetails (
RowID
,DataFlowDetailsRowID
,SourceColumn
,DestinationColumn
,SortKeyPosition
)
SELECT RowID
,DataFlowDetailsRowID
,lineage.x.value('./@lineageId[1]', 'varchar(max)') AS SourceColumn
,COALESCE(lineage.x.value('./@name[1]', 'varchar(max)'),lineage.x.value('./@externalMetadataColumnId[1]', 'varchar(max)'), lineage.x.value('./@refId[1]', 'varchar(max)')) DestinationColumn
,lineage.x.value('./@sortKeyPosition[1]', 'varchar(200)') SortKeyPosition

FROM TblDFTTaskDetails pkglvl
CROSS APPLY pkglvl.InputQry.nodes('./*') AS lineage(x);
INSERT INTO TblSrcDestDetails (
RowID
,DataFlowDetailsRowID
,SourceColumn
,DestinationColumn
,SortKeyPosition
)
SELECT RowID
,DataFlowDetailsRowID
,lineage.x.value('./@lineageId[1]', 'varchar(max)') SourceColumn
,COALESCE(lineage.x.value('./@name[1]', 'varchar(max)'),lineage.x.value('./@externalMetadataColumnId[1]', 'varchar(max)'), lineage.x.value('./@refId[1]', 'varchar(max)')) DestinationColumn
,lineage.x.value('./@sortKeyPosition[1]', 'varchar(200)') SortKeyPosition
FROM TblDFTTaskDetails pkglvl
CROSS APPLY pkglvl.OutputQry.nodes('./*') AS lineage(x);
PRINT '---------- Insert DFT Source and destination details-----------------------';

The attached stored procedure script (see the Resources section) queries the complete package XML and generates an output report. The procedure accepts the input path for the package files and iterates over the files to document the package code. The code uses a connection to a SQL Server database, which creates temporary tables for the processing.

The output of the procedure returns the complete package details in the form of a grid report.

Report Column Headers

Description

Row ID, Table Row ID

These are all columns used to sequence the report output.

RowID is unique for a single package. TableRowID is an incremental sequence number for all the rows of a  package.

Package Path, Package Name

Package Name and Package Path on the drive.

Category

Describes Package categories into four types - Connection, Variable, Control Flow Task, Data Flow Task.

Task Path

Path of the Task within a package.

Object Name, Object Type

Object Name is Name of the task or object as given in package. Object Type is type of task, transformation or variable data type.

ObjectValue

This is the most critical column showing the value of the Object such as queries, connections.

Connection Manager

Describes the connection details used inside any task or transformation.

Source Column, Destination Column

Gives the source and destination column name used inside all the Data flow transformations.

Derived Value

Gives the derived column transformation expressions used in Derived Column Transformation.

Is Disabled

Shows True / False for all the Control flow tasks if the task is disabled.

Parameter Binding Sequence, Parameter Binding Parameter Name,

Result Set Parameter Name

Describes all the property values used in Execute SQL Task or OLEDB Source.

Execute Package Expression, Execute Package Name

Shows the name or the execute package expression used in Execute Package Task.

Script

Shows the script written in a Script Task.

Lookup Joins

Details of Lookup Joins used in Lookup Transformation.

Sample Output

I have created a sample SSIS package with a Data Flow Task with a Source and Destination.

Here is the Data Flow tab with a simple data transformation.

After executing the procedure with the package path, it returns the following output. There are a number of columns returned, so the output is broken into the following two images.

We can copy the output from SQL Server Management Studio into Excel and use a Pivot Report to document the details of the package. Here is the Sample package Pivot report.

Summary

This post and the sql procedure to document the SSIS package can be helpful to document multiple SSIS package at the same time. This can help to do Code Review and Impact analysis for any change in the SSIS package by verifying the following:

  1. Task names consistency
  2. Confirming the Variable expressions used in the package
  3. Delay Validation property check
  4. Disable objects
  5. Package Protection Level propery
  6. Source queries
  7. Connection Manager setup with exprerssions
  8. Package Creator names
  9. Validation of the precedence constraint to connect two tasks

We can enhance the procedure functionality based on further requirements. Please do let me know the feedback on the code documentor.

Resources

Rate

4.73 (55)

You rated this post out of 5. Change rating

Share

Share

Rate

4.73 (55)

You rated this post out of 5. Change rating