ETL can be a very time consuming and complicated process. In order that the business does not come to a standstill when a critical, and time consuming, ETL process encounters a failure, the package is validated both during design time and during execution time.
In this article we'll gain a brief understanding of a useful property Delay Validation is, and how this property can be manipulated to favor the business case under special circumstances.
Design Time Validation
To understand more about design time validation, let’s create a package.
Here I have two Execute SQL Tasks on my design pane. As you see in the graphic above, the one on the right has a small cross. That’s because the package was validated during design time to list potential agents that might break the routine execution of the package.
In this case, the error discovered was that the task has no connection manager defined. This alerts the designer that a valid connection string should be defined for task execution. Such early signals are highly beneficial as they serve as an important indicator that something somewhere is incorrect and requires the attention of the designer.
The default value of delay validation is false. This means that all actions taking place on the design pane will be validated and checked for their correctness. Here, we skipped the connection string and the development studio is smart enough to point that out to the designer.
Once valid data is provided, the cross mark disappears, thereby ensuring that the task will not fail as far as the connection to server is concerned.
Please note that during package execution, errors specific to Execute SQL Task, like the Timeout error, are not managed by the delay validation property. The designer will have to tweak the task accordingly. Additionally, not specifying a connection string, and setting the value for delay validation to true, will not make any sense until the designer decides to use expressions to specify connection details.
Run Time Validation
To understand more about run time validation, let’s create a package. Here, let's create a business case to archive files (you may consider the daily ETL loads) and move them into say a "processed" folder after successful transfer to a staging area.
Suppose that the daily ETL loads the business receives from the transaction system are in the form of flat files, and that each file has a unique time stamp attached, like (filename_date.txt). Now since the source is a varying entity, the name to source will only be resolved at run time i.e. when the package is executed. So this can be a brilliant case to study the use of delay validation at run time.
Here, let us design a SSIS package that has
- a Foreach loop container(FEL)
- a File system task(FST) and a user variable having type string to store current file names
The Foreach loop will allow repeatable execution over multiple files and the file system task will allow to move the processed file to some alternate location, let's call this location as "processed folder".
Here, the FEL is configured in file enumeration mode and the user variable will store the name and extension of the current file, which the executable processes. Such a design makes the source file for the FST variable whereas the destination remains unchanged - path to "processed" folder. When you return back to the designer window, you will notice a small cross mark on the File System Task. This is not because, the task was not configured in an appropriate manner, but the user variable created which stores the source path is empty.
Now, if you toggle the default value of the property Delay Validation and set it to true, the validation will be delayed and the package will not be validated at design time. This is precisely what we require to meet the design requirements, since the the user variable which stores the file name will get it's value from the file which the FEL executable is currently processing at run time. This is the dynamic value assignment of a user variable.
Thanks | Raunak Jhawar | t:@raunakjhawar