This post is part of a series on this blog that will help me, and hopefully you, pass exam 70-463: Implementing a Data Warehouse with Microsoft SQL Server 2012. So far, we’ve covered:
- Tables and Schemas in the data warehouse
- Dimensions and Slowly Changing Dimensions
- Fact tables and measures
- Intro to columnstore indexes
- Columnstore indexes and partitioning
- Introduction to SQL Server Data Tools (SSDT)
- Connection Managers
I hope you had a relaxing holiday and you’re ready to get back to it! Today I’ll introduce to two components that will make up the heart of your SSIS packages: control flow tasks and containers.
Control Flow Tasks
An SSIS process is really just a set of operations that can be executed in a automated fashion. These operations are implemented as tasks in the SSIS environment. Each task is a single unit of work, and can be combined with other tasks to facilitate a business process. Let’s take a look at the different tasks that can be implemented in SSIS and the role that each one plays.
Data Preparation Tasks
Data preparation tasks help prepare data sources for subsequent processing within the SSIS environment or manage those sources once processing is complete.
- Data Profiling task – useful for determining data quality or discovery when dealing with unfamiliar data sources.
- File System task – includes file and folder operations like copying or moving, deleting or creating, renaming, and setting file/folder attributes. You might use this task for moving a file into an archive folder after the file has been processed.
- FTP task – uses FTP operations to retrieve or send files from/to remote file stores. If your data files come from external sources, this task can retrieve them from that remote location to an internal share.
- Web Service task – invokes web service methods and receives the results. Those results can be stored in a variable for use by other tasks.
- XML task – manipulates XML files and XML data
Data Movement Tasks
As you might expect, data movement tasks facilitate the movement of data.
- Bulk Insert task – load data from formatted text files into SQL Server. No transformations are possible using this task, so the data will be loaded as-is. However, this is a very efficient way to get data out of external files and into a table for subsequent processing.
- Data Flow task – allows complete ETL processing of your data, allowing for in-memory data transformations as you migrate data from source to destination. This is likely to be the task you’ll use most often in SSIS.
- Execute SQL task – executes SQL statements or store procedures against a data source. You can also use parameters in your statements, to allow for more dynamic processing.
Workflow tasks automate interaction between SSIS processes or between SSIS and external processes.
- CDC Control task (new in SQL Server 2012) – helps facilitate incremental loading of the data warehouse based on Change Data Capture (CDC) functionality
- Execute Package task – used to execute other SSIS packages, which helps to modularize your logic and improve reusability
- Execute Process task – allows you to execute processes external to SSIS and SQL Server
- Expression task (new in SQL Server 2012) – used to process variables and assign values to other variables
- Message Queue task – allows SSIS to send and receive messages from Microsoft Message Queuing (MSMQ) queues
- Send Mail task – send emails via SMTP
- WMI Data Reader task – get information about the runtime environment via Windows Management Instrumentation (WMI)
- WMI Event Watcher task – allows you to use events in the environment to control the execution and behavior of SSIS processes. For example, a new data file being placed in a directory can trigger an SSIS package to load that data into SQL Server.
SQL Server Administration Tasks
SQL Server Administration tasks use Server Management Objects (SMO) to perform administrative tasks within SQL Server.
- Transfer Database task – copy or move a database between instances or create a new copy of the database on the same instance
- Transfer Error Messages task – copy user-defined error messages to another instance
- Transfer Jobs task – copy SQL Server Agent jobs to another instance
- Transfer Logins task – copy SQL Server logins to another instance, including the SID, if desired.
- Transfer Master Stored Procedures task – copy user-defined stored procedures in the master database to a new instance
- Transfer SQL Server Objects task – copy objects from one instance to another
SQL Server Maintenance Tasks
The SQL Server Maintenance tasks will look very familiar if you’ve ever configured a maintenance plan. These are pretty self-explanatory, so I won’t go into details on each one.
- Back Up Database task
- Check Database Integrity task
- Execute SQL Server Agent Job task
- Execute T-SQL Statement task
- History Cleanup task
- Maintenance Cleanup task
- Notify Operator task
- Rebuild Index task
- Reorganize Index task
- Shrink Database task
- Update Statistics task
Analysis Services Tasks
The Analysis Services tasks use Analysis Services connection managers to perform operations on SSAS objects.
- Analysis Services Execute DDL task – used to create, modify, or drop objects or models in SSAS
- Analysis Services Processing task – process your SSAS objects or models
- Data Mining Query task – retrieve data from a Data Mining model and load it into your RDBMS
Most business processes involve more than one step, and likewise your SSIS packages will typically contain a number of tasks that make up the overall workflow. Now, these tasks might simply operate in a sequential manner, one after another, until the end of the line. However, many times tasks need to be lumped together and treated as a logical unit. For example, your SSIS package might need to process files in a particular folder: loading data from the file into a staging table, performing some validation on the data, and finally archiving the input file. If you have multiple files to process, you probably want all of the files to be loaded and archived before the package moves on to further tasks. This load/validate/archive set of tasks should operate as a single unit.
To facilitate the grouping of tasks into logical units, SSIS gives us containers. Containers allow us to encapsulate a set of individual tasks into a single unit. They can also provide scope for resources such as variables, which can only be accessed by tasks within the container.
There are 3 types of containers in SSIS:
- For Loop container – executes the tasks within the container repeatedly based on an expression. This loop behaves just like a FOR loop in a programming language: as long as the expression evaluates to TRUE, the loop will continue.
- Foreach Loop container – executes the tasks within the container for each item being enumerated. For example, the load/validate/archive set of tasks mentioned above could be placed inside a Foreach Loop container that executes once for each file in the input directory.
- Sequence container – there’s no looping or programmatic logic to this container. The sole purpose of the sequence container is simply to group tasks into a logical unit and provide scope for variables.
As you can see, there are quite a few tasks to be familiar with, each with its own unique purpose. We’ll be exploring some of the more commonly used tasks as this series continues, so stay tuned.