This post is part of a series on this blog that will help me, and hopefully you, pass exam 70-463: Implementing a Data Warehouse with Microsoft SQL Server 2012. So far, we’ve covered:
- Tables and Schemas in the data warehouse
- Dimensions and Slowly Changing Dimensions
- Fact tables and measures
- Intro to columnstore indexes
- Columnstore indexes and partitioning
Once you’ve designed your fact tables and dimensions, decided how you’re going to partition and index your tables, and created your empty structures, it’s time for the fun stuff: loading data.
There are 2 basic scenarios when it comes to moving data from one repository to another: simple and complex. For simple data movements, data is read from the source and loaded into the destination as-is. This can be as simple as an export/import process. But for most data warehouse environments, a simple data movement isn’t going to cut it. Most of the time, we need to massage or transform the data prior to loading it into the data warehouse. This is a complex data movement, and is also known as ETL, or Extract-Transform-Load.
SQL Server Integration Services (SSIS) is Microsoft’s primary tool for developing and implementing the ETL process. And SQL Server Data Tools is the development environment in SQL Server 2012 for the development and deployment of SSIS projects, as well as SQL Server Analysis Services (SSAS) and SQL Server Reporting Services (SSRS) projects. If you’ve worked with SSIS packages prior to SQL Server 2012, you’re probably familiar with the Business Intelligence Development Studio (BIDS), which was similar to the SSDT interface.
The first time you start up SSDT, you may see a window asking you to choose your default environment settings. If you’ll primarily be developing SSIS packages , go ahead and choose Business Intelligence Settings. This will customize your environment for developing business intelligence projects. You can always change this setting later. Once Visual Studio opens, click New Project… Navigate to Installed Templates -> Integration Services, and click Integration Services Project. Give your project a name and click OK.
An SSDT Tour
In the upper right-hand corner of the SSDT interface you’ll see the Solution Explorer. There you should see the Solution and Project you just created, as well as all related files. Right now, the only file in our project is the default Package.dtsx. Any additional files or project-level connection managers or parameters you create will be listed here as well.
Below the Solution Explorer, in the lower right-hand corner, you’ll see a Getting Started window, listing resources for sample SSIS projects, which can be helpful for getting familiar with the interface. If you close the Getting Started tab, you’ll see the Properties pane.
Continuing clockwise, at the bottom of the interface you’ll see the Connection Managers pane, where you will define package-level connection managers. Connection managers allow SSIS to access various data sources and destinations. We’ll discuss connection managers in more depth next time.
On the left side of the interface you’ll see the SSIS Toolbox pane. The items listed in the Toolbox are context-specific. Right now, in the context of the control flow tab, you’ll find control flow tasks like the Execute SQL Task, Execute Process Task, FTP Task, etc. Click on the Data Flow tab, and you’ll see a different set of items in the Toolbox, ETL tasks that are specific to the data flow.
We should probably take a step back for a second and talk about control flow vs. data flow. In SSIS, control flow is the defined order of operations in a package and the precedence constraints that define when they are executed. Data flow describes the movement of data from source, through any transformations, to the destination. The data flow itself is actually a task within the control flow of the package.
Other uses for SSIS
Data movement is the most common use for SSIS, but keep in mind that SSIS is capable of much more, including
- File system operations and FTP access – if the data you need to load is transported or provided in files, SSIS can be used to access those files
- Invoking external processes – if, for example, an external service is required for your warehouse load process, the service can be invoked from within SSIS, making integration easier to maintain
- Database administration tasks – maintenance plans are implemented as SSIS packages
- OS operations and system state can be accessed via Windows Management Instrumentation (WMI)
- Sending email
- Processing SSAS objects
If you’re not already familiar with the SSDT environment, take some time to poke around. Download some sample projects and get comfortable with the interface. We’ll be spending a lot of time there in the next few weeks.