Using the Data Viewer Intelligently in SSIS Data Flows

If you've ever needed to debug a fairly complex Data Flow, you've probably wanted to take a peek at what data is moving through the flow at various points. The data viewer in Integration Services is how you do that - but it has a major drawback. You may be processing thousands or millions of rows - and you'll have to page through each buffer with a plain data viewer. If you're trying to debug such a flow, one of the best ways to do so is by narrowing the scope of what you're looking at. But doing so with a data viewer means you need to restructure or filter your data flow, which means you definitely need to remember to remove that filtering before it goes to the next step of development.

Here's a little tip that should help you weed through data that you don't want to see, as well as move your "viewport" from one place to another in the data flow as easily as possible. Almost as easily as just removing and adding a new data viewer.

The Parts
You'll have to add a Multicast, Conditional Split, Sort, and Derived Column to your data flow. They each serve a purpose - and depending on your flow, you may or may not need each one, or you may be able to exchange it for another component.
The Multicast

This is the key to the "mobility" of using this technique. Wherever you want to see what's passing by - the place where you would otherwise place a data viewer - simply disconnect the flow, and insert the Multicast in between:

The Conditional Split

Next, since you don't want to see a million rows, or have to press that "play" button a thousand times, hoping to see relevant data, place a Conditional Split after the Multicast:

Inside the Conditional Split, place whatever condition will filter out only the records you're interested in seeing. In my case, I only want to see rows that have a customer key of 451:

The Sort

This is an optional part of the technique, but can be quite useful when the rows you're interested in are scattered throughout the rows passing by in the data flow. If you've researched the optimizations that SSIS uses to process data, you should know that the Multicast doesn't actually copy memory and the Conditional Split doesn't actually split rows either. The Multicast is simply a visual aid for you to design packages with - the pipeline engine looks at it as a license to multi-thread the data flow. The Conditional Split isn't really moving rows down two paths, it's simply marking each row with an "output name", and only allows relevant rows to be operated on by transforms attached to the various outputs.

If you simply attached a data viewer to the output of the Conditional Split, you may have to page through tens or hundreds of buffers (pages) by pressing the "play" button... while only seeing one row on each page. Using the Sort component forces the pipeline engine to copy and reformat the buffer. You can think of it as "buffer defragmentation."

The Derived Column

Virtually any transform will do here - but the Derived Column works well in that you can simply drop it on the surface without configuration and it will work. It's purpose here is simply to provide an endpoint, so that we can place a data viewer between it and the previous transform:

Using the Technique

The first time you set up this sub-flow, it could look something like this:

Each part of the sub-flow I've described here would have correct metadata, including the data viewer. If you run the flow, you'll see your data viewer pop up with only the filtered results you were looking for. After you're confident that the flow is dealing with the data correctly up to that point, you may want to move it to examine what the data looks like after the next component in the main flow. To do that, you need to:

Detach the incoming flow to the "Data Viewer Multicast" as well as the outgoing flow from it, and reattach the main flow back together.
Detach the main flow where you want to insert the "intelligent" data viewer sub-flow, and attach the "Data Viewer Multicast" in between the two components.
Resolve any metadata issues (red circles with the white X) in the sub-flow.
Remove and re-add the data viewer on the flow entering the "Data Viewer Endpoint" if your column metadata's changed. (You won't get an error or warning about this!)

You would end up with something like this:

Wrapping It Up

No, this isn't the easiest or quickest way to monitor what's going through your pipeline - but it does provide more flexibility and usability than the plain data viewer for complex flows that need debugging. Especially ones like the one I'm working on now, where I'm moving this down the flow... then up... then back down again...

Book Review: Big Red - Voyage of a Trident Submarine

by Andy Warren

SQLServerCentral.com

Blogs

I've grown up reading Tom Clancy and probably most of you have at least seen Red October, so this book caught my eye when browsing used books for a recent trip. It's a fairly human look at what's involved in sailing on a Trident missile submarine...

★ ★ ★ ★ ★ ★ ★ ★ ★ ★

You rated this post out of 5. Change rating

2009-03-10

1,439 reads

Database Mirroring FAQ: Can a 2008 SQL instance be used as the witness for a 2005 database mirroring setup?

by Robert Davis

SQLServerCentral.com

Blogs

Question: Can a 2008 SQL instance be used as the witness for a 2005 database mirroring setup? This question was sent to me via email. My reply follows. Can a 2008 SQL instance be used as the witness for a 2005 database mirroring setup? Databases to be mirrored are currently running on 2005 SQL instances but will be upgraded to 2008 SQL in the near future.

★ ★ ★ ★ ★ ★ ★ ★ ★ ★

You rated this post out of 5. Change rating

2009-02-23

1,567 reads

Inserting Markup into a String with SQL

by Phil Factor

SQLServerCentral.com

T-SQL

In which Phil illustrates an old trick using STUFF to intert a number of substrings from a table into a string, and explains why the technique might speed up your code...

★ ★ ★ ★ ★ ★ ★ ★ ★ ★

You rated this post out of 5. Change rating

2009-02-18

1,631 reads

Networking - Part 4

by Andy Warren

SQLServerCentral.com

Blogs

You may want to read Part 1 , Part 2 , and Part 3 before continuing. This time around I'd like to talk about social networking. We'll start with social networking. Facebook, MySpace, and Twitter are all good examples of using technology to let...

★ ★ ★ ★ ★ ★ ★ ★ ★ ★

You rated this post out of 5. Change rating

2009-02-17

1,530 reads

Speaking at Community Events - More Thoughts

by Andy Warren

SQLServerCentral.com

Blogs

Last week I posted Speaking at Community Events - Time to Raise the Bar?, a first cut at talking about to what degree we should require experience for speakers at events like SQLSaturday as well as when it might be appropriate to add additional focus/limitations on the presentations that are accepted. I've got a few more thoughts on the topic this week, and I look forward to your comments.

★ ★ ★ ★ ★ ★ ★ ★ ★ ★

You rated this post out of 5. Change rating

2009-02-13

360 reads

Using the Data Viewer Intelligently in SSIS Data Flows

Rate

Share

Share

Rate

Using the Data Viewer Intelligently in SSIS Data Flows

Rate

Share

Share

Rate

Related content

Book Review: Big Red - Voyage of a Trident Submarine

Database Mirroring FAQ: Can a 2008 SQL instance be used as the witness for a 2005 database mirroring setup?

Inserting Markup into a String with SQL

Networking - Part 4

Speaking at Community Events - More Thoughts