Click here to monitor SSC
SQLServerCentral is supported by Red Gate Software Ltd.
 
Log in  ::  Register  ::  Not logged in
 
 
 

Tim Mitchell

Tim Mitchell is a business intelligence consultant, author, trainer, and SQL Server MVP with over a decade of experience. Tim is the principal of Tyleris Data Solutions and is a Linchpin People teammate. Tim has spoken at international, regional, and local venues including the SQL PASS Summit, SQLBits, SQL Connections, SQL Saturday events, and various user groups and webcasts. He is a board member at the North Texas SQL Server User Group in the Dallas area. Tim is coauthor of the book SSIS Design Patterns, and is a contributing author on MVP Deep Dives 2. You can visit his website and blog at TimMitchell.net or follow him on Twitter at twitter.com/Tim_Mitchell.

Eliminating Empty Output Files in SSIS

So you’ve got some packages that regularly extract data to one or more text files, but you know that from time to time some of the queries will not return any data.  However, you find in SSIS that, in a flat file export package, the output file is created regardless of whether any rows are written to the file, and in cases where there are no rows returned from the source, you’ll end up with empty data files in your output directory.  Although not a critical failure, having empty output files can be a nuisance, and unfortunately, there aren’t any native settings in the Flat File Destination or the Flat File Connection Manager objects that will clean up empty files.  Fortunately, SSIS does provide a combination of other tools that you can use to emulate this behavior.

In our example, I’m going to create a package to extract data from a table using a query for which I know that no rows would be returned.  When I connect this data source to a flat file destination and execute the package, I’ll see an empty data file in my export directory.  Next, to demonstrate the intended behavior, I’ll add a Row Count transformation to store the number of affected rows in a variable, and create a File System Task object to delete the output file.  Finally, I’ll use an expression constraint to only delete the file if the variable attached to the row count is zero.

First, let’s create a test table to query:

USE testdb
GO

CREATE TABLE invoices (
  invoiceid     INT    IDENTITY ( 1 , 1 ),
  vendorid      INT,
  invoicedate   DATETIME,
  invoiceamount DECIMAL(10,2))

INSERT invoices
VALUES(12, '8/3/2009', 4125.50),
      (53, '8/13/2009', 1095.25),
      (46, '8/15/2009', 729.50),
      (33, '8/23/2009', 3421.50)


Now, I’ll create a basic package that will export to text the invoices for the past 30 days, a reasonable business requirement.  Since we don’t currently have any invoices matching that criteria, we’ll end up with an empty output file. The original package is shown below:

df_f1

 

Now for the new-and-improved version, I’m going to drop a Row Count transformation into the data flow to save the number of affected rows to a variable:

 

df_f2

 

Finally, I’ll create a File System Task and configure it to delete the output file.  To insure that a valid data file is not deleted, I’ll create a precedence constraint using an expression to only execute the delete if the row count variable is equal to 0:

 

df_f3


When you execute this package, you’ll see that the File System Task object is executed because there are no rows matching our query.  You can test the package by inserting another row into the database that will be returned by the query, and you’ll see that the data file is exported but not deleted.

The code samples showing the "before" and "after" versions of the packages can be found here.  Enjoy!

Comments

Posted by Rene van Noorloos on 5 October 2009

I cannot find the attached SSIS packages

Where can I doenload them

kind regrds

Rene van Noorloos

Posted by rhardy on 5 October 2009

I do have this problem of empty files.  I am fairly new to SSIS and would appreciate being able to learn from your SSIS packages described above.  Where can I go to downlod them?

Thank You.

Rick Hardy

Posted by juliesmith on 5 October 2009

You could also use an execute sql task to capture the count of rows value and then set an expression on the disable property of the data flow.  So when @[User::RowCount]==0 is true, the Disable property would trigger, and the file wouldn't even be generated if there were no rows.

Posted by Amol.Naik on 5 October 2009

I like juliesmith's approach since there is no file IO operation involved since we are determining file creation before hand.

Thanks to all.

Amol

Posted by nerdmann on 6 October 2009

I too, am an SSIS newbie and am running into problems with this exercise.  If you could point us to your solution, it would be greatly appreciated.

Posted by Tim Mitchell on 6 October 2009

For those who requested the sample files, I'm not sure why they don't show up on this post - I have included them on the attachments.  I'll get with Steve to find out how to display them.  In the meantime, shoot me an email at tdmitch [at] gmail and I'll email you the packages.  Thanks!

Posted by vvkp on 8 October 2009

Hello Juliesmith,

I am a newbie in this area.

Can you be more detail how to  Disable property would trigger, and the file wouldn't even be generated if there were no rows.

Thanks in advance.

Krishna.

Posted by juliesmith on 9 October 2009

Here is a step by step.

1.  Create a variable (I always use the whole package as my scope, so that I don't accidentally name two variables the same thing) data type is Int32.

Make sure the value of the variable is hardcoded in the beginning to something > 0 (or the data flow will NEVER) work.

2. Add an Execute SQL task to the package previous to the execution of the data flow task which moves the data from the table to the output file.

  The sql is something like:

SELECT    

 COUNT(invoiceid)

AS CountOf

FROM         dbo.invoices

3. Set the result set property on the general tab of the execute sql task to single row.

4. Click the result tab of the execute sql task.

5. Click add to add a row to resultset.

6. Change the name of the result to 0 (zero--this is important and NOT intuitive)

7. Map the variable name to your variable beside the ResultSet Name from the drop down of variables.

You are done configuring the execute sql task.

1. Now, select your data flow which does the work of moving the records to the output file.

2.  Go to the properties window of  the data flow task.

3.  Click on the expressions field of the properties window

4. Click the ellipses

Clicking the ellipses brings up the Property Expressions Editor

5.  Select Disable from the drop down under property.

6.  Click the ellipses beside the expression box for it.

This brings up the expression builder.

7.  Drag your variable down to the expression window from the variables in upper left pane.

8.  Follow it with == 0   (should look like this          @[User::RowCount]==0             )

9.  Click ok to close the expression builder  and then click ok to close the property expression editor .

Posted by Aine on 30 October 2009

WOOTNESS!!! This worked perfectly - thank you!!!!

Posted by silpapingali on 11 November 2009

Set 'DelayValiation' property of the data flow task to true.

Posted by Kaushal on 8 January 2010

I tried and it worked for me. Thanks a bunch Julie!

One question about the step# 6. Why the Result Name should be 0? Why is it important to name it "0"?

(6. Change the name of the result to 0 (zero--this is important and NOT intuitive)

Posted by eyespi20 on 28 July 2010

Julie -- this was a perfect solution! Thank you so much for taking the time to detail it out so perfectly!

Posted by Ranu Mandan on 1 February 2011

Thank you.. this should be by default in SSIS

Posted by richxyz64-web on 4 April 2011

Does anyone have a similar solution to prevent empty flat files from being created from error outputs?  I have redirected bad rows to flat files, which is working properly.  But the file gets created even if there are no output rows.   Can I do a Row Count and disable the flat file connection when there are no output rows?

Posted by malar_haarshini on 27 April 2011

juliesmith . Its not working for me. I'm importing and exporting files using foreach loop.

Eg: I have three files

1st file - with record count(2)

2nd file - with record count (0)

3rd file - with record count (2)

I want to create 1st and 3rd file and skip the 2nd file.

But its created 1st and 2nd file and disabled dataflow for the 3rd file.

Did i missing something in your step? I pretty much followed all your step. Including setting delayvalidation = true.

Let me know.

Posted by steve.rowbottom on 26 April 2013

As per eyespi20 on 28 July 2010

Julie -- this was a perfect solution! Thank you so much for taking the time to detail it out so perfectly!

Posted by small.mitchell on 5 June 2013

Awesome, many thanks!

Posted by jagsalog on 28 April 2014

Julie's solution is ok but it does not always work especially if your output and source are determined at run time. Tim's solution is elegant but it creates an extra IO and step. I've  combined Tim's and Julie's solution.

1. Use a view as a data source for the Data Flow.

2. Create an Execute SQL Task to count the number of rows of the view and assign it to variable RowCount.

3. Create a normal Data Flow from the view to the flat file. (no need to count rows).

4. Create a precedence constraint using an expression @[User::RowCount]>0.

Thanks Tim and Julie.

Leave a Comment

Please register or log in to leave a comment.