SQLServerCentral is supported by Red Gate Software Ltd.
 
Log in  ::  Register  ::  Not logged in
Search:  
 
 

Tim Mitchell

Tales of my travels through SQL Server
Add to Technorati Favorites Add to Google
Author Bio
Tim Mitchell is a Microsoft SQL Server consultant, developer, speaker, and trainer. He has been working with SQL Server for over 6 years, working primarily in database development, business intelligence, ETL/SSIS, and reporting. You can find his complete profile at TimMitchell.net.
 

Eliminating Empty Output Files in SSIS

By Tim Mitchell in Tim Mitchell | 09-30-2009 8:28 AM | Categories: Filed under: , ,
Rating: (not yet rated) |  Discuss | 2,847 Reads | 1067 Reads in Last 30 Days |10 comment(s)

So you’ve got some packages that regularly extract data to one or more text files, but you know that from time to time some of the queries will not return any data.  However, you find in SSIS that, in a flat file export package, the output file is created regardless of whether any rows are written to the file, and in cases where there are no rows returned from the source, you’ll end up with empty data files in your output directory.  Although not a critical failure, having empty output files can be a nuisance, and unfortunately, there aren’t any native settings in the Flat File Destination or the Flat File Connection Manager objects that will clean up empty files.  Fortunately, SSIS does provide a combination of other tools that you can use to emulate this behavior.

In our example, I’m going to create a package to extract data from a table using a query for which I know that no rows would be returned.  When I connect this data source to a flat file destination and execute the package, I’ll see an empty data file in my export directory.  Next, to demonstrate the intended behavior, I’ll add a Row Count transformation to store the number of affected rows in a variable, and create a File System Task object to delete the output file.  Finally, I’ll use an expression constraint to only delete the file if the variable attached to the row count is zero.

First, let’s create a test table to query:

USE testdb
GO

CREATE TABLE invoices (
  invoiceid     INT    IDENTITY ( 1 , 1 ),
  vendorid      INT,
  invoicedate   DATETIME,
  invoiceamount DECIMAL(10,2))

INSERT invoices
VALUES(12, '8/3/2009', 4125.50),
      (53, '8/13/2009', 1095.25),
      (46, '8/15/2009', 729.50),
      (33, '8/23/2009', 3421.50)


Now, I’ll create a basic package that will export to text the invoices for the past 30 days, a reasonable business requirement.  Since we don’t currently have any invoices matching that criteria, we’ll end up with an empty output file. The original package is shown below:

df_f1

 

Now for the new-and-improved version, I’m going to drop a Row Count transformation into the data flow to save the number of affected rows to a variable:

 

df_f2

 

Finally, I’ll create a File System Task and configure it to delete the output file.  To insure that a valid data file is not deleted, I’ll create a precedence constraint using an expression to only execute the delete if the row count variable is equal to 0:

 

df_f3


When you execute this package, you’ll see that the File System Task object is executed because there are no rows matching our query.  You can test the package by inserting another row into the database that will be returned by the query, and you’ll see that the data file is exported but not deleted.

I’ve attached the before and after SSIS packages if you’d like to take it for a test drive.  Enjoy!

Comments
 

renevn said:

I cannot find the attached SSIS packages

Where can I doenload them

kind regrds

Rene van Noorloos

October 5, 2009 5:49 AM
 

rhardy said:

I do have this problem of empty files.  I am fairly new to SSIS and would appreciate being able to learn from your SSIS packages described above.  Where can I go to downlod them?

Thank You.

Rick Hardy

October 5, 2009 8:57 AM
 

juliesmith said:

You could also use an execute sql task to capture the count of rows value and then set an expression on the disable property of the data flow.  So when @[User::RowCount]==0 is true, the Disable property would trigger, and the file wouldn't even be generated if there were no rows.

October 5, 2009 9:04 AM
 

Amol.Naik said:

I like juliesmith's approach since there is no file IO operation involved since we are determining file creation before hand.

Thanks to all.

Amol

October 5, 2009 11:30 AM
 

nerdmann said:

I too, am an SSIS newbie and am running into problems with this exercise.  If you could point us to your solution, it would be greatly appreciated.

October 6, 2009 9:22 AM
 

Tim Mitchell said:

For those who requested the sample files, I'm not sure why they don't show up on this post - I have included them on the attachments.  I'll get with Steve to find out how to display them.  In the meantime, shoot me an email at tdmitch [at] gmail and I'll email you the packages.  Thanks!

October 6, 2009 10:05 AM
 

vvkp said:

Hello Juliesmith,

I am a newbie in this area.

Can you be more detail how to  Disable property would trigger, and the file wouldn't even be generated if there were no rows.

Thanks in advance.

Krishna.

October 8, 2009 1:46 PM
 

juliesmith said:

Here is a step by step.

1.  Create a variable (I always use the whole package as my scope, so that I don't accidentally name two variables the same thing) data type is Int32.

Make sure the value of the variable is hardcoded in the beginning to something > 0 (or the data flow will NEVER) work.

2. Add an Execute SQL task to the package previous to the execution of the data flow task which moves the data from the table to the output file.

  The sql is something like:

SELECT    

 COUNT(invoiceid)

AS CountOf

FROM         dbo.invoices

3. Set the result set property on the general tab of the execute sql task to single row.

4. Click the result tab of the execute sql task.

5. Click add to add a row to resultset.

6. Change the name of the result to 0 (zero--this is important and NOT intuitive)

7. Map the variable name to your variable beside the ResultSet Name from the drop down of variables.

You are done configuring the execute sql task.

1. Now, select your data flow which does the work of moving the records to the output file.

2.  Go to the properties window of  the data flow task.

3.  Click on the expressions field of the properties window

4. Click the ellipses

Clicking the ellipses brings up the Property Expressions Editor

5.  Select Disable from the drop down under property.

6.  Click the ellipses beside the expression box for it.

This brings up the expression builder.

7.  Drag your variable down to the expression window from the variables in upper left pane.

8.  Follow it with == 0   (should look like this          @[User::RowCount]==0             )

9.  Click ok to close the expression builder  and then click ok to close the property expression editor .

October 9, 2009 7:31 AM
 

Aine said:

WOOTNESS!!! This worked perfectly - thank you!!!!

October 30, 2009 1:48 PM
 

silpapingali said:

Set 'DelayValiation' property of the data flow task to true.

November 11, 2009 10:42 AM
Leave a Comment
Only members of SQLServerCentral may leave comments. Register now for your free account or Sign-In if you are already a member.