So you’ve got some packages that regularly extract data to one or more text files, but you know that from time to time some of the queries will not return any data. However, you find in SSIS that, in a flat file export package, the output file is created regardless of whether any rows are written to the file, and in cases where there are no rows returned from the source, you’ll end up with empty data files in your output directory. Although not a critical failure, having empty output files can be a nuisance, and unfortunately, there aren’t any native settings in the Flat File Destination or the Flat File Connection Manager objects that will clean up empty files. Fortunately, SSIS does provide a combination of other tools that you can use to emulate this behavior.
In our example, I’m going to create a package to extract data from a table using a query for which I know that no rows would be returned. When I connect this data source to a flat file destination and execute the package, I’ll see an empty data file in my export directory. Next, to demonstrate the intended behavior, I’ll add a Row Count transformation to store the number of affected rows in a variable, and create a File System Task object to delete the output file. Finally, I’ll use an expression constraint to only delete the file if the variable attached to the row count is zero.
First, let’s create a test table to query:
USE testdb GO CREATE TABLE invoices ( invoiceid INT IDENTITY ( 1 , 1 ), vendorid INT, invoicedate DATETIME, invoiceamount DECIMAL(10,2)) INSERT invoices VALUES(12, '8/3/2009', 4125.50), (53, '8/13/2009', 1095.25), (46, '8/15/2009', 729.50), (33, '8/23/2009', 3421.50)
Now, I’ll create a basic package that will export to text the invoices for the past 30 days, a reasonable business requirement. Since we don’t currently have any invoices matching that criteria, we’ll end up with an empty output file. The original package is shown below:
Now for the new-and-improved version, I’m going to drop a Row Count transformation into the data flow to save the number of affected rows to a variable:
Finally, I’ll create a File System Task and configure it to delete the output file. To insure that a valid data file is not deleted, I’ll create a precedence constraint using an expression to only execute the delete if the row count variable is equal to 0:
When you execute this package, you’ll see that the File System Task object is executed because there are no rows matching our query. You can test the package by inserting another row into the database that will be returned by the query, and you’ll see that the data file is exported but not deleted.
I’ve attached the before and after SSIS packages if you’d like to take it for a test drive. Enjoy!
It's been well-documented through myriad blogs and forum posts about the case sensitivity of the comparisons in the SSIS lookup transformation (a good review can be found here). In a nutshell, a comparison using the lookup transformation is case sensitive when using the default setting of Full Cache, even if the values in the database are stored in a case insensitive collation. This happens because, in Full Cache mode, the comparison is done by the SSIS engine rather than the database engine, the former of which differentiates between "Value", "VALUE", and "VaLuE" regardless of the collation setting. But did you know that this transformation is space sensitive as well? Consider the following T-SQL code: SELECT Cast('char value' AS CHAR(20)) [val1] INTO test1 SELECT Cast('char value' AS CHAR(40)) [val2] INTO test2 SELECT t1.val1, t2.val2 FROM test1 t1 INNER JOIN test2 t2 ON t1.val1 = t2.val2 As you would expect, executing this code results in a successful match (INNER JOIN), even though we're comparing CHAR values of differing lengths (for more information, see this article for more information about spaces and padding in SQL Server). However, when the same comparison is run through an SSIS lookup transformation in Full Cache mode, the lookup on our sample data will not result in a match. Similar to the case sensitive lookup, you'll find that the SSIS engine would treat the strings 'Hello World' and 'Hello World ' (note the trailing spaces) as dissimilar values. Unlike in SQL Server, trailing whitespace is significant in SSIS value comparisons.
As a workaround, you can use the TRIM() function in SSIS and the RTRIM() T-SQL function to insure that your comparisons are ignorant of trailing whitespace. Alternatively, you could use a cache mode other than Full Cache, but you should be aware of the other implications before making such a change.
Note that this behavior is limited to fixed-length character fields, but could lead to some unexpected and hard-to-detect problems if you aren’t aware of the behavior.
In the next few months, I’ll be giving a couple of talks on SQL Server business intelligence. For October, I have the pleasure of presenting for SQL Lunch, a new online learning series pioneered by Patrick LeBlanc. On October 12th, I’ll be discussing ways to leverage SQL Server Report Builder 2.0 against your existing SSRS infrastructure to allow users to create their own ad-hoc and published reports.
Also, I’ll be visiting the Ft. Worth SQL Server User Group on November 18th. I’ll be discussing Intermediate SSIS, demonstrating some of the more complex tasks and transformations and demonstrating how to leverage these tools to assist with challenging ETL scenarios. Hope to see you there!
I see a lot of questions on the forums about updating existing data in SSIS. When the update is dynamic and is based on elements within your data flow, a common mistake is to use the Ole DB Command within the data flow pane:
The above poorly designed update has us retrieving data from some source, and then running an update on our target database based on one or more values in the source. Does it work? Sure. The problem occurs when you touch more than a handful of rows. When you bring in the Ole Db command into the picture in a data flow, you’ll be firing the statement in this control once for every row of data in your pipeline. So let’s say our source retrieves 100,000 rows into the data flow: the downstream UPDATE command will be executed 100,000 times! Such operations could bring the most capable server to its knees.
A better solution could include staging your data. Using this method, you’ll retrieve the data from the source and write it into a staging table in the destination database. You can then use an Execute SQL task to run your update in a more organic manner.
Set up your data flow as shown above to pull in the data to a staging table, then you can run a single UPDATE statement:
UPDATE i SET InvoiceAmount = st.UpdatedInvoiceAmount FROM Invoices i INNER JOIN StagedData st ON i.InvoiceID = st.InvoiceID
The advantage here is that you’re executing the expensive UPDATE statement once for each table rather than once for each row affected.
Of course, there are some situations that explicitly disallow the use of staging tables in destination systems. If storage or access restrictions keep you from using this method, you may have to use the row-by-row insert, so be aware that it's going to be a bottleneck.
I ran into a situation this week that brought to light a subtle syntactical error I’d made in creating an SSIS package. I’ve got a client that has given me access to their development server to create some complex extraction queries, which will eventually be rolled into SSIS packages. Since I’m working with read-only access and cannot create stored procedures during the development phase, I’m running these queries in an ad-hoc manner.
So, the queries are built and returning a reasonable set of data. I copy the entire text of the queries into a series of OleDB Data Sources in SSIS, and run my newly created package. The execution takes only seconds, which, considering the volume of data I’m moving, tells me something has gone wrong. The package had completed successfully, but the destination files were all empty. I tested the queries in SSMS again and confirmed the results, but the same query returns no results in SSIS.
The cause of this was a simple but subtle oversight. When I copied the query text into the command window in the OleDB Data Source, I had inadvertently also copied the USE [DATABASE_NAME] declaration included in each query. The inclusion of the USE [DATABASE_NAME] statement caused each data source to fire without error, but returned no rows from the source.
It is important to note that this *should* be a rare problem, since stored procedures are preferable to maintaining complex queries outside the database. If you have the appropriate permissions and organizational authority to wrap your logic into SPROCs, by all means do so.
So the takeaway is that if you find yourself copying an SQL statement directly into the query window of a data source, make sure you remove any USE [DATABASE_NAME] directives. Failing to do so can create a bug in your package that is easily overlooked.
[Edited to add SPROC disclaimer 8/23]
For SSIS developers, the need for proper documentation is crucial. However, the built-in object for documentation, the annotation, is difficult to use. It doesn’t wrap text, doesn’t support varying font styles in a single instance, and doesn’t offer spell checking. Further, all annotations are “at large” and are not attached to a particular object – they are associated with a specific task or component only by the location in which you place them.
If you’re like me and would like to see improvements to the SSIS annotation tool, consider visiting Microsoft Connect and offering your vote and feedback on a couple of items:
Connect Item 483132 – Suggestion to improve the SSIS annotation tool by adding rich-text capability. I added this item this morning.
Connect Item 216927 – Suggestion to allow linking of annotation to a specific object (task, component, etc.). This one has been out there for a while but only has a few votes.
I received my evaluation summary from the SQL Saturday event in Baton Rouge earlier this month. This was the first event in which I did more than just one session (and back-to-back sessions at that), and I’d just gotten over the flu as well, so I was a little nervous about how I’d present, but all told it worked out well. I’d like to see the “Average” column empty with respect to session content, but was glad to see the majority of instructor ratings in the “Excellent” column. I was fortunate to have Steve Jones in my scripting session, and he’s planning to send me a few notes on things I can do to improve.
Session Title: SSIS: Beyond the Basics
Poor
Average
Good
Excellent
Session overall:
How easy was the Session to understand?
1
7
17
Was the content suited to your requirements?
Were the topics covered in sufficient detail?
5
20
Would you recommend this Session to others?
15
Overall rating of the Session?
18
Instructor:
Ability to provide real world experience?
4
23
Ability to respond appropriately to questions?
How well prepared was the instructor?
Knowledge of subject matter?
3
Presentation abilities?
Overall rating of instructor?
Summary Comments
More basic than expected. Excellent “Food for Thought”. Awesome Presentation! Great Session! Strong voice with good diction. Great to listen to in an after lunch session. Tim was an excellent speaker. Brought up useful concepts that we need to try. Finally a really great session!! Great session and Demos.
Session Title: SSIS Scripting
14
16
13
9
8
10
The presentation is kind of long. I’m definitely a beginner but even I was able to understand his presentation. Very informative. Great information.
After a long weekend of travel and speaking, I’m finally able to put my feet up and get a little time to catch up on things. This weekend I travelled to LSU in Baton Rouge, Louisiana to speak at SQL Saturday 17, and was pleased with how well everything was run. Patrick LeBlanc, the organizer of the Baton Rouge SQL Server group, was ringmaster for a good group of volunteers and speakers. There were around 200 people in attendance, and the word-of-mouth feedback was mostly positive.
This event was held on the campus of LSU. The facility was great but a few rooms were a bit small, and as a result, several sessions were standing-room only. There was no dedicated speaker room, which should be a staple of any SQL Saturday event. Signage was good, particularly when you consider the size of the LSU campus. Lunch was from Subway, which was more than adequate, and there were plenty of drinks available throughout the day. The closing assembly and raffle was a little chaotic at first: some of the volunteers had written a custom web application to select winning ticket numbers for the drawings, which was running very slowly at first, but things came together quickly and the goodies were distributed quickly. There were fewer books than I’ve seen at other SQL Saturday events, which made the book giveaway quicker (this is generally a bottleneck at the end of the day).
The speaker and volunteer party on Friday as well as the attendee party tonight were well done, with a large separate space for our group each night. The food and drinks for both parties were paid for by sponsors, a nice touch to reward participants for their time. The after party had only 2-3 dozen attendees, not uncommon since about 15% attendance is expected.
This event was the first time I’ve done 2 different presentations in the same day. I discussed SSIS scripting, a favorite topic of mine and one I’ve delivered several times before, including at 2 previous SQL Saturday events. I also worked in a new presentation, “SSIS: Beyond the Basics” that covered some often underused (and misused) controls such as the For Each Loop Container, the Lookup Component, and the File System Task. Both sessions were well attended; this group was much quieter than audiences I’ve had in the past, so it’s difficult to judge the impact or effectiveness. There were many evals that were turned in for both sessions, and I’m looking forward to receiving the results. I was able to take in Steve Jones’ talk about the Modern Resume, and Barry Ralston’s discussion about PerformancePoint and Excel Services, both of which were well done.
Joining me in this trip was Trevor Barkhouse, a fellow member of the North Texas SQL Server User Group. He did his own session about deadlock detection, resolution, and prevention, which was very informative. Trevor and I, along with several other members of our Dallas group, are planning to host our own SQL Saturday event in the Dallas area next January or February, so we spent the weekend making a lot of mental notes about this event.
One of my favorite things about these events is the opportunity to spend time with other technical professionals. I had met Patrick LeBlanc briefly in Pensacola in June, but this weekend I spent several hours chatting with him. I got to catch up with Steve Jones, who is always enjoyable to visit with, and got to spend a little time with Barry Ralston as well. I visited with Jamie and Jeff, a couple of attendees whom I’d met in Pensacola in June, and was happy to see these “repeat customers” again.
Thanks to Patrick and crew for a great event! Hopefully we’ll see some of you in Dallas next year.
… and by government data, I mean the mountain of data recently made available by the G-Men on Data.gov. This site contains what must be terabytes of data on every topic from environmental measurements to crime statistics, from geographical data to labor statistics. The Obama administration has committed to greater transparency, and the availability of this data is a significant step toward that goal. The trendy geek magazine Wired.com recently did a feature on Data.gov that is worth reading.
It’s obvious that Data.gov is an immature portal. Delivery types are inconsistent – some files are available only as flat files, others as only Excel, and a few claim to offer XML feeds. The formatting can vary wildly from one set of data to the next, and often includes headers and footers which muddy up otherwise clean raw data files.
So why should you, as a database professional, care about this information? If you’re trying to improve your skills in database technologies (and especially in this economy, who isn’t trying to improve him/herself?), this data store is a great place to start. Because of the sheer size and sometimes unusual layouts, this information is an excellent test bed for honing one’s skills at Integration Services, Analysis Services, or for creating VLDBs (very large databases) on which to practice. And if you’re truly ambitious, there’s a contest to come up with the best application of this data, with a $10,000 bounty to the winner.
As for me, I’m currently pulling down some FBI crime data with the intention of using it in an upcoming SSIS class I’m presenting. Perhaps I’ll think up an app that could win the $10K as well….
For those in or around the Dallas area tomorrow (Thursday) evening, I'll be speaking at the North Texas SQL Server User Group meeting at the Microsoft campus in Irving at 7pm. I'll be giving my talk on Scripting in SSIS, one of my favorite topics. Sean McCown will also be continuing his series on PowerShell in SQL Server at 6pm. If you're a reader of this blog, please catch me after the presentation and say hello.
I’ve just arrived home from a quick trip to Pensacola to speak at SQL Saturday 14 in Pensacola, FL. I’m quite happy with the event; the planning and organization was handled very well, and I believe the event was a big success in the eyes of the attendees.
I flew into Pensacola on American Eagle, arriving at 3:30pm on Friday. I usually dread flying American, but this time wasn’t so bad; the flight was on time and the flight attendant (just one – it was a small plane) was polite and attentive. My rental car company, which shall remain nameless, was less impressive, since I had to deal with the cigarette residue from the previous renter, and I had to return all the way to the ticket counter to report some undocumented damage to the car before leaving. Nevertheless, I did get one good surprise upon arrival: Thinking that I had crossed into Eastern Time, I subconsciously added an hour to the time, only to realize upon arrival that western Florida is actually on Central Time. A free hour!
I got settled into my hotel room, which was about 5 miles from the event location. Note to self: Don’t be cheap, spend the extra $50/night next time and stay nearby. In a nice hotel. Andy Warren and I had tentatively planned to meet before the speaker reception, so I called him and we met up at McGuire’s Irish Pub. It’s a neat little place, and I recommend it if you find yourself near downtown Pensacola. One word of caution: read the signs on the outside of the restrooms carefully. By the way, the ladies room at McGuire’s is quite nice.
The speaker’s reception at the Fish House was a big hit. I don’t know if anyone kept count, but I’m guessing that we had 35-40 people that showed up over the course of several hours. It got a little loud when the peripheral crowds arrived, but a good time nonetheless.
The event itself went well, in my opinion. It got off to a slow start since the building was still locked at 7:45, but there were a lot of volunteers to help out with the heavy lifting. Traffic flow was very good and there were few bottlenecks during registration. According to the last figures I heard, there were 180 or so registered, and 170+ showed up (this included a number of walk-ups). I heard of one speaker that didn’t make it, but it was due to a family issue and Karla was able to adjust the schedule beforehand to allow for this. The book giveaway was done throughout the day rather than at the end, which helped to thin out the end-of-the-day swag bottleneck. Among the items given away was a pass to go on a deep sea fishing trip the day after (Sunday), which I personally would have loved to do but just couldn’t fit it into my Sunday schedule.
I spoke again on SSIS Scripting, and the session was full and seemed to be well-received. I neglected to plug in my laptop during setup – and believe me, a warning message stating “Your battery is at 7% – plug into a power source immediately” will throw a kink into the best presentation. I had an issue with the resolution on the projector which kept me from going full-screen on my VM, and this slowed me down during the demos. Despite these minor glitches, everything else went well, and we had some excellent questions and good discussion during and after the presentation. If you’re interested, I’ll be publishing the slide deck and code on my website in just a while.
I talked to several attendees during and after the event, and all of them that I spoke with spoke very highly of this SQL Saturday event. Those who attended were gracious, attentive, and polite, and I got the sense that they felt their time was well spent.
I did get to meet a number of people whom I’ve “known” for months or years but had never had a face-to-face chat, including Jack Corbett, Steve Jones, Brad McGehee, and Ken Simmons. I also got to visit with several folks that I had met at previous events, including Rodney Landrum, Karla Remail, Nathan Heaivilin, Pam Shaw, Jessica Moss, Joe Healy, and a number of others. I also shared a flight home on Sunday with Brad McGehee, so he and I got to visit for a while as we waited to fly back to Dallas. It’s great to share the company of these smart folks, because it 1) makes for interesting conversation and 2) gives me hope that I’ll collect a few IQ points by osmosis. For me, this event was also an experiment in personal networking, but that will be covered in another post.
My hat is off to Karla and the volunteers from the Pensacola SQL Server User Group. This event was well planned, organized, publicized, and executed. I’m hopeful that we can replicate this event, and its success, in the Dallas area sometime next calendar year.
I've got four new SSIS videos published on JumpstartTV.com:
Expression Language Basics
Secure FTP File Transfer in SSIS
SSIS Lookup Transformation Basics
SSIS Lookup Transformation - Lookup Failures
If you're not familiar with JumpstartTV.com, I would encourage you to stop by and give it a try. There are scores of short (3-5 minutes) videos that are all free, and are each targeted toward a specific task. You can even suggest topics for new videos. You can leave comments for each video, and we appreciate all comments as this is the best way for us to improve content.
For those who were in my session on SSIS Scripting at SQL Saturday Tampa last weekend, I have posted the examples and slide deck on my website. Feel free to contact me if you have any questions.
I’m sitting in the Tampa airport waiting for a flight back home from SQL Saturday 10 in Tampa. There was a good turnout at the event; the last news I heard was that over 300 had registered initially, and that there were close to 200 that actually showed up. I got to visit again with several people I worked with at the last SQL Saturday event, and met several people (Devin Knight and Jonathan Kehayias among them) who I “knew” from Twitter and blogs but hadn’t actually met before. I had the opportunity to hang out with Andy Warren and Chris Rock again, and had drinks with Rodney Landrum, Karla Remail and others at the after party. For this event I presented a session on using the SSIS script component to read and write unconventional data files. I was pleasantly surprised with the turnout; I had the first timeslot of the day, and Brian Knight was presenting at the same time, so I expected at best a moderately sized audience. As it turns out, by the end of the presentation there were people standing in the aisle, and there were lots of great questions and some good discussion at the end.
I spent most of the rest of the day dropping in on various sessions and taking pictures of presenters and attendees. Since SQL Saturday didn’t yet have a presence on Facebook, I created a new group and posted pictures of the event and after party; you can find the pictures posted here, and if you have any pics to add (of this event or any previous SQL Saturday), please add yourself to the group and feel free to add your images.
There were a couple of minor logistical issues at the beginning, but everything worked out well and we ended up with a successful event. The sessions I was able to listen in on were well done, and it appeared that most of the attendees stayed for the entire day. All told, another successful SQL Saturday event. I’m now even more excited about the possibility of bringing this to the Dallas area.
I received word earlier today that I’ll be presenting at the upcoming SQL Saturday event in Tampa, Florida on January 24. I got to present at one of the first SQL Saturday events last year in Jacksonville and it was quite a well run event. A good lineup of content along with a significant attendee turnout made for a highly successful event, and I hope the Tampa event will match the success of the previous experience.
As an aside, I’m hoping to use what I learn attending and presenting these two conferences to put together a similar event. I’ve had some talks with Andy Warren, one of the founders of SQL Saturday, and he’s very excited about the growth of this community event. I pinged my local SQL Server gang (North Texas SQL Server User Group) about the possibility of bringing SQL Saturday to Dallas, and I got a lot of positive responses. So if you’re in the Dallas area and would like to be a part of an upcoming SQL Saturday event, please let me know.