
SSIS: great idea, but buggy as heck.


Oh SSIS, how I love and hate you.

As DBA at my company, I don't actually develop much in SSIS. But I do have to get SSIS packages running through SQL Agent for our various ETL tasks.

The developers are by no means SQL BI experts, and their packages may not be perfect, but there's nothing technically wrong with them either. And yet, getting them to successfully run in UAT or production is a constant headache.

The first issue is the old package protection level problem. There are countless pages on the web explaining that the default protection level (EncryptSensitiveWithUserKey) is essentially "broken", in that a package deployed with this protection level will not run. That isn't actually true, though. There are already several ETL jobs on our systems running packages with this protection level set. No proxy accounts, no funny configs; they're just deployed and they run.

Some of them. But others don't. And this is where the fun really starts.

When you set up an SSIS step in an Agent job and get an error, you won't see anything useful in the job history... sometimes. Again, we have some packages that return useful error information. Others simply say "the package execution failed". Again, there are no configuration differences that I can find anywhere between the jobs. And I know that on occasion the packages aren't configured any differently either, because it's the same package, with some slight modifications to a data flow task or some other unrelated component, that fails once the change is deployed, and gives no useful information where it used to give at least something.

In response to this problem, you'll see recommendations to use a CmdExec step in Agent to launch dtexec instead of using an SSIS package step. On the plus side, this does essentially guarantee useful error information in the job history. Unfortunately it introduces another problem: dtexec can't parse its arguments correctly. Several of our packages have been developed with connection names that include spaces. Again, there's nothing technically wrong with this. BIDS is quite happy to let you name your connections this way, and dtexec is supposed to be able to handle them using quotes. But it doesn't... sometimes. Specifically, it doesn't if you change your connection strings using a /Connection argument to dtexec, and the connection you are changing includes a space.
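To make the failure concrete, here's roughly what such a step looks like (the package path, connection names and connection strings are made up for illustration):

```bat
REM Hypothetical example: overriding a connection string at run time.
REM With a connection manager named without spaces, this works:
dtexec /F "C:\ETL\LoadCustomers.dtsx" ^
  /Connection "SourceDB";"Data Source=UATSQL01;Initial Catalog=Staging;Integrated Security=SSPI;"

REM With a connection named "Source DB" (note the space), the same
REM pattern can fail: dtexec mis-parses the argument despite the quotes:
dtexec /F "C:\ETL\LoadCustomers.dtsx" ^
  /Connection "Source DB";"Data Source=UATSQL01;Initial Catalog=Staging;Integrated Security=SSPI;"
```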

Now, I know this, so I don't create connections with spaces in them. But the developers do not. And they shouldn't really need to: this is stuff that is supposed to work.

There is a way to fix it, and that is to escape some of the quotes (but not all of them!) in the command line as \" instead of plain ". But you won't find that information anywhere you might begin looking (like in the help, the Microsoft books on the subject, or the MSDN pages). And good luck getting this right when you have 10 different connections in your package, resulting in a command line with quotes all over the place.
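For what it's worth, here's the shape of the workaround as best I can describe it, again with invented names. Exactly which quotes need escaping depends on the argument, which is precisely why it gets unmanageable with many connections:

```bat
REM Hypothetical workaround: escape the quotes around the connection
REM name that contains a space as \" so it stays a single token:
dtexec /F "C:\ETL\LoadCustomers.dtsx" ^
  /Connection \"Source DB\";"Data Source=UATSQL01;Initial Catalog=Staging;Integrated Security=SSPI;"
```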

"Use configuration files" I hear you say. Ah, the horror that is configuration files. It is usually the case that your ETL jobs will have some common connections, with some packages using a connection or two that the other packages don't. This shouldn't be an issue. If SSIS was well designed, you could create one configuration file with all of your connections, and point all of your packages to it. If a package has a connection named in the file, it should pick it up, and ignore the ones it doesn't need. But no, SSIS is not well designed. If you put connections in your config that the package doesn't use, you'll get an error. This means you end up with not just a "development" config, a "UAT" config and a "production" config. No, you wind up with a proliferation of configuration files. Often one for every package. This is a maintenance nightmare.

So how do you resolve THAT? Well, one way is to create a parent package. I like these anyway, for the purpose of control and auditing. You can create all of your connections in this package, then use script tasks to set package variables that the child packages then use (via a parent package variable configuration) to set their connection strings. Now you CAN have just one config file (for your parent package). But this takes a lot of tedious coding to set up, and it really should not be necessary.
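The script task side of this is short, at least. A sketch of what the parent's script task might contain (C# script task; the variable and connection names are examples, and the variables would need to be listed in the task's ReadWriteVariables):

```csharp
// Inside the parent package's Script Task (names are illustrative).
// Copy the parent's connection strings into package variables, which
// the child packages pick up via parent package variable configurations.
public void Main()
{
    Dts.Variables["User::SourceConnStr"].Value =
        Dts.Connections["SourceDB"].ConnectionString;
    Dts.Variables["User::TargetConnStr"].Value =
        Dts.Connections["TargetDB"].ConnectionString;

    Dts.TaskResult = (int)ScriptResults.Success;
}
```

Each child then has a parent package variable configuration mapping, say, User::SourceConnStr onto its SourceDB connection manager's ConnectionString property. Tedious, but it only has to be wired up once per connection per child.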

The other problem with this is setting things up so you can re-run just some of the child packages in the parent. If each package were its own SQL Agent job step, this would be easy. But with all the children contained in the parent you only get the one Agent step. If you want to be able to execute just some components (due to partial failure), you can:

1) Try to use checkpoints. Good luck. I don't recommend it.

2) Open the package in BIDS, disable the steps you don't want to run, and rerun. Doing this takes a long time, and BIDS is buggy as hell. Oh, and make sure you change it all back afterwards too.

3) Write your own logic to use yet more configuration information to decide which children to run. In my opinion this is the best solution, but again, it can be a tedious task to set up if you have a lot of child packages to conditionally execute. It is made easier if you have very well designed package architecture. Do you? As I mentioned, we don't. But what we have is supposed to work.
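One hedged sketch of how option 3 can work, assuming a control table you maintain yourself: the parent reads a flag per child at startup, and each Execute Package Task is gated by a precedence constraint expression on that flag.

```sql
-- Hypothetical control table: one row per child package.
CREATE TABLE dbo.ETLChildPackages
(
    PackageName sysname NOT NULL PRIMARY KEY,
    IsEnabled   bit     NOT NULL DEFAULT 1
);

-- To re-run only the children that failed last time:
UPDATE dbo.ETLChildPackages SET IsEnabled = 0;   -- skip everything...
UPDATE dbo.ETLChildPackages
SET    IsEnabled = 1
WHERE  PackageName IN (N'LoadCustomers', N'LoadOrders');  -- ...except these
```

In the parent, an Execute SQL Task loads the flags into boolean package variables, and the precedence constraint in front of each Execute Package Task uses an expression along the lines of @[User::RunLoadCustomers] == true. It's more plumbing, but it means a partial re-run is one UPDATE statement away instead of a BIDS session.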

I've seen a few notes from Microsoft indicating that they are aware of at least some of these issues. Unfortunately, several of the related items on Connect have been marked as "resolved, won't fix" for at least a couple of years now.
