I have recently received SSIS training and have had the opportunity to write my first package. As a software developer, I know how hard it is to write perfect code but I ran into some rather significant problems that I find hard to believe an organization with the talent and resources of Microsoft could have missed. I am quite impressed with the published capabilities of SSIS but after writing my first internally important package (which ultimately failed), I was quite embarrassed.
I'm not the type to point a finger very quickly as I have made my fair share of mistakes in the past. In an effort to make my experience beneficial to others, I thought I would put something together to document my experience, showing how I went about identifying each problem and ultimately resolving them. I have also included two SSIS packages to help others who are trying to create a package to perform a similar task. Let me begin by explaining the purpose of my package and the general concept of how it works.
Package Control Flow Diagram
The package is responsible for copying Stored Procedures and User Functions from one Database to another. I saw that Microsoft had provided a "Transfer SQL Server Objects" task which appeared it would be able to do what I needed all by itself. After completing the package, my first test resulted with no errors, but also did not copy any objects to the destination database either. So I reached into the developer's bag of tricks and pulled out the single most valuable tool in the diagnostic arsenal, Google. After a half hour of reading articles, I found what I was looking for; I needed to set the "CopySchema" property to True. Why Microsoft gives me a property that if not set causes the task to do nothing is beyond me, but easy enough to fix.
So I deleted all the Stored Procedures and User Functions from the destination database and ran the package again. My second test also ended in failure giving me an error telling me that Procedure "xxx" already existed at the destination database. With a quick look at the Task Editor, the "DropObjectsFirst" property jumped out at me and I thought, OK, that error makes sense; you need to drop the procedures with the same name from the destination database before you can copy them over from the source database. I figured if I set this property to True that would resolve the problem. When I ran the package for a third time, I got an error telling me that Procedure "yyy" did NOT exist at the destination database.
So I had discovered the first problem I was not sure how to resolve. It made sense to me to get an error when the procedure already existed at the destination database and the 'DropObjectsFirst' property is set to 'False'. When you set it to 'True', you would expect the behavior to be that any procedures being copied from the source database that did not exist on the destination database should just be silently transferred. After all, the whole point of setting the property to 'True' is to ensure that procedures don't exist at the destination database that are
being transferred from the source database. I would expect it would be common occurrences to have new procedures on the source database you were trying to copy over to the destination database. You would think Microsoft would have thought of such a simple test right?
I went back to Google to see if anyone else had seen this problem and what ideas were floating out there to work around it. What I found was a suggestion to write a script task to handle dropping the procedures from the destination database before running the Transfer SQL Server Objects task. I figured it wasn't that much work to write some code to handle this so I rolled up my sleeves and got busy writing a Data Flow task to get a collection of procedures and functions from each database and merge the collections through an inner join to derive a collection of objects common to both databases. I passed this collection to a For Each Loop task and then inside the loop, used a script task to create the Drop statement which was sent to an "Execute SQL" Task.
Data Flow Task to Build Collection of Common Database Objects
Everything seemed to be working perfectly until I reached the "Transfer SQL Server Objects" task in the control flow. This time I had successfully transferred 25 procedures and on the 26th procedure I got an error telling me the procedure already existed at the destination database. I thought how can this be? I started with an empty destination database; there were no procedures in the database to begin with so how can there be a duplicate procedure?
Well, after a couple of days of Googling, testing and debugging, I finally found the problem after running a trace on the destination database. The trace file revealed that the Transfer SQL Server Objects task was attempting to create 2 procedures with the same name at the destination database. I thought how is this possible? You can't have 2 procedures with the same name on the same database instance. Right away I suspected I was not connecting to the source database I thought I was. I verified all my server and database settings and was truly at a diagnostic dead end.
For a sanity check, I exported all the stored procedures from the source database into a single file and using a text search, I looked for the procedure name that was being created twice. The name was only found in the file once so that ruled out the possibility of a procedure creating another from within itself with this procedure's name. So if there is only one create statement for this procedure, then where is the second procedure coming from?
A little more digging in the trace file revealed the answer; the task was trying to create two versions of the same procedure on the destination database. A careful examination of the text in each of the procedures (which was fortunately available in the trace file) showed that one of these versions was in fact the previous version that had originally been created on the source database but was later replaced by the second version of the procedure.
At first, I thought the underlying database was somehow keeping a version history of procedures that had been copied and renamed so I made every attempt to locate both versions of the procedure in the source database by looking at every view I could find relating to the stored procedure text. There was only the one, current version of the procedure visible to me on the source database. So now I knew how it was happening but didn't really know 'what' exactly was happening. My thought was that this couldn't be a SQL database bug since the views available to me show only the correct version of the procedure. That suggested to me that however the Task was getting the procedure list and associated text, must not be using the views I had available to me.
Assuming I was dealing with a bug internal to the Task, I thought I would just set the "MaximumErrorCount" property to something ridiculous like 999999 and the task would just ignore cases where the procedure already existed and continue transferring the remaining procedures. Unfortunately, the "MaximumErrorCount" property had absolutely no effect on the operation of the task. As soon as it got the first error, it threw an exception and aborted the transfer.
Just to be sure I didn't have something funky in my database, I tried this against another database and got some surprising results, this time, I got 3 procedures transferred to the destination database that don't even exist on the source database!!! To be sure I didn't make a mistake, I deleted all the procedures from my destination database, ran the package (which would work fine with "this" database if I would delete all procedures first) and sure enough, there were 3 procedures that did not exist on the source database.
Suspecting that however unlikely it could be that I had two bad databases, I decided to create two fresh test databases and then using the script file I had created earlier, I loaded all the procedures from the first source database I had been working with into my new test source database. I ran the package on the 2 new test databases and it worked perfectly 10 out of 10 times! Then I decided to keep pointing at my test destination database, but changed the source database to point to each of the other two source databases that had previously failed. This resulted in the same failures from both of them as I had observed before.
I thought to myself, this task is fundamentally flawed and absolutely unusable. How could Microsoft miss such basic and fundamental error checking? If I would have released a piece of software this flawed to my organization, I would have been flogged and publicly humiliated. This is just plain shoddy work that should have never passed into the release phase of development. With all the money Microsoft pulls in, I think it's high time they started testing their own software instead of dumping it on the development community to do for them (*for free).
Now I set out to write my own task to handle the object transfer between databases. I didn't know it at the time, but this would prove to be the silver bullet in identifying the real cause of the problem I was having and it wasn't a bug in the Transfer SQL Server Objects task as I had originally suspected (although the task does have some of those as well).
Once I completed coding my task and ran the package, I got the exact same results as the Microsoft task did. I was quite surprised by this but this time, I could get under the hood of the task with debugging tools and see what was really going on. I discovered it wasn't that an older version of the procedure was somehow being stored in the underlying database; it was that the name of the procedure and the name referenced in the procedure text did not match. The name of the procedure in the definition text, which is used to create the procedure, was in fact the same as another procedure so it was in fact a duplicate procedure.
Now I could see 'what' was happening but 'why' was still a bit of a mystery. In the case I was investigating, it just so happened that the older version of the procedure I had seen in the trace file was in fact an older version of the procedure that had been renamed from 'xxx' to 'xxx_old". This is what sent me down the wrong path, just some dumb coincidence. When I looked in the sys.objects view from the database system views, I found the procedure named 'xxx_old' as expected but after adding the join with the sys.sql_modules view to get at the procedure definition text, I saw the name in the "Create Procedure" line was 'xxx'. That was the problem! The procedure 'Name' and procedure name referenced in the definition text did not match.
I had a procedure named 'xxx' and a procedure named 'xxx_old' being transferred
from the source database, when the 'xxx_old' procedure script was executed, it
was actually creating a procedure named 'xxx'. This is what threw me off when I
was debugging the Microsoft task; they used the name of the procedure being
created (from the definition text) rather than the visible name of the
procedure being transferred in the error message. The error message made it appear that a duplicate procedure was being copied from the source database. I don't fault Microsoft for writing the error message the way they did, I agree with their reasoning, it was just an unfortunate scenario that prevented me from being able to correctly identify the problem.
So now that I knew what the problem was and why it was happening, I set out to duplicate the problem myself. So I went back to my test databases that worked just fine and made a copy of one of the procedures and renamed it in the Object window using the right mouse rename menu item. As expected, the name was changed in the sys.objects view 'name' column but in the sys.sql_modules view 'definition' column, the old procedure name was still in the create statement.
Now that I had identified the problem and was able to successfully duplicate the problem, it was now time to figure out a way to solve the problem so I could make this package work. The solution that came to mind was to add code to the script task that was creating the drop statement to search the definition text for the 'Create Procedure' line and parse out the procedure name. Then I compared the name from the definition text to the name stored in the sys.objects table and if there was a mismatch, the definition text procedure name was changed to match the name from the sys.objects view.
This time when I ran the package, I was able to successfully transfer all the stored procedures and user functions from both of the databases I was not able to get to work using the Microsoft Transfer SQL Server Objects task. I was also able to confirm that this time, the procedure name and the name referenced in the definition were identical.
You can download some packages and code here.
For Microsoft to allow you to rename a stored procedure in the object window of the SQL Server Management Studio and not make the necessary changes to all tables in the underlying database is a major oversight and should be corrected immediately. One of the potential problems that could occur as a result of this bug would be that if you simply renamed an older version of a procedure in the Object window of SQL Server Management Studio, the old version could potentially overwrite the current version because the internal names matched. Now you have the old version of a procedure masquerading as the new version.
I hope you found this article helpful and hope the packages will help you work around the current limitations of this particular task as well as provide a base package you can modify and tweak to meet your own needs. If nothing else, the bugs have been exposed and the work around has been made public. Now we can continue to push the limits of the SSIS technology to meet the ever increasing demands of our profession.