SQLServerCentral Article

Powering up DTS with PerlDTS

Introduction

When implemented together with MS SQL Server 2000, DTS offers several features to help managing ETL processes, including process automation, scheduling and notification. While DTS is considered a RAD tool, the true capabilities of DTS are only unrevealed when its exposed API is used.

While the documentation includes steps for Visual Basic or Visual C++ to use or extend DTS, such programming languages are not the ideal ones for developing applications quickly (considering that they will run in the background). Interpreted languages are much more suited to doing jobs like executing DTS packages outside Enterprise Manager or scheduling a task, for example.

Any interpreted programming language that supports COM can deal with DTS API, like Vbscript or Perl. With several years being part of system administrator and programmers tools, Perl is a natural choice for being used as a glue language for DTS ETL tasks, specially if one considers the amount of several read-to-use modules available in CPAN, dealing with the most different problems that a programming language can solve.

Being a Perl programmer for a couple of years and inspired by the article "Flexible DTS Packages with Perl" of Jeremy Brown (published in this very website!), I decided to give a try to write Perl code to deal with DTS API when some tasks became too boring to solve with click-and-drag procedures in DTS designer.

Enter PerlDTS

PerlDTS is the name of the project I created to hold several Perl classes that represents DTS classes in a more natural way (for a Perl programmer perspective).

In the article mentioned, the author teaches how to use the module Win32::OLE to connect to DTS API. While Win32::OLE is the backbone of PerlDTS project, using it can be cumbersome for a Perl programmer because it deals with details of MS Windows operational system (like converting variable types) that brings complexity into the problem a programmer is trying to solve. Talking about DTS API this is even worse, because documentation expects that the programmer will use Visual Basic, Vbscript or Visual C++. I had to improve a fair amount of my Vbscript skill just to understand the examples.

PerlDTS is different. I implemented a lot of DTS API classes (and there is still a lot of other ones to implement) using an interface that is pure Perl code with added syntactic sugar for convenience. Since I did this, my job got much more easy when I was in a project dealing with hundreds of DTS packages to implement batch integration with Oracle Siebel CRM.

What can Perldts do?

The nowadays implementation of Perldts allows a programmer to:

  • search and query information of DTS packages.
  • execute DTS packages and retrieve execution information.

Differently of the article of Jeremy Brown, I did not implemented anything related to the capability of creating DTS packages on the fly, keep them only in memory, executing and ceasing to exists as soon the program job is over. I'll comment those implementations decisions latter in the article.

Right now, instead of telling all features of PerlDTS, I invite the reader to check some program examples that I'll comment here to get a grasp of the project and, maybe latter, check the online documentation and UML diagrams for more details in the project website.

Why took you so long to release this?

Maybe you're asking yourself that question since MS SQL Server 2008 is already available and version 2000 is quite outdated. It's a fair question.

Well, the answer is "I was quite busy". I started working with DTS packages in 2006 and took me a while before releasing something more than a distribution tarball in CPAN. Anyway, I wanted at least to document the distribution before releasing it and here it is, as free software.

Examples of using

Let's start with a very simple example: connect to a SQL Server database and search for DTS packages with a name that matches a regular expression. Here is the code:

use warnings;
use strict;
use XML::Simple;
use DTS::Application;

my $xml_file = 'modify.xml';
my $xml = XML::Simple->new();
my $config = $xml->XMLin($xml_file);

my $app = DTS::Application->new( $config->{credential} );

my $pkg_info;

print 'Enter a regex for the package name: ';
my $regex = <STDIN>;
chomp $regex;

my $list_ref = $app->regex_pkgs_names($regex);

map { print $_, "\n" } @{$list_ref};

The program starts with the usual lines expected for a Perl program and loads the modules DTS::Application and XML::Simple. All modules of PerlDTS project starts with the package name "DTS". This will change very soon to the registered name I got in CPAN called Win32::SqlServer::DTS. DTS::Application is, in most cases, the unique module will need to load, since from it you can fetch all other classes and methods.

For all examples, we will use a XML file with the details of connecting into the SQL Server and that's why we are using XML::Simple, that will read a XML file and return an hash reference with the keys the method new of DTS::Application expects.

Once connected to a SQL Server database, DTS::Application got its method regex_pkgs_names, expects a regular expression to match the DTS package name and returns a list in a form of an array reference. This one was easy!

What about implementing a simple script to backup the DTS packages in a zip file?

use warnings;
use strict;
use XML::Simple;
use DTS::Application;
use constant XML_FILE => 'modify.xml';
use constant BACKUP_DIR => 'c:\\DTS-backup';
use Archive::Zip qw(:ERROR_CODES :CONSTANTS);
use DateTime;
use Cwd;

# :WARNING:16/10/2007:ARFJr: there is an issue saving DTS packages using the API:

# all the DTS layout is lost during the convertion to structured files! There is no

#documented way to fix that.

my $xml = XML::Simple->new();
my $config = $xml->XMLin(XML_FILE);

my $app = DTS::Application->new( $config->{credential} );
my $pkgs_list = [ $config->{package} ];
my $counter = 0;

foreach my $pkg_name ( @{$pkgs_list} ) {

my $pkg;

print 'Saving ', $pkg_name, '... ';

eval {

$pkg =

$app->get_db_package({ id => '', version_id => '', name => $pkg_name } );
};

if ($@) {

warn $@, "\n";

} else {

$pkgs_list->[$counter] = $pkg_name . '.dts';

$pkg->save_to_file( BACKUP_DIR, $pkgs_list->[$counter] );
print 'done.', "\n";
$counter++;

}

}

pack_files($pkgs_list);

###################################
# SUBS
###################################

sub pack_files {

my $pkg_list = shift;

my $zip = Archive::Zip->new();

my $old_path = getcwd();

# :TRICKY:10/10/2007:ARFJr: there are issues using complete pathnames with addFile # method
chdir(BACKUP_DIR);

foreach my $file ( @{$pkg_list} ) {

$zip->addFile($file);

}

die "Could not create the ZIP file: $!\n"
unless ( $zip->writeToFileNamed( get_backup_filename() ) == AZ_OK );

foreach my $file ( @{$pkg_list} ) {

unlink $file
or warn "Cannot remove file $file in " . BACKUP_DIR . ': ' . "$!\n";

}

chdir($old_path);

}

sub get_backup_filename {

my $today = DateTime->now();

return $today->year()
. $today->month()
. $today->day()
. $today->hour()
. $today->minute()
. $today->second() . '.zip';

}

Here we load the DTS::Application class to connect to the server and retrieve DTS packages for us.

There is a list of DTS packages to save in the modify.xml XML file. The DTS::Application object will them loop over the package names list and try to fetch them as DTS::Package objects, which is returned by invoking the method get_db_package.

A DTS::Package object is capable of saving itself to a structured file with the save_to_file method, but there is a price when doing that: all design made in the DTS designer will be lost. That means that beautiful diagram you made, with comments and everything will be lost. Every time the Enterprise Manager opens a DTS package saved as a structured file, it will try to recreate the design automatically. The result is usually ugly (and sometimes very ugly) to see and understand what's going on. Looks like Microsoft engineers didn't want to implement design persistence, because this a problem with the DTS API itself, with no workaround documented.

To make a job with better quality, I decide to load the modules Archive::Zip and DateTime to create zip files with the date of the backup in the filename (thanks CPAN!).

In the next example, we will fetch a DTS package, retrieve one of it's dynamic property task and change a assignment of it, saving the package after that. Modifying properties of a DTS package is a detail that deserves some comments about because it's not all properties that can be modified and used latter.

use warnings;
use strict;
use XML::Simple;
use DTS::Application;

my $xml_file = 'modify.xml';
my $xml = XML::Simple->new();
my $config = $xml->XMLin($xml_file);

my $app = DTS::Application->new( $config->{credential} );
my $package = $app->get_db_package( { name => $config->{package} } );

foreach my $dyn_prop ( @{ $package->get_dynamic_props() } ) {

my $iterator = $dyn_prop->get_assignments();

while ( my $assignment = $iterator->() ) {

print 'old: ', $assignment->get_sibling()->{DestinationPropertyID}, "\n";

my $dest = $assignment->get_destination();

if ( $dest->changes('GlobalVar') ) {

if ( $dest->get_destination() eq 'computer_name' ) {

$dest->set_string(

'\'Global Variables\';\'v_computer_name\';\'Properties\';\'Value\''
);

}

}

print 'new: ', $assignment->get_sibling()->{DestinationPropertyID}, "\n";

}

}

$package->save_to_server( $app->get_credential()->to_list() );

DTS::Assignment::Destination is a few of the classes that has set methods that allow changes to a object. Most of the classes are "read only" in this aspect because the absence of set methods in the DTS distribution but it is still possible to execute the method get_sibling available in almost all of the classes and fetch the original object from the original DTS API and change the value there.

Of course, any change made in the inner objects of a package will change their state if the method save of the package object is not invoked.

A real case of use

During development of the DTS packages, me and the other guys of development team were caught in a situation that we need to do unit tests with the DTS packages before moving them to environments of QA and production. Whenever we try to move a package, we find out that we forgot to activate a property or set the correct value to it, and those problems were repetitive.

I implemented some automated unit tests to verify those know problems and caught them even before trying to move a DTS package to other environment. These let us know to concentrate in the real problems we had to solve instead of looking for tiny details.

After defining the tests conditions and writing them using the module Test::More, I created a web application using CGI::Application and a modified version of Test::Harness to be able to execute tests without forking a new process. This allows the application to execute tests concurrent in different DTS packages and generating a HTML result of the tests.

The application is a bit too longer to mentioned every details of implementation of it here, but if the reader take some time of checking all methods calls, it's a very instructive way to learn how to move from one object to another till getting the desired property.

These are the set of test executed with any DTS package passed as an argument:

  • check if the package it is not logging in the database server;
  • check if the logging to a flat file is enabled;
  • check if write completion status to Event Log is disable;
  • check if Global variable are explicit declared;
  • check if the package has at least two connections;
  • check if the package has at least one datapump;
  • check if the connections are setup automatically by using a Dynamic Property task;
  • check if the Execute Package tasks have the Package ID property empty;
  • check several properties of flat file connections;
  • check several properties of datapumps.

Since the web application is MVC based, it's possible to modify the tests executed without worrying about modifying the other components.

Here are some screenshots:

The complete web application is included in the article and I intend to include it also in the PerlDTS project website. Be sure to read the README file to get the details about installing the web application in Apache 2.x and IIS. The web application has POD included when you install the available Perl modules with it.

Performance

With all this connect-to-server-and-fetch operations, it's valid to mention that a good practice is avoid using more than one DTS::Application in your program. Run the script benchmark.pl in the examples directory in the PerlDTS tarball to get an idea of speed difference.

Documentation and project website

PerlDTS project includes documentation in form of POD (exported to HTML as well) and UML diagrams. This documentation is complementary to the one provided by Microsoft SQL Server, not a substitute. It's necessary to use both for better understanding.

The PerlDTS project website is located at http://code.google.com/p/perldts/.

Implementation decisions and future of PerDTS

Quoting the CAVEATS of DTS.pm module:

"All objects under DTS distribution cannot be created without a reference to the original DTS object they mimic: at the current development state, object can only be recovered from a MS SQL Server database. Some classes may have methods to change their inner attributes, other classes don't. Check the POD for each class to be sure, but future releases should have write methods for all classes implemented".

This means that each object instantiated will have a attribute called _sibling that holds a reference to a Win32::OLE object that is the exactly counterpart of the original DTS class it mimics. The PerlDTS project was initially developed to work as a "reporting" tool, not as a new way to create DTS packages. Since only the properties values are necessary for reporting, usually it's a good idea to execute the method kill_sibling to remove the reference to the original object and release computer resources during the process.

That said, if you want to create DTS packages in memory and on the fly, PerlDTS is not the project you're looking for.

I would argue the need to do it anyway. Some tasks are done much easier in the DTS designer (like creating new connections) than using programming languages. On the other side, executing a package through Perl has lots of advantages, including for using better scheduling tools and better reports for errors during execution (check out DTS::Package::Step and DTS::Package::Step::Result classes). It is better to use what is good in DTS packages created in the DTS designer and leave the rest for your Perl scripting skills. Perl, for example, is much more powerful to execute any data transformation than DTS (just check out CPAN for ready to use code).

There are a lot of work still be done in PerlDTS project (specially classes that don't have a Perl counterpart implemented). Any comment, suggestion or patches are much welcome!

Resources

Rate

4.14 (7)

You rated this post out of 5. Change rating

Share

Share

Rate

4.14 (7)

You rated this post out of 5. Change rating