
SSIS Guidance: Efficiency tips
Posted Saturday, November 02, 2013 1:53 AM
Apologies in advance for my inexperience. I do not have a ‘problem’ per se; I am looking for advice or tips on how to make my SSIS packages better and less taxing on our resources.

I currently run a few tasks via SSIS, but am seeking guidance on whether I can be doing these tasks more efficiently.

I run over 100 SSIS jobs daily, staggered throughout the day, so using resources better or streamlining code could add up to substantial savings over time.

Here is my process, and how I am doing it now.
I am working with packages that were already written, so I do not currently know why some steps are done one way as opposed to another.

General:
In every case, I am downloading multiple flat text files that I load into temp tables in my DB.
There is one temp table per file downloaded.
I then update each table to make the data more compatible for display on my website.
I then download image files (usually JPEGs) based on the data in my temp tables.
Each step that requires an update to my DB uses, of course, the connection for my DB that I have already defined.
In the end I load all my formatted data and image information into one master table that is eventually
published to my website via another process entirely separate from what we are doing here.

These jobs must run daily and are supported around the clock by IT staff. To make troubleshooting easier, I do my best to keep all code explicit, in scripts with comments, and I make everything one-to-one: one connection per flat file, one temp table per flat file, and only one Data Flow Task to load each table into the master table. In other words, each INSERT is in its own step, rather than sending all of my temp tables into the master table in one INSERT step or one Data Flow Task.

The following are the literal ‘steps’ in each package.

1st Step-Script Task Microsoft Visual Basic 2008
I begin by setting connection strings to use server entries that are defined in my SSIS_Database_Config.xml file.

2nd Step-Script Task Microsoft Visual Basic 2008
At the start of this step, I create a batch file on the fly to download my data files, either from an FTP server or, in many cases, from a RETS server.
For FTP, I pass the server login name and password in the URL and use the built-in TCP/IP FTP commands (I do not call third-party software).
For RETS, I call a custom RETS download application (third-party software is necessary in this case) that uses XML config files to connect to the server and download the text files.
At the end of this step, I execute the batch file that I just defined, in order to download my flat text data files.

3rd Step- Execute SQL Task
Create tables via SQL code based on the header information contained in the downloaded txt files.
Each table accounts for one of the downloaded files. For example, I download three text files:
Lemon.txt, Apple.txt, and Orange.txt. I create three tables with field definitions based on these
text files: tempFruit_Lemon, tempFruit_Apple, tempFruit_Orange.
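
A sketch of what that generated DDL might look like, using the VARCHAR(255) convention described in the 4th step below (the column names here are purely illustrative, not from the actual packages):

```sql
-- Illustrative only: column names are assumptions; widths follow the
-- VARCHAR(255) / VARCHAR(8000) convention described in the 4th step.
CREATE TABLE dbo.tempFruit_Lemon
(
    FruitItemSerialNumber VARCHAR(255)  NULL,
    Color                 VARCHAR(255)  NULL,
    Description           VARCHAR(8000) NULL  -- a known-wide field
);
```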

4th Step-Data Flow Task
I have one Data Flow Task for each flat file:
Flat File Source – Lemon.txt
OLE DB Destination – tempFruit_Lemon
As needed, I define the proper Flat File connection to import Lemon.txt, and I set the field lengths all to 255 characters to match what I defined in my table definitions (VARCHAR(255) NULL).
There are a few fields that I know need to be larger, and would expect truncation on each load otherwise; I define these as VARCHAR(8000) in the SQL and in the connection properties.
I set the OLE DB Destination to the table that I created in the 3rd step, tempFruit_Lemon, and load the data from Lemon.txt into tempFruit_Lemon.

5th Step- Execute SQL Task
Update the temp tables. This is one of the steps that can bog down my process. I have one massive script file, containing mostly single-line UPDATEs and CASE statements, that massages the data into a more suitable format for display on my website. I like to do it all in SQL so that it can be easily understood when troubleshooting.
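
A hedged sketch of the kind of single-statement massaging that script contains (the column names and code-to-value mappings are assumptions, not the actual script):

```sql
-- Illustrative only: column names and mappings are assumptions.
UPDATE dbo.tempFruit_Lemon
SET Color = CASE Color
                WHEN 'Y' THEN 'Yellow'
                WHEN 'G' THEN 'Green'
                ELSE 'Unknown'
            END
WHERE Color IS NOT NULL;
```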

6th Step- Execute SQL Task OR Data Flow Task
Load data from the tempFruit tables into a ‘master table’ from which the data will later be published.
In most cases I do this via an INSERT INTO SQL statement. In some cases, I will use a Data Flow Task and go from an OLE DB Source (the tempFruit tables) to an OLE DB Destination (the master Fruit table).
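
The INSERT INTO variant would look roughly like this (the master table and column names are assumptions for illustration):

```sql
-- Illustrative only: master table and column names are assumptions.
INSERT INTO dbo.MasterFruit (FruitItemSerialNumber, FruitType, Color)
SELECT FruitItemSerialNumber, 'Lemon', Color
FROM dbo.tempFruit_Lemon;
```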

7th Step-Script Task Microsoft Visual Basic 2008
Same as the 2nd step, but now we are downloading JPEG files instead.
At the start of this step, I create a batch file on the fly to download my image files, either from an FTP server or, in many cases, from a RETS server.
For FTP, I pass the server login name and password in the URL and use the built-in TCP/IP FTP commands (I do not call third-party software).
For RETS, I call a custom RETS download application (third-party software is necessary in this case) that uses XML config files to connect to the server and download the image files.
In most cases, I use information loaded into the ‘master table’ from my earlier data files in order to download my image files. As a result, I need to have the txt files
and their corresponding tables all loaded before attempting this step. At the end of this step I call the batch file, in order to download my image files.

8th Step-Script Task Microsoft Visual Basic 2008
JOIN the names of the images that I have downloaded with the names of the expected images from the ‘master table’, and INSERT their URLs into an image table.
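
Something along these lines, assuming a staging table of downloaded file names (all table, column, and URL names here are hypothetical):

```sql
-- Illustrative only: table, column, and URL names are assumptions.
-- Matches downloaded file names against the names the master table
-- expects, then records each image's URL in the image table.
INSERT INTO dbo.FruitImages (FruitItemSerialNumber, ImageUrl)
SELECT m.FruitItemSerialNumber,
       'http://www.example.com/images/' + d.FileName
FROM dbo.MasterFruit AS m
JOIN dbo.tempDownloadedImages AS d
    ON d.FileName = m.ExpectedImageName;
```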

9th Step – Execute SQL Task
Execute SQL to DROP all the tables that I created in this package.
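
For the three-file example above, that cleanup is simply:

```sql
-- One DROP per temp table created in the 3rd step.
DROP TABLE dbo.tempFruit_Lemon;
DROP TABLE dbo.tempFruit_Apple;
DROP TABLE dbo.tempFruit_Orange;
```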

This is the basic structure of all the tasks I am responsible for. There are more steps, with essentially more of the same logic: I also load another three or four tables with ‘sub’ information from the data files and the temp tables. These processes follow the same pattern as the 7th step, using data that I downloaded earlier in order to download something else at the end.

For example, say I have another master table named ‘Fruit Color’, and I end up loading information from tempFruit_Lemon, tempFruit_Apple, and tempFruit_Orange into this table, with a related field in each (Fruit Item Serial Number).
There is another table called ‘Vitamins’ where I load a different set of data from tempFruit_Lemon, tempFruit_Apple, and tempFruit_Orange. In all cases, I load the data via a SQL INSERT INTO, or by use of a predesigned stored procedure that does basically the same thing, which I also execute via an Execute SQL Task.
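
One way to picture that ‘Fruit Color’ load is a single statement over all three temp tables (names are assumptions; the actual packages may do one INSERT per table, as described above):

```sql
-- Illustrative only: loads the 'Fruit Color' master from all three
-- temp tables in one statement; table and column names are assumptions.
INSERT INTO dbo.FruitColor (FruitItemSerialNumber, Color)
SELECT FruitItemSerialNumber, Color FROM dbo.tempFruit_Lemon
UNION ALL
SELECT FruitItemSerialNumber, Color FROM dbo.tempFruit_Apple
UNION ALL
SELECT FruitItemSerialNumber, Color FROM dbo.tempFruit_Orange;
```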

Not incredibly complex; I am just wondering if there are quicker and easier ways to do what I am already doing here. I run over a hundred jobs like this on a staggered schedule throughout the day via SQL Agent,
and they all use a SQL Azure production environment. If I save 30 seconds per package, that is a big deal to me. Anything with fewer traffic issues and more efficiency would also be welcome: my recent production issues have occurred during
long-running SQL steps (the UPDATE steps), where I completely lose my connection and have to restart the job from the step it failed on. The step will fail with an error such as ‘the semaphore timeout period has expired’ or ‘deadlock victim’, as if I am being throttled in some manner.

Thanks in advance, and apologies for the remedial nature of the question.
Post #1510849