Case Study: Importing New Data Without Emptying Existing Tables

By Robert Marda

 

The case study I am about to describe has evolved over the past two years and is still changing as I learn new techniques.

 

Problem

 

Soon after I was given the position of SQL Programmer at bigdough.com, I was put in charge of all the DTS packages on our SQL Servers. Over time I discovered the following problems with our main DTS package, which imports stock ownership data into our databases. First, it seemed to be designed to be run manually, with no fail safes or checks to verify that everything ran correctly. Second, it emptied the tables every night before pumping the new data into them. These were real problems because our customer base outside the United States was growing, and we would get complaints from those customers that they couldn't see any data; I distinctly remember a complaint from a customer in Japan.

During this same period we got a new DBA who wanted all our DTS packages to be as dynamic as possible, so they could absorb changes in the existing databases without the packages having to be changed every time. Finally, when we switched data providers we had to change our import process to handle either a full update or a daily update, depending on what we received on a given day; the previous provider simply sent a full update every night.

 

Business Challenge

 

The challenge for me was to devise a way to keep the data available at all times while importing the new data, to detect whether a full or daily update was received and run the appropriate data pumps, to put in sufficient fail safes to ensure bad data would not get imported, and to make the process automatic, including notification to pagers upon failure.

 

Solution

 

Keeping The Data Available

 

The first thing I tackled was keeping the data available at all times. I adjusted the DTS package to create tables identical to the existing tables, with "_New" added to the end of each table name. This piece had to be dynamic in the sense that if new indexes were added to an original table, the DTS package needed to handle the change without being modified. I went to the system tables, found where information about tables, primary keys, and indexes is stored, and created a stored procedure that scripts tables with or without their primary key and indexes. The SP can also script the primary key and/or the indexes without the table. The SP is available in the script section: sp_CreateAndExecTableScript (http://www.sqlservercentral.com/scripts/contributions/543.asp).

With this SP I create most of the tables with all their indexes, except for the ones that have nearly a million rows. The data pump tasks go faster without the indexes, and creating the indexes after the data is in place takes about 2.5 minutes.
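
In simplified form, the staging approach looks something like this. The table, column, and index names here are placeholders; the real package scripts the actual tables dynamically with sp_CreateAndExecTableScript:

-- Simplified sketch of the "_New" staging table approach.
-- dbo.StockOwnership and the StockID column are placeholder names.

-- Create an empty copy of the live table, without indexes, so the
-- data pump can load it as quickly as possible.
SELECT *
INTO dbo.StockOwnership_New
FROM dbo.StockOwnership
WHERE 1 = 0

-- (the DTS data pump loads dbo.StockOwnership_New here)

-- Add the indexes only after the data is in place.
CREATE CLUSTERED INDEX IX_StockOwnership_New_StockID
    ON dbo.StockOwnership_New (StockID)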

 

After the data is imported into the new tables I run the stored procedure sp_rename multiple times to rename the existing tables with the extension "_old". Then I rename all the tables with the extension "_New" to the original table names. All these rename commands are wrapped in a BEGIN/COMMIT TRANSACTION, with a ROLLBACK issued if one of the rename commands fails. This is an all-or-nothing transaction, to avoid having old data live in one table at the same time as new data in a different table.

If this step fails, a failure e-mail is immediately sent to my cell phone, which is always beside my bed. The phone beeps until I look at the e-mail.
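
In simplified form, the swap looks something like this. Only one hypothetical table is shown; the real package renames every imported table inside the same transaction:

-- Sketch of the table swap for a single placeholder table.
BEGIN TRANSACTION

EXEC sp_rename 'dbo.StockOwnership', 'StockOwnership_old'
IF @@ERROR <> 0 GOTO SwapFailed

EXEC sp_rename 'dbo.StockOwnership_New', 'StockOwnership'
IF @@ERROR <> 0 GOTO SwapFailed

COMMIT TRANSACTION
RETURN

SwapFailed:
ROLLBACK TRANSACTION
RAISERROR ('Table swap failed; the old data remains live.', 16, 1)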

 

After testing this, we have found that data is always available. Anyone logged on during the import and/or the swapping of tables experiences only a slight slowdown on the site, and that slowdown was reduced when we upgraded from SQL Server 7.0 to SQL Server 2000.

 

Importing Full or Daily Update

 

When we changed data providers we had to change our DTS package so it could detect whether we received a full or a daily update and run the appropriate data pumps and other SQL tasks. All files associated with the daily update have the word "daily" in front of their names; those for a full update don't. The DTS package has a data pump for each file in a full update and each file in a daily update.

 

To control which data pumps get executed I created two SQL Execute Tasks. One is called 1st Gateway Daily Update and the other is called 1st Gateway Full Update. The only code in each task is a comment stating its purpose. None of the data pumps for the daily update can execute until the task called 1st Gateway Daily Update succeeds, and none of the data pumps for the full update can execute until the task called 1st Gateway Full Update succeeds. I then created an ActiveX Script task with a VBScript that detects which files were received. If the files for a daily update are received, the VBScript marks the task called 1st Gateway Full Update as completed. This is interpreted by the DTS package as a failure, and so none of the data pumps for the full update are executed. Likewise, the task called 1st Gateway Daily Update is marked as completed when a full update is received, preventing those data pumps from executing.

 

Other tasks must run after the appropriate data pumps execute, but they can't run until the needed data pumps succeed. To make sure they run after the data pumps, I created two more SQL Execute Tasks similar to those described above and simply changed "1st" to "2nd". The 2nd Gateway Daily Update task can't execute until all the daily update data pumps succeed, and the 2nd Gateway Full Update task can't execute until the full update data pumps succeed. All the tasks that follow can't execute until both 2nd gateway tasks have completed.

 

So, if a daily update is received, the ActiveX Script task marks both full update gateways as completed. This keeps all the full update data pumps from executing, and it makes sure that the tasks that must run after the daily data pumps succeed actually wait for those data pumps. Now the DTS package can handle either a full or a daily update.

 

This method was used to avoid a failure entry for each data pump that didn't find a file. Without the gateways, half of the data pumps would fail and be logged as failed in the error log. This method reduces the number of known failures from about ten to two, one for each gateway task marked as completed.

 

Here is a shortened version of the VBScript that does what I described in this section:

 

'**********************************************************************
'  Visual Basic ActiveX Script
'**********************************************************************

Function Main()

' Specify the OLE DB provider.
Set DATABASENAME = CreateObject("ADODB.Connection")
DATABASENAME.Provider = "sqloledb"

' Specify the connection string on the Open method.
ProvStr = "Server=SERVERNAME;Database=DATABASENAME;UID=SQLSERVERLOGIN;PWD=PASSWORD"
DATABASENAME.Open ProvStr

' Create the temporary table that will hold the xp_fileexist results.
query = "CREATE TABLE #fileexists (doesexist smallint, fileisdir smallint, direxist smallint, txtfilename varchar(45) NULL default 'unknown')"
DATABASENAME.Execute(query)

query = "INSERT INTO #fileexists (doesexist, fileisdir, direxist) EXEC master..xp_fileexist 'D:\TextFilesForImport\dailyupdate.txt' UPDATE #fileexists SET txtfilename = 'dailyupdate.txt' WHERE txtfilename = 'unknown'"
DATABASENAME.Execute(query)

query = "INSERT INTO #fileexists (doesexist, fileisdir, direxist) EXEC master..xp_fileexist 'D:\TextFilesForImport\fullupdate.txt' UPDATE #fileexists SET txtfilename = 'fullupdate.txt' WHERE txtfilename = 'unknown'"
DATABASENAME.Execute(query)

' Start finding UpdateType.
' 1 indicates Full Update.  0 or 2 indicates at least one file missing.  3 indicates Daily Update.
UpdateTypeQuery = "SELECT doesexist AS UpdateType FROM #fileexists WHERE txtfilename = 'fullupdate.txt'"
Set LocalImportType = DATABASENAME.Execute(UpdateTypeQuery)
DTSGlobalVariables("UpdateType").Value = LocalImportType("UpdateType")

' Enters this If when the full update file was not found.
If DTSGlobalVariables("UpdateType").Value = 0 Then
    UpdateTypeQuery = "SELECT doesexist + 2 AS UpdateType FROM #fileexists WHERE txtfilename = 'dailyupdate.txt'"
    Set LocalImportType = DATABASENAME.Execute(UpdateTypeQuery)
    DTSGlobalVariables("UpdateType").Value = LocalImportType("UpdateType")
End If
' End finding UpdateType.

query = "DROP TABLE #fileexists"
DATABASENAME.Execute(query)

Dim oPackage
Set oPackage = DTSGlobalVariables.Parent

' This If handles daily updates.  It marks the Gateway Full Update steps as completed.
If DTSGlobalVariables("UpdateType").Value = 3 Then
    ' Mark the 1st Gateway Full Update step as completed.
    oPackage.Steps("DTSStep_DTSExecuteSQLTask_4").ExecutionStatus = DTSStepExecStat_Completed
    ' Mark the 2nd Gateway Full Update step as completed.
    oPackage.Steps("DTSStep_DTSActiveScriptTask_7").ExecutionStatus = DTSStepExecStat_Completed
End If

' This If handles full updates.  It marks the Gateway Daily Update steps as completed.
If DTSGlobalVariables("UpdateType").Value = 1 Then
    ' Mark the 1st Gateway Daily Update step as completed.
    oPackage.Steps("DTSStep_DTSExecuteSQLTask_1").ExecutionStatus = DTSStepExecStat_Completed
    ' Mark the 2nd Gateway Daily Update step as completed.
    oPackage.Steps("DTSStep_DTSActiveScriptTask_6").ExecutionStatus = DTSStepExecStat_Completed
End If

Main = DTSTaskExecResult_Success

End Function

 

 

Avoiding False Failure Alerts

 

The above solution created a new problem. The job that executes the DTS package always reports failure when it completes, because of the tasks that get marked as completed. To avoid this I modified the job to always report success, even upon failure. That avoids the false failure alert, but unfortunately the job will no longer alert me when there really is a failure.

 

For that I created a table where the DTS package logs when it started and when it finished. Then I created a separate job that counts how many times the package logged a start for the day and how many times it logged a finish for the same day. If the numbers aren't the same, an e-mail gets sent to my cell phone to wake me up, and I groggily power up my computer to investigate the problem.
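
A minimal sketch of that check is below. The log table, its columns, and the e-mail address are placeholder names, and xp_sendmail assumes SQL Mail is configured on the server:

-- Compare today's Start and Finish entries; if they differ, the package
-- started but never logged a finish, so send the wakeup e-mail.
-- dbo.ImportLog and its columns are placeholder names.
IF (SELECT COUNT(*) FROM dbo.ImportLog
    WHERE LogDate >= CONVERT(char(8), GETDATE(), 112) AND LogEvent = 'Start')
   <>
   (SELECT COUNT(*) FROM dbo.ImportLog
    WHERE LogDate >= CONVERT(char(8), GETDATE(), 112) AND LogEvent = 'Finish')
BEGIN
    EXEC master.dbo.xp_sendmail
        @recipients = 'dba.pager@example.com',
        @subject    = 'Import package started but did not finish',
        @message    = 'Check the ownership import DTS package.'
END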

 

Fail Safes

 

We discovered a problem one day when all the expected data was not on our production servers, and we wondered why we hadn't received an e-mail alerting us to a problem. Upon investigation we found that we had received all the required files in the proper format. The problem was that the vendor had sent one or two of the files with less than half the normal number of rows. Nothing in the DTS package was designed to handle this, since the few rows received were correct, and so the data pumps moved them into their respective tables.

 

The fail safe I designed for this eventuality was an SQL Execute Task that counts the number of rows in each table. If a row count falls below a minimum level, the task sends an e-mail to my cell phone and stops the job so that no more tasks are executed. I placed this task after the data pump tasks that load the new tables and before the task that renames all the tables to make the new data available on our web site. Thus, if there are not enough rows in any of the tables, the new data will not be made live.
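
For a single table, the row count check might look something like this; the table name, threshold, and e-mail address are placeholders:

-- Fail the task if the newly loaded table looks suspiciously small.
-- dbo.Holdings_New and the 500000 threshold are placeholder values.
DECLARE @NewRowCount int

SELECT @NewRowCount = COUNT(*) FROM dbo.Holdings_New

IF @NewRowCount < 500000
BEGIN
    EXEC master.dbo.xp_sendmail
        @recipients = 'dba.pager@example.com',
        @subject    = 'Import aborted: Holdings_New row count too low',
        @message    = 'Holdings_New has fewer rows than expected; the tables were not swapped.'

    -- Raising an error makes this task fail, which keeps the rename
    -- task (and everything after it) from running.
    RAISERROR ('Row count check failed for Holdings_New.', 16, 1)
END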

 

Another problem we faced was that when no new files were available, the import DTS package would still run and send an e-mail telling us it had succeeded. It had succeeded, but only in importing yesterday's data, which was not acceptable. I developed the following VBScript and placed it in an ActiveX Script task:

 

'**********************************************************************
'  Visual Basic ActiveX Script
'**********************************************************************

' This task checks the last-modified date of each text file received
' from the vendor.  If a file is not from today it creates a WrongDate
' directory, moves the file there, and so keeps the rest of the DTS
' package from importing stale data.

Function Main()

Dim TodayDate
Dim DailyUpdateDate

TodayDate = Date

Dim objFSO, objFolder, objFile, colFiles

Set objFSO = CreateObject("Scripting.FileSystemObject")
Set objFolder = objFSO.GetFolder(DTSGlobalVariables("gvFilePath").Value)
Set colFiles = objFolder.Files

' Delete the WrongDate folder if it already exists.
If objFSO.FolderExists(objFolder & "\WrongDate") Then
    objFSO.DeleteFolder(objFolder & "\WrongDate")
End If

' Begin checking dates on the LionShares text files and move old files to the WrongDate folder.
If colFiles.Count > 0 Then
    For Each objFile In colFiles
        ' Check the dates for all expected files.
        If UCase(objFile.Name) = "DAILYUPDATE.TXT" Or UCase(objFile.Name) = "FULLUPDATE.TXT" _
            Or UCase(objFile.Name) = "FUNDS.TXT" Or UCase(objFile.Name) = "HOLDINGS.TXT" _
            Or UCase(objFile.Name) = "STOCKS.TXT" Or UCase(objFile.Name) = "ANALYTICS.TXT" _
            Or UCase(objFile.Name) = "FUNDANALYTICS.TXT" Then

            ' Keep only the date portion of the last-modified timestamp.
            DailyUpdateDate = Trim(Left(objFile.DateLastModified, InStr(objFile.DateLastModified, " ")))

            ' The next line is for debugging purposes.
            'MsgBox "!" & DailyUpdateDate & "!" & TodayDate & "!"

            If Trim(TodayDate) <> DailyUpdateDate Then
                If objFSO.FolderExists(objFolder & "\WrongDate") Then
                    objFile.Move(objFolder & "\WrongDate\")
                Else
                    objFSO.CreateFolder(objFolder & "\WrongDate")
                    objFile.Move(objFolder & "\WrongDate\")
                End If
            End If

        End If
    Next
End If
' End checking dates on the LionShares text files.

Set objFSO = Nothing
Set objFolder = Nothing
Set colFiles = Nothing

Main = DTSTaskExecResult_Success

End Function

 

This script compares the date of each file with today's date; if the dates don't match, the text file is moved to a different directory. Now we can just let the DTS package run. If no files are left where the data pump tasks can find them, the entire package halts, and the process described in the section called Avoiding False Failure Alerts sends an e-mail to my cell phone, since the package will have logged a start time but no finish time. The reason we let the package continue at this point is that we don't know whether we are receiving a daily or a full update. Since we'll only receive one or the other, the one not received today will have the wrong date and will be moved to the WrongDate folder before the task that determines whether we have a full or daily update executes.

 

Benefits

 

We now have a DTS package that does not adversely affect our customers and that lets everyone who monitors it sleep, knowing that if something really goes wrong a wakeup e-mail will be sent to our cell phones and pagers to alert us of the problem. We now have fewer problems with the import and keep incomplete or bad data off our web site. We have also eliminated complaints that there is no data to be seen in the areas of our web site affected by this DTS package.

 

Summary

 

In this case study I have explained how to keep data available while new data is being imported and described some of the fail safes that help automate the import process. I have also described one method for handling full and daily updates in the same DTS package while minimizing the number of failures recorded in error logs.
