Case Study: Importing New Data Without Emptying Existing Tables

The case study I am about to describe has evolved over the past two years and is still changing as I learn new techniques.

Problem

Soon after I was given the position of SQL Programmer at bigdough.com, I was put in charge of all the DTS packages on our SQL Servers. Over time I discovered the following problems with our main DTS package, which imports stock ownership data into our databases. First, it seemed designed to be run manually, with no fail-safes or checks to verify that everything ran correctly. Second, it emptied the tables every night before pumping the new data into them. These were real problems because our customer base outside the United States was growing, and those customers would complain that they couldn't see any data. I distinctly remember getting a complaint from a customer in Japan.

Also during this time period we got a new DBA who wanted all our DTS packages to be as dynamic as possible, so they could absorb changes in the existing databases without having to be modified every time. And when we switched data providers, we had to change our import process to handle either a full update or a daily update, depending on what we received on a given day; the previous provider had simply sent a full update every night.

Business Challenge

The challenge for me was to devise a way to keep the data available at all times while importing the new data, to detect whether a full or daily update was received and run the appropriate data pumps, to put in sufficient fail-safes to ensure bad data would not get imported, and to make the process automatic, including notification to pagers upon failure.

Solution

Keeping The Data Available

The first thing I tackled was keeping the data available at all times. I adjusted the DTS package to create tables identical to the existing tables, with "_New" appended to the table name.

This piece had to be dynamic in the sense that if new indexes were added to an original table, the DTS package needed to handle them without being modified. I went to the system tables, found where information about tables, primary keys, and indexes is stored, and created a stored procedure that scripts a table with or without its primary key and indexes. The SP can also script the primary key and/or the indexes without the table. The SP, sp_CreateAndExecTableScript, is available in the script section: http://www.sqlservercentral.com/scripts/contributions/543.asp
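For flavor, here is the kind of SQL Server 2000 system-table query such a procedure builds on. This is only a sketch, not the SP itself, and the table name Holdings is a hypothetical placeholder:

-- Sketch: discover the indexes on a table from sysobjects/sysindexes,
-- the way a table-scripting SP might on SQL Server 2000.
SELECT i.name AS IndexName, i.indid
FROM sysobjects o
JOIN sysindexes i ON i.id = o.id
WHERE o.name = 'Holdings'               -- hypothetical table name
  AND o.xtype = 'U'                     -- user tables only
  AND i.indid BETWEEN 1 AND 250         -- skip heaps (0) and text/image (255)
  AND i.name NOT LIKE '[_]WA[_]Sys%'    -- skip auto-created statistics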

With this SP I create most of the tables with all their indexes, except for the ones that have nearly a million rows. The data pump tasks run faster without the indexes, and creating the indexes after the data is in place takes about 2.5 minutes.

After the data is imported into the new tables, I run the stored procedure sp_rename multiple times: first to rename the existing tables with the extension "_old", then to rename all the tables with the extension "_New" to the original table names. All these rename commands are wrapped in a BEGIN/COMMIT TRANSACTION, with a ROLLBACK issued if one of the rename commands fails. This is an all-or-nothing transaction, to avoid having old data live in one table at the same time as new data in a different table.
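Here is a minimal sketch of that swap in T-SQL. The table name Holdings is a hypothetical placeholder, and the error handling uses SQL Server 2000 style @@ERROR checks:

-- Sketch of the all-or-nothing table swap; Holdings is a placeholder.
BEGIN TRANSACTION

EXEC sp_rename 'Holdings', 'Holdings_old'
IF @@ERROR <> 0 GOTO SwapFailed

EXEC sp_rename 'Holdings_New', 'Holdings'
IF @@ERROR <> 0 GOTO SwapFailed

-- ...repeat the pair of renames for each imported table...

COMMIT TRANSACTION
RETURN

SwapFailed:
ROLLBACK TRANSACTION
-- Failing with severity 16 makes the task fail, which triggers
-- the failure e-mail described below.
RAISERROR ('Table swap failed; rolled back, old data still live.', 16, 1)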

If this step fails, a failure e-mail is immediately sent to my cell phone, which is always beside my bed. The phone beeps until I look at the e-mail. In our testing since, we have found that data is always available.

Whoever is logged on during the import and/or swapping of tables will only experience a slight slowness on the site. This slowness was reduced when we upgraded from SQL Server 7.0 to SQL Server 2000.

Importing Full or Daily Update

When we changed data providers, we had to change our DTS package so it could detect whether we received a full or daily update and run the appropriate data pumps and other SQL tasks. All files associated with the daily update have the word "daily" in front of their names; those for a full update don't. The DTS package has a data pump for each file in a full update and each file in a daily update.

To control which data pumps get executed, I created two Execute SQL Tasks, one called 1st Gateway Daily Update and the other called 1st Gateway Full Update. The only code in each task is a remark stating its purpose. None of the data pumps for the daily update can execute until the task called 1st Gateway Daily Update succeeds, and none of the data pumps for the full update can execute until the task called 1st Gateway Full Update succeeds.

I then created an ActiveX Script Task with a VBScript to detect which files were received. If the files for a daily update are received, the VBScript marks the task called 1st Gateway Full Update as completed. The DTS package interprets this as a failure, so none of the data pumps for the full update are executed. Likewise, the task called 1st Gateway Daily Update is marked as completed when a full update is received, preventing those data pumps from executing.

There are other tasks that must run after the appropriate data pumps execute, but they can't run until the needed data pumps succeed. To make sure they run after the data pumps, I created two more Execute SQL Tasks similar to those described above, simply changing "1st" to "2nd". 2nd Gateway Daily Update can't execute until all the daily update data pumps succeed, and 2nd Gateway Full Update can't execute until the full update data pumps succeed. All the tasks that follow can't execute until both 2nd gateway tasks have completed.

So, if a daily update is received, the ActiveX Script Task marks both full update gateways as completed. This keeps all the full update data pumps from executing and makes the tasks that must run after the daily data pumps wait for those pumps to succeed. The DTS package can now handle either a full or a daily update.

I used this method to avoid a failure entry for each data pump that didn't find its file; otherwise half of the data pumps would fail and be logged as failures in the error log. This method reduces the number of known failures from about ten to two, one for each task marked as completed.

Here is a shortened version of the VBScript that does what I described in this section:

'**********************************************************************
'  Visual Basic ActiveX Script
'************************************************************************
 
Function Main()
 
' Specify the OLE DB provider.
set DATABASENAME = CreateObject("ADODB.Connection")
DATABASENAME.Provider = "sqloledb"
 
' Specify connection string on Open method.
ProvStr = "Server=SERVERNAME;Database=DATABASENAME;UID=SQLSERVERLOGIN;PWD=PASSWORD"
 
DATABASENAME.Open provStr
 
'Create temporary table
query = "CREATE TABLE #fileexists (doesexist smallint, fileisdir smallint, direxist smallint, txtfilename varchar(45) NULL default 'unknown')"
 
DATABASENAME.Execute(query)
 
query = "INSERT INTO #fileexists (doesexist, fileisdir, direxist) EXEC master..xp_fileexist 'D:\TextFilesForImport\dailyupdate.txt' UPDATE #fileexists SET txtfilename = 'dailyupdate.txt' WHERE txtfilename = 'unknown'"
 
DATABASENAME.Execute(query)
 
query = "INSERT INTO #fileexists (doesexist, fileisdir, direxist) EXEC master..xp_fileexist 'D:\TextFilesForImport\fullupdate.txt' UPDATE #fileexists SET txtfilename = 'fullupdate.txt' WHERE txtfilename = 'unknown'"
 
DATABASENAME.Execute(query)
 
'Start finding UpdateType
'1 indicates Full Update.  0 or 2 indicates at least one file missing.  3 indicates Daily Update.
UpdateTypeQuery = "SELECT doesexist AS UpdateType FROM #fileexists WHERE txtfilename = 'fullupdate.txt'"
 
set LocalImportType = DATABASENAME.execute(UpdateTypeQuery)
 
DTSGlobalvariables("UpdateType").value = LocalImportType("UpdateType")
 
'enters this If when fullupdate.txt is not found (doesexist = 0)
If DTSGlobalvariables("UpdateType").Value = 0 Then
            UpdateTypeQuery = "SELECT doesexist + 2 AS UpdateType FROM #fileexists WHERE txtfilename = 'dailyupdate.txt'"
 
            set LocalImportType = DATABASENAME.execute(UpdateTypeQuery)
 
            DTSGlobalvariables("UpdateType").value = LocalImportType("UpdateType")
 
End If
 
'End finding UpdateType
 
query = "DROP TABLE #fileexists"
 
DATABASENAME.Execute(query)
 
Dim oPackage
 
Set oPackage = DTSGlobalVariables.Parent
 
'This if handles daily updates.  It marks the Gateway Full Update steps as completed.
If DTSGlobalvariables("UpdateType").Value = 3 Then
 
            'Mark 1st Gateway Full Update step as completed
            oPackage.Steps("DTSStep_DTSExecuteSQLTask_4").ExecutionStatus = DTSStepExecStat_Completed
 
            'Mark 2nd Gateway Full Update step as completed
            oPackage.Steps("DTSStep_DTSActiveScriptTask_7").ExecutionStatus = DTSStepExecStat_Completed
 
End If
 
'This if handles full updates.    It marks the Gateway Daily Update steps as completed.
If DTSGlobalvariables("UpdateType").Value = 1 Then
 
            'Mark 1st Gateway Daily Update step as completed.
            oPackage.Steps("DTSStep_DTSExecuteSQLTask_1").ExecutionStatus = DTSStepExecStat_Completed
 
            'Mark 2nd Gateway Daily Update step as completed.
            oPackage.Steps("DTSStep_DTSActiveScriptTask_6").ExecutionStatus = DTSStepExecStat_Completed
 
End If
 
            Main = DTSTaskExecResult_Success
End Function

Avoiding False Failure Alerts

The above solution created a new problem: the job that executes the DTS package will always report failure because of the tasks that get marked as completed. To avoid this, I modified the job to always report success, even upon failure. That eliminates the false failure alert.

Unfortunately, the job will no longer alert me when there really is a failure. For that I created a table where the DTS package logs when it started and when it finished. Then I created a separate job that counts the number of times the package logged a start for the day and the number of times it logged a finish for the same day. If the numbers aren't the same, an e-mail gets sent to my cell phone to wake me up, and I groggily power up my computer to investigate the problem.
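A minimal sketch of that comparison, assuming a hypothetical log table named PackageRunLog and xp_sendmail as the alert mechanism (SQL Mail was the usual choice on SQL Server 2000):

-- Hypothetical log table the DTS package writes to at start and finish.
CREATE TABLE PackageRunLog (
    EventType varchar(10) NOT NULL,  -- 'Start' or 'Finish'
    EventTime datetime NOT NULL DEFAULT GETDATE()
)

-- Run by a separate job: compare today's starts with today's finishes.
DECLARE @Starts int, @Finishes int

SELECT @Starts = COUNT(*) FROM PackageRunLog
WHERE EventType = 'Start'
  AND CONVERT(char(8), EventTime, 112) = CONVERT(char(8), GETDATE(), 112)

SELECT @Finishes = COUNT(*) FROM PackageRunLog
WHERE EventType = 'Finish'
  AND CONVERT(char(8), EventTime, 112) = CONVERT(char(8), GETDATE(), 112)

IF @Starts <> @Finishes
    EXEC master..xp_sendmail
        @recipients = 'oncall@example.com',   -- placeholder address
        @subject = 'Import package started but did not finish'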

Fail Safes

We discovered a problem one day when all the expected data was not on our production servers, and we wondered why we hadn't received an e-mail alerting us. Upon investigation we discovered that we had received all the required files in the proper format. The problem was that the vendor had sent one or two of the files with less than half the normal number of rows. Nothing in the DTS package was designed to handle this, since the few rows received were correct, and so the data pumps moved them into their respective tables.

The fail-safe I designed for this eventuality was an Execute SQL Task that counts the number of rows in each table. If a row count falls below a minimum level, the task sends an e-mail to my cell phone and stops the job so that no more tasks execute. I placed this task after the data pump tasks that fill the new tables and before the task that renames all the tables to make the new data available on our web site. Thus, if there are not enough rows in any of the tables, the new data will not be made live.
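A minimal sketch of that check for one table. The table name Holdings_New and the threshold are hypothetical placeholders, not our production values:

-- Hypothetical minimum-row-count fail-safe for one of the new tables.
DECLARE @Rows int

SELECT @Rows = COUNT(*) FROM Holdings_New

IF @Rows < 500000    -- placeholder threshold
BEGIN
    EXEC master..xp_sendmail
        @recipients = 'oncall@example.com',   -- placeholder address
        @subject = 'Import aborted: Holdings_New row count too low'
    -- Failing the task here stops the package before the rename
    -- step, so the short data is never made live.
    RAISERROR ('Row count below minimum; aborting import.', 16, 1)
END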

Another problem we faced was that when no new files were available, the import DTS package would still run and send an e-mail telling us it had succeeded. It had succeeded in importing yesterday's data, which was not acceptable. I developed the following VBScript and placed it in an ActiveX Script Task:

'**********************************************************************
'  Visual Basic ActiveX Script
'************************************************************************
 
'This task checks the last-modified date of each text file in the
'import folder; if a file is not from today, it creates a WrongDate
'directory and moves the file there so the data pumps cannot find it,
'which halts the DTS package.
 
Function Main()
 
Dim TodayDate
Dim DailyUpdateDate
 
TodayDate = Date
 
Dim objFSO, objFolder, objFile, colFiles
 
Set objFSO = CreateObject("Scripting.FileSystemObject")
Set objFolder = objFSO.GetFolder(DTSGlobalVariables("gvFilePath").Value)
Set colFiles = objFolder.Files
 
'delete any WrongDate folder left over from a previous run
If objFSO.FolderExists(ObjFolder&"\WrongDate") Then
            objFSO.DeleteFolder(ObjFolder&"\WrongDate")
End If
 
'begin checking dates on LionShares text files and move old files to WrongDate folder
If colFiles.Count > 0 Then
    For Each objFile In colFiles

        'check the date on each of the expected import files
        If UCase(objFile.Name) = "DAILYUPDATE.TXT" Or UCase(objFile.Name) = "FULLUPDATE.TXT" _
        Or UCase(objFile.Name) = "FUNDS.TXT" Or UCase(objFile.Name) = "HOLDINGS.TXT" _
        Or UCase(objFile.Name) = "STOCKS.TXT" Or UCase(objFile.Name) = "ANALYTICS.TXT" _
        Or UCase(objFile.Name) = "FUNDANALYTICS.TXT" Then

            'keep only the date portion of the last-modified timestamp
            DailyUpdateDate = Trim(Left(objFile.DateLastModified, InStr(objFile.DateLastModified, " ")))

            'the line below is for debugging purposes
            'MsgBox "!" & DailyUpdateDate & "!" & TodayDate & "!"

            If Trim(TodayDate) <> DailyUpdateDate Then
                'create the WrongDate folder if needed, then move the stale file
                If Not objFSO.FolderExists(objFolder & "\WrongDate") Then
                    objFSO.CreateFolder(objFolder & "\WrongDate")
                End If
                objFile.Move(objFolder & "\WrongDate\")
            End If

        End If

    Next
End If
'end checking dates on LionShares text files and moving old files to the WrongDate folder
 
Set objFSO = nothing
Set objFolder = nothing
Set colFiles = nothing
 
            Main = DTSTaskExecResult_Success
End Function

This script compares the date on each file with today's date; if they are not the same, the text file is moved to a different directory. Now we can just let the DTS package run. If no files are left where the data pump tasks can find them, the entire package will halt, and the process described in the section Avoiding False Failure Alerts will send an e-mail to my cell phone, since the package will have logged a start time but no finish time.

The reason we let the package continue at this point is that we don't yet know whether we are receiving a daily or a full update. Since we'll only receive one or the other, the file for the update we did not receive today will have the wrong date and be moved to the WrongDate folder before the task that determines the update type executes.

Benefits

We now have a DTS package that will not adversely affect our customers and that lets everyone who monitors it sleep, knowing that if something really goes wrong, a wakeup e-mail will be sent to our cell phones and pagers. We now have fewer problems with the import, and we keep incomplete or bad data off our web site. We have also eliminated complaints that there is no data to be seen in the areas of our web site affected by this DTS package.

Summary

In this case study I have explained how to keep data available while new data is being imported, along with some of the fail-safes that help automate the import process. I have also described one method for handling both full and daily updates in the same DTS package while minimizing the number of failures recorded in the error logs.
