Importing New Data Without Emptying Existing Tables
The case study I am about to describe has evolved over the
past two years and is still changing as I learn new techniques.
Problem
Soon after I was given the position of SQL Programmer at
bigdough.com I was put in charge of all the DTS packages on our SQL
Servers. Over time I discovered the
following problems with our main DTS package which imports stock ownership data
into our databases. First, it
seemed to be designed to be run manually with no fail safes nor checks to verify
that everything ran correctly.
Second, it emptied the tables before pumping the new data into them every
night. These were problems as our
customer base outside the United States was growing and we would get complaints
from those customers that they couldn’t see any data. I remember distinctly getting a
complaint from a customer in Japan.
Also during this time period we got a new DBA who wanted all our DTS
packages to be as dynamic as possible to take into account changes in the
existing databases without having to change the DTS packages every time. When we switched data providers we had
to change our import process to handle a full update or a daily update depending
on what we received on a given day;
The previous provider simply provided a full update every night.
Business Challenge
The challenge for me was to devise a way to keep the data
available at all times while importing the new data, detect if a full or daily
update was received and run appropriate data pumps, put in sufficient fail safes
to ensure bad data would not get imported, and to make the process automatic
including notification to pagers upon failure.
Solution
Keeping The Data Available
The first thing I tackled was to keep the data available at
all times. I adjusted the DTS
package to create tables identical to the existing tables and added “_New” to
the end of the table name. This
piece had to be dynamic in the sense that if new indexes were added to the
original table the DTS package needed to be able to handle this without changing
the DTS package. I went to the
system tables and found where info about tables, primary keys, and indexes were
stored and created a stored procedure that would script tables with or without
primary key and indexes. The SP can
also script the primary key and/or the indexes without the table. The SP is available in the script
section and can be viewed by clicking on its name: <A HREF="http://www.sqlservercentral.com/scripts/contributions/543.asp”>
sp_CreateAndExecTableScript. </A>
With this SP I create most of the tables with all their
indexes except for the ones that have nearly a million rows. The data pump tasks go faster without
the indexes and the placement of the indexes after the data is in place takes
about 2.5 minutes.
After the data is imported into the new tables I run the
stored procedure sp_rename multiple times to rename the existing tables with the
extension “_old”. Then I rename all
the tables with the extension
“_New” to the original table name.
All these rename commands are encased by a BEGIN/COMMIT TRANSACTION with
a ROLLBACK being issued if one of the rename commands fails. This is an all or nothing transaction to
avoid having old data in one table live at the same time as new data in a
different table.
If this step fails a failure e-mail is immediately sent to my
cell phone which is always beside my bed.
The phone beeps until I look at the e-mail.
After testing this we have found that data is always
available. Whoever is logged on
during the import and/or swapping of tables will only experience a slight
slowness on the site. This slowness
was reduced when we upgraded from SQL Server 7.0 to SQL Server 2000.
Importing Full or Daily Update
When we changed data providers we had to change our DTS
package so it could detect if we received a full or daily update and make it run
the appropriate data pumps and other SQL tasks. All files associated with the daily
update have the word daily in front of their name. Those for a full update don’t. The DTS package has a data pump for each
file in a full update and each file in a daily update.
To control which data pumps get executed I created two SQL
Execute Tasks. One is called 1st
Gateway Daily Update and the other is called 1st Gateway Full Update. The only code in the task is a remark
stating the purpose of the task.
All data pumps for the daily update can’t execute until the SQL task
called 1st Gateway Daily Update succeeds.
All data pumps for the full update can’t execute until the SQL task
called 1st Gateway Full Update succeeds.
I then created an ActiveX Script Task with a VBScript to detect which
files were received. If the files
for a daily update are received then the VBScript marks the SQL Execute Task
called 1st Gateway Full Update as completed. This is interpreted by the DTS package
as a failure and so none of the data pumps for the full update are
executed. The task called 1st
Gateway Daily Update is marked as completed when a full update is received thus
preventing those data pumps from executing.
There are other tasks after the appropriate data pumps
execute that must run, however they can’t run until the needed data pumps
succeed. To make sure they would
run after the data pumps I created two more SQL Execute Tasks similar to those
described above and simply changed “1st” to “2nd”. The 2nd Gateway Daily Update
can’t execute until all the daily update data pumps succeed and the
2nd full update task can’t execute until the full update data pumps
succeed. All tasks that follow
can’t execute until both the 2nd gateway tasks have completed.
So, if a daily update is received the ActiveX Script Task
will mark both full update gateways as completed. This keeps all the full update data
pumps from executing and makes sure the tasks that must run after the daily data
pumps succeed wait for the data pumps.
Now the DTS Package can handle a full or daily update.
This method was used to avoid a failure entry for each data
pump that didn’t find a file. Half
of the data pumps would fail and be logged as failed in the error log. This method reduces the number of known
failures from about 10 to 2. One
for each task marked as completed.
Here is a shortened version of the VBScript that does what I
described in this section:
'**********************************************************************
' Visual Basic
ActiveX Script
'************************************************************************
Function Main()
' Specify the OLE DB provider.
set DATABASENAME = CreateObject("ADODB.Connection")
DATABASENAME.Provider = "sqloledb"
' Specify connection string on Open method.
ProvStr =
"Server=SERVERNAME;Database=DATABASENAME;UID=SQLSERVERLOGIN;PWD=PASSWORD"
DATABASENAME.Open provStr
'Create temporary table
query = "CREATE TABLE #fileexists (doesexist smallint,
fileisdir smallint, direxist smallint, txtfilename varchar(45) NULL default
'unknown')"
DATABASENAME.Execute(query)
query = "INSERT INTO #fileexists (doesexist, fileisdir,
direxist) EXEC master..xp_fileexist 'D:\TextFilesForImport\dailyupdate.txt'
UPDATE #fileexists SET txtfilename = 'dailyupdate.txt' WHERE txtfilename =
'unknown'"
DATABASENAME.Execute(query)
query = "INSERT INTO #fileexists (doesexist, fileisdir,
direxist) EXEC master..xp_fileexist 'D:\TextFilesForImport\fullupdate.txt'
UPDATE #fileexists SET txtfilename = 'fullupdate.txt' WHERE txtfilename =
'unknown'"
DATABASENAME.Execute(query)
'Start finding UpdateType
'1 indicates Full Update. 0 or 2 indicates at least one file
missing. 3 indicates Daily
Update.
UpdateTypeQuery = "SELECT doesexist AS UpdateType FROM
#fileexists WHERE txtfilename = 'fullupdate.txt'"
set LocalImportType =
DATABASENAME.execute(UpdateTypeQuery)
DTSGlobalvariables("UpdateType").value =
LocalImportType("UpdateType")
'enters this if when file name institution.txt is not
found
If DTSGlobalvariables("UpdateType").Value = 0 Then
UpdateTypeQuery = "SELECT doesexist + 2 AS UpdateType FROM #fileexists
WHERE txtfilename = 'dailyupdate.txt'"
set LocalImportType = DATABASENAME.execute(UpdateTypeQuery)
DTSGlobalvariables("UpdateType").value =
LocalImportType("UpdateType")
End If
'End finding UpdateType
query = "DROP TABLE #fileexists"
DATABASENAME.Execute(query)
Dim oPackage
Set oPackage = DTSGlobalVariables.Parent
'This if handles daily updates. It marks the Gateway Full Update steps
as completed.
If DTSGlobalvariables("UpdateType").Value = 3 Then
'Mark 1st Gateway Full Update step as completed
oPackage.Steps("DTSStep_DTSExecuteSQLTask_4").ExecutionStatus =
DTSStepExecStat_Completed
'Mark 2nd Gateway Full Update step as completed
oPackage.Steps("DTSStep_DTSActiveScriptTask_7").ExecutionStatus =
DTSStepExecStat_Completed
End If
'This if handles full updates. It marks the Gateway Daily
Update steps as completed.
If DTSGlobalvariables("UpdateType").Value = 1 Then
'Mark 1st Gateway Daily Update step as completed.
oPackage.Steps("DTSStep_DTSExecuteSQLTask_1").ExecutionStatus =
DTSStepExecStat_Completed
'Mark 2nd Gateway Daily Update step as completed.
oPackage.Steps("DTSStep_DTSActiveScriptTask_6").ExecutionStatus =
DTSStepExecStat_Completed
End If
Main = DTSTaskExecResult_Success
End Function
Avoiding False Failure Alerts
The above solution created a new problem. The job that executes the DTS package
will always complete reporting failure because of the tasks that get marked as
completed. To avoid this I modified
the job to always report success even upon failure. That avoids the false failure
alert. Unfortunately, the job will
no longer alert me when there really is a failure.
For that I created a table where the DTS package will log
when it started and when it finished.
Then I created a separate job that will count the number of times the job
logged a start for the day and how many times it logged a finish for the same
day. If the numbers aren’t the same
then an e-mail gets sent to my cell phone to wake me up and I groggily power up
my computer to investigate the problem.
Fail Safes
We discovered a problem one day when all the expected data
was not on our production servers and wondered why we didn’t receive an e-mail
to alert us of a problem. Upon
investigation we discovered that we had received all the required files in the
proper format. The problem was that
the vendor had sent one or two of the files with less than half the normal
number of rows. Nothing in the DTS
package was designed to handle this problem since the few rows received were
correct and so the data pumps moved them to their respective tables.
The fail safe I designed for this eventuality was to create
an SQL Execute Task that would count the number of rows in each table. If the row count fell below a minimum
level then the task would send an e-mail to my cell phone and stop the job so
that no more tasks were executed. I
placed this task after the data pump tasks to the new tables and before the task
that renames all the tables to make the new data available on our web site. Thus if there are not enough rows in any
of the tables the new data will not be made live.
Another problem we faced was when no new files were available
the import DTS package would still run and send an e-mail to tell us it had
succeeded. It had succeeded in
importing yesterday’s data. This
was not acceptable. I developed the
following VBScript and placed it in an ActiveX Script task:
'**********************************************************************
' Visual Basic
ActiveX Script
'************************************************************************
'This task checks the file creation date for all the text
'files in the zip file and if not today then it creates a
directory
'and moves the file to that directory and then stops the
'DTS Package.
Function Main()
Dim TodayDate
Dim FullUpdateDate
Dim DailyUpdateDate
TodayDate = Date
Dim objFSO, objFolder, objFile, colFiles, objFSO2
Set objFSO2 = CreateObject("Scripting.FileSystemObject")
Set objFSO = CreateObject("Scripting.FileSystemObject")
Set objFolder =
objFSO.GetFolder(DTSGlobalVariables("gvFilePath").Value)
Set colFiles = objFolder.Files
'delete folder WrongDate if exists
If objFSO.FolderExists(ObjFolder&"\WrongDate") Then
objFSO.DeleteFolder(ObjFolder&"\WrongDate")
End If
'begin checking dates on LionShares text files and move old
files to WrongDate folder
If colFiles.Count > 0 Then
For Each objFile in colFiles
'checks dates for all files
IF UCase(objFile.Name) = "DAILYUPDATE.TXT" or UCase(objFile.Name) =
"FULLUPDATE.TXT" _
or UCase(objFile.Name) = "FUNDS.TXT" or UCase(objFile.Name) =
"HOLDINGS.TXT" _
or UCase(objFile.Name) = "STOCKS.TXT" or UCase(objFile.Name) =
"ANALYTICS.TXT" _
or UCase(objFile.Name) = "FUNDANALYTICS.TXT" Then
DailyUpdateDate = TRIM(LEFT(objFile.DateLastModified,
instr(objFile.DateLastModified, " ")))
'the below line is for debugging purposes.
'MsgBox "!" & DailyUpdateDate & "!" & TodayDate & "!"
IF TRIM(TodayDate) <> DailyUpdateDate Then
If objFSO.FolderExists(ObjFolder&"\WrongDate") Then
objFile.Move(objFolder&"\WrongDate\")
Else
objFSO.CreateFolder(objFolder&"\WrongDate")
objFile.Move(objFolder&"\WrongDate\")
End If
End If
End If
Next
End If
'end checking dates on LionShares zip files and move old
files to WrongDate folder
Set objFSO = nothing
Set objFolder = nothing
Set colFiles = nothing
Main = DTSTaskExecResult_Success
End Function
This script will compare the date of each file with today’s
date and if the date is not the same then the text file will be moved to a
different directory. Now we can
just let the DTS package run. If no
files are left where the data pump tasks can find them then the entire package
will halt and the process described in the section called Avoiding False
Failure Alerts will send out an e-mail to my cell phone since the package
will have logged a start time but no finish time. The reason we will let the package
continue at this point is because we don’t know if we are receiving a daily or a
full update. Since we’ll only
receive one or the other then the one that was not received today will have the
wrong date and thus get moved to the WrongDate folder before the task executes
that determines if we have a full or daily update.
Benefits
We now have a DTS package that will not adversely affect our
customers and that will let everyone who monitors them sleep knowing that if
something really goes wrong a wakeup e-mail will be sent to our cell phones and
pagers to alert us of the problem.
We now have fewer problems with the import and keep incomplete or bad
data off our web site. We also have
eliminated complaints that say there is no data to be seen in the areas of our
web site effected by this DTS package.
Summary
In this case study I have explained how to keep data
available while new data is being imported and some of the fail safes that help
automate the import process. I have
also described one method for handling full and daily updates in the same DTS
package while minimizing the number of failures recorded in error
logs.