SQLServerCentral Article

Loading a Series of Flat Files

,

Loading a

series of flat files

 

Introduction

 

I recently

faced a requirement to load a number of flat files to a SQL Server 2000 database

on a nightly basis. The customer also wanted the ability to load a single file

as an ad-hoc request. The files were placed within a directory on the database

server by a third party piece of software. I have seen a number of questions to

deal with similar tasks in threads on this site, here is how I solved this task

using the extended stored procedures xp_DirTree, xp_FileExist and xp_CmdShell

(which are available within the master database although the first two are

undocumented in Books Online) together with a DTS package. My original solution

included various error logging that I have omitted here for brevity.

 

Methodology

 

The first

stored procedure is sp_LoadAll the purpose of which is to iterate through all

files in a specified directory after first validating that the directory exists.

For each file within the directory the stored procedure sp_LoadSingle is called

to load the file. To schedule the loading of files on a nightly basis this

stored procedure would be called from a SQL Server job. The methods to retrieve

the files and validate the directory exists are outlined further on.

 

 

SET QUOTED_IDENTIFIER OFF

GO

SET ANSI_NULLS OFF

GO

 

if exists (select * from

dbo.sysobjects where id = object_id(N'[dbo].[sp_LoadAll]') and OBJECTPROPERTY(id,

N'IsProcedure') = 1)

drop procedure [dbo].[sp_LoadAll]

GO

 

--

----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

-- Name :                      

sp_LoadAll

--

-- Parameters:               None

--

-- Returns :                   

See returns from sub-procedures.

--

-- Description :              

Loads all files in a specified directory.

--

-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

 

create procedure sp_LoadAll

 

AS

 

            set nocount on

 

            --Variables

            declare @rtn     int

            ,          

@Folder            varchar(1000)    

            ,           @File   

varchar(1000)

 

            --Assign defaults

            select @rtn =

0                          --Executed with no errors.

 

            --Assign the folder

            select @Folder =

'c:\Load\'

 

            --Does the folder

exist

            exec

            @rtn     =         

sp_FolderFileExist

            @File    =         

@Folder

,           @Switch          

=          0

 

            if @rtn !=

0                    --Error

                        begin

                        goto

QuitWithErrLog

                        end

 

            --Get the list of

files

            exec

            @rtn     =         

sp_GetFiles

            @Folder           

=          @Folder

 

            if @rtn !=

0                    --Error

                        begin

                        goto

QuitWithErrLog

                        end

 

            --Load the files.

            declare cFiles cursor

local fast_forward for select @Folder + [file] from ##Files order by [file]

            open cFiles

 

FetchNext:

 

            fetch next from cFiles

into @File

 

            while @@Fetch_Status =

0

                        begin

                        exec

                        @rtn                

=          sp_LoadSingle

                       

@File                =          @File

                        goto 

FetchNext

                        end

 

            --Clean up cursor

            close cFiles

            deallocate cFiles

 

Quit:

 

            set nocount off

 

            return @rtn

 

QuitWithErrLog:

 

            --*** Log errors here

***

 

            goto Quit

 

GO

SET QUOTED_IDENTIFIER OFF

GO

SET ANSI_NULLS ON

GO

 

 

The second stored procedure is

sp_LoadSingle, this is called by sp_LoadAll for each file to load and it is also

called when one file is required to be loaded as an ad-hoc task. This procedure

firstly validates that the file to be loaded and the DTS package (stored as a

structured storage file) exist.          

After this the data is ready to

be loaded, I think the neatest way to do this is to firstly load the data into a

holding table, perform all validation via TSQL queries and once successfully

validated, load the data into the live table(s). The holding table is simply a

series of varchar columns and should contain the data exactly as it appears

within the flat file i.e. no validation or filtering should be performed by the

DTS package its only purpose is to load the data into the holding table via a

data pump task.

            The DTS package is

called via the extended stored procedure xp_CmdShell and the command line

utility Dtsrun (type Dtsrun /? At the command line for info.). It contains

global variables for the filename to be loaded, the server name and database

name, these are passed to the pacakge via switches in the Dtsrun command, see

below.

            I have omitted the

stored procedure sp_Validate as the content of this depends on the requirements.

Some I faced were data being too long i.e. 6 characters in a field when only 5

were allowed, data type validation i.e. char data in numeric fields and a total

record count – the header record of each file contains a total record count

which had to match the actual record count. The customer also wanted a log

(within a SQL server table) of the exact field within the file in which the

error occurred together with the error. All this is achieved via TSQL statements

within the stored procedure sp_Validate.

 

The definitions of the two

tables used in this example are: -

 

 

           

create table [dbo].[Holding]

            (

                        [c1]       varchar(4)

           

,           [c2]       varchar(4)

           

,           [c3]       varchar(4)

            )

on [primary]

           

go

 

           

create table [dbo].[Live]

            (

                        [c1]       int

           

,           [c2]       int

           

,           [c3]       int

            )

on [primary]

           

go

 

SET QUOTED_IDENTIFIER OFF

GO

SET ANSI_NULLS OFF

GO

 

if exists (select * from

dbo.sysobjects where id = object_id(N'[dbo].[sp_LoadSingle]') and

OBJECTPROPERTY(id, N'IsProcedure') = 1)

drop procedure [dbo].[sp_LoadSingle]

GO

 

--

----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

-- Name :                      

sp_LoadSingle

--

-- Parameters :             

@File    varchar(1000)     --The full path and file name.

--

-- Returns :                   

See returns from sub-procedures.

--

-- Description :              

Loads a single file

--

----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

 

create procedure sp_LoadSingle

 

            @File               

varchar(1000)                 --The full path and file name.

 

AS

 

            set nocount on

 

            --Variables

            declare @rtn     int        

--Int to return

            ,           @DTSPath       

varchar(1000)

            ,           @cmd  

varchar(1000)

 

            select @rtn =

0                          --Executed with no errors.

 

            --Validate the file

exists.

            exec

            @rtn     =         

sp_FolderFileExist

            @File    =         

@File

,           @Switch          

=          1

 

            if @rtn != 0

                        begin

                        goto

QuitWithErrLog

                        end

 

            --Assign the full path

and filename of the ci.DTS package

            select @DTSPath =

'c:\DTS\Load.DTS'

 

            --Does CI.DTS exist on

the file server?

            exec

            @rtn     =         

sp_FolderFileExist

            @File    =          @DTSPath

,           @Switch          

=          1

 

            if @rtn != 0

                        begin

                        goto

QuitWithErrLog

                        end

 

            --Clear the holding

table

            truncate table Holding

 

            --Load the file.

            select @cmd = 'Dtsrun

/f "' + @DTSPath + '" /A File:8="' + @File + '" /A Server:8="' + @@servername +

'" /A Database:8="' + db_name() + '" /W TRUE'

            exec

master..xp_cmdshell @cmd

 

            --Validate the data

--          exec

--          @rtn     =         

sp_Validate

--

--          if @rtn != 0

--                      begin

--                      goto

QuitWithErrLog

--                      end

 

            --Load the data into

live

            insert Live

            (

            [c1]

,           [c2]

,           [c3]

            )

            select

            [c1]

,           [c2]

,           [c3]

            from

            Holding

           

Quit:

 

            set nocount off

 

            return @rtn

 

QuitWithErrLog:

 

            --*** Log errors here

***

 

            goto Quit

 

 

GO

SET QUOTED_IDENTIFIER OFF

GO

SET ANSI_NULLS ON

GO

 

Below is an

overview of the DTS package used to load the data. There is a database

connection and a flat file connection, a dynamic properties task assigns the

global variables (file connection and the server and database for the database

connection)  then a data pump task loads the data.

 

 

 

The stored

procedure sp_GetFiles accepts a directory as a parameter and then populates the

temporary table ##Files with all the files within that directory via the

extended stored procedure xp_DirTree.

 

 

 

SET QUOTED_IDENTIFIER OFF

GO

SET ANSI_NULLS OFF

GO

 

if exists (select * from

dbo.sysobjects where id = object_id(N'[dbo].[sp_GetFiles]') and

OBJECTPROPERTY(id, N'IsProcedure') = 1)

drop procedure [dbo].[sp_GetFiles]

GO

 

--

----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

--

-- Name :                      

sp_GetFiles

--

-- Parameters:              

@Folder

--

-- Returns :                    -1

No files to process.

--

-- Description :              

Populates ##Files with all files in a specified directory

--

-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

 

create procedure sp_GetFiles

 

            @Folder           

varchar(1000)

 

AS

 

            --Variables

            declare @rtn     int

            ,           @rowcount

       int

 

            select @rtn =

0              --Executed with no errors.

 

            --Does the folder

exist

            exec

            @rtn     =         

sp_FolderFileExist

            @File    =         

@Folder

,           @Switch          

=          0

 

            if @rtn != 0

                        begin

                        goto Quit

                        end

 

            --Drop the temp tables

to remove table and data.

            if exists (select *

from tempdb.dbo.sysobjects where name = '##DirTree')

                       

                        begin

                        drop table

##DirTree

                        end

 

            if exists (select *

from tempdb.dbo.sysobjects where name = '##Files')

                       

                        begin

                        drop table

##Files

                        end

 

            --Create the temp

tables.

            create table ##DirTree

            (

            [subdirectory]   

varchar(255)

,           [depth]  int

,           [file]      int

            )

 

            create table ##Files

            (

            [File]    

varchar(1000)

            )

 

            --Populate ##DirTree

            insert ##DirTree

            exec

            master..xp_dirtree

            @Folder

,           1

,           1

 

            --Populate

##Files         

            insert ##Files

            (

            [File]

            )

            select

            subdirectory

            from

            ##DirTree

            where

            [file]      =         

1

 

            select @rowcount = @@rowcount

 

            if @rowcount = 0

 

                        begin

                        select @rtn

= -1

                        end

 

            drop table ##DirTree

 

Quit:

 

            return @rtn

 

GO

SET QUOTED_IDENTIFIER OFF

GO

SET ANSI_NULLS ON

GO

 

 

The stored

procedure sp_FolderFileExist validates if a folder or a file exists on the

database server via the extended stored procedure xp_FileExist. There are two

parameters, @File which is the folder or file to validate and @Switch, 0 for a

folder, 1 for a file.

 

 

 

SET QUOTED_IDENTIFIER OFF

GO

SET ANSI_NULLS OFF

GO

 

if exists (select * from

dbo.sysobjects where id = object_id(N'[dbo].[sp_FolderFileExist]') and

OBJECTPROPERTY(id, N'IsProcedure') = 1)

drop procedure [dbo].[sp_FolderFileExist]

GO

 

--

----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

-- Name :                      

sp_FolderFileExist

--

-- Parameters :             

@File, @Switch

--

-- Returns :                    -1

= Folder or file does not exist.

--

-- Description :              

Does a folder or a file exist.

--

----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

 

create procedure

sp_FolderFileExist

 

            @File   

varchar(1000)     --The folder or file name.

,           @Switch          

bit         --0 = Folder; 1 = File

 

as

 

            --Variables

            declare @rtn      int

 

            select @rtn =

0              --Executed with no errors.

 

            --Create a temp table.

            create table #FolderFileExists

            (

            [File

Exists]                   int

,           [File is a

Directory]                     int

,           [Parent Directory

Exists]            int

            )

           

            insert #FolderFileExists

exec

master..xp_FileExist

@File

 

            if not exists

            (

            select *

            from

            #FolderFileExists

            where

            case @Switch

                        when 0

then [File is a Directory]

                        when 1

then [File Exists]

            end = 1

            )

           

                        begin

                        select @rtn

= -1

                        end

 

Quit:

 

            return @rtn

 

GO

SET QUOTED_IDENTIFIER OFF

GO

SET ANSI_NULLS ON

GO

 

 

Conclusion

 

There are a

number of different ways to tackle this problem. A couple of which are: -

 

  • Iterate

    through the files using VB and the filesystemobject calling a DTS package via

    the COM interface to load the data.

  • Loop

    through the files within a DTS package calling another DTS package to load the

    data.

I liked this

method because it keeps everything within the database (except for the DTS

package)  making development and installation simpler. Has anyone tackled a

similar problem in a different / better way, please let me know.

Rate

5 (1)

You rated this post out of 5. Change rating

Share

Share

Rate

5 (1)

You rated this post out of 5. Change rating