Click here to monitor SSC
SQLServerCentral is supported by Red Gate Software Ltd.
 
Log in  ::  Register  ::  Not logged in
 
 
 

Loading a Series of Flat Files

By Andy Jones,

Loading a series of flat files

 

Introduction

 

I recently faced a requirement to load a number of flat files to a SQL Server 2000 database on a nightly basis. The customer also wanted the ability to load a single file as an ad-hoc request. The files were placed within a directory on the database server by a third party piece of software. I have seen a number of questions to deal with similar tasks in threads on this site, here is how I solved this task using the extended stored procedures xp_DirTree, xp_FileExist and xp_CmdShell (which are available within the master database although the first two are undocumented in Books Online) together with a DTS package. My original solution included various error logging that I have omitted here for brevity.

 

Methodology

 

The first stored procedure is sp_LoadAll the purpose of which is to iterate through all files in a specified directory after first validating that the directory exists. For each file within the directory the stored procedure sp_LoadSingle is called to load the file. To schedule the loading of files on a nightly basis this stored procedure would be called from a SQL Server job. The methods to retrieve the files and validate the directory exists are outlined further on.

 

 

SET QUOTED_IDENTIFIER OFF

GO

SET ANSI_NULLS OFF

GO

 

if exists (select * from dbo.sysobjects where id = object_id(N'[dbo].[sp_LoadAll]') and OBJECTPROPERTY(id, N'IsProcedure') = 1)

drop procedure [dbo].[sp_LoadAll]

GO

 

-- ----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

-- Name :                       sp_LoadAll

--

-- Parameters:               None

--

-- Returns :                    See returns from sub-procedures.

--

-- Description :               Loads all files in a specified directory.

-- -------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

 

create procedure sp_LoadAll

 

AS

 

            set nocount on

 

            --Variables

            declare @rtn     int

            ,           @Folder            varchar(1000)    

            ,           @File    varchar(1000)

 

            --Assign defaults

            select @rtn = 0                          --Executed with no errors.

 

            --Assign the folder

            select @Folder = 'c:\Load\'

 

            --Does the folder exist

            exec

            @rtn     =          sp_FolderFileExist

            @File    =          @Folder

,           @Switch           =          0

 

            if @rtn != 0                    --Error

                        begin

                        goto QuitWithErrLog

                        end

 

            --Get the list of files

            exec

            @rtn     =          sp_GetFiles

            @Folder            =          @Folder

 

            if @rtn != 0                    --Error

                        begin

                        goto QuitWithErrLog

                        end

 

            --Load the files.

            declare cFiles cursor local fast_forward for select @Folder + [file] from ##Files order by [file]

            open cFiles

 

FetchNext:

 

            fetch next from cFiles into @File

 

            while @@Fetch_Status = 0

                        begin

                        exec

                        @rtn                 =          sp_LoadSingle

                        @File                =          @File

                        goto  FetchNext

                        end

 

            --Clean up cursor

            close cFiles

            deallocate cFiles

 

Quit:

 

            set nocount off

 

            return @rtn

 

QuitWithErrLog:

 

            --*** Log errors here ***

 

            goto Quit

 

GO

SET QUOTED_IDENTIFIER OFF

GO

SET ANSI_NULLS ON

GO

 

 

The second stored procedure is sp_LoadSingle, this is called by sp_LoadAll for each file to load and it is also called when one file is required to be loaded as an ad-hoc task. This procedure firstly validates that the file to be loaded and the DTS package (stored as a structured storage file) exist.          

After this the data is ready to be loaded, I think the neatest way to do this is to firstly load the data into a holding table, perform all validation via TSQL queries and once successfully validated, load the data into the live table(s). The holding table is simply a series of varchar columns and should contain the data exactly as it appears within the flat file i.e. no validation or filtering should be performed by the DTS package its only purpose is to load the data into the holding table via a data pump task.

            The DTS package is called via the extended stored procedure xp_CmdShell and the command line utility Dtsrun (type Dtsrun /? At the command line for info.). It contains global variables for the filename to be loaded, the server name and database name, these are passed to the pacakge via switches in the Dtsrun command, see below.

            I have omitted the stored procedure sp_Validate as the content of this depends on the requirements. Some I faced were data being too long i.e. 6 characters in a field when only 5 were allowed, data type validation i.e. char data in numeric fields and a total record count – the header record of each file contains a total record count which had to match the actual record count. The customer also wanted a log (within a SQL server table) of the exact field within the file in which the error occurred together with the error. All this is achieved via TSQL statements within the stored procedure sp_Validate.

 

The definitions of the two tables used in this example are: -

 

 

            create table [dbo].[Holding]

            (

                        [c1]       varchar(4)

            ,           [c2]       varchar(4)

            ,           [c3]       varchar(4)

            ) on [primary]

            go

 

            create table [dbo].[Live]

            (

                        [c1]       int

            ,           [c2]       int

            ,           [c3]       int

            ) on [primary]

            go

 

SET QUOTED_IDENTIFIER OFF

GO

SET ANSI_NULLS OFF

GO

 

if exists (select * from dbo.sysobjects where id = object_id(N'[dbo].[sp_LoadSingle]') and OBJECTPROPERTY(id, N'IsProcedure') = 1)

drop procedure [dbo].[sp_LoadSingle]

GO

 

-- ----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

-- Name :                       sp_LoadSingle

--

-- Parameters :              @File    varchar(1000)     --The full path and file name.

--

-- Returns :                    See returns from sub-procedures.

--

-- Description :               Loads a single file

-- ----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

 

create procedure sp_LoadSingle

 

            @File                varchar(1000)                 --The full path and file name.

 

AS

 

            set nocount on

 

            --Variables

            declare @rtn     int         --Int to return

            ,           @DTSPath        varchar(1000)

            ,           @cmd   varchar(1000)

 

            select @rtn = 0                          --Executed with no errors.

 

            --Validate the file exists.

            exec

            @rtn     =          sp_FolderFileExist

            @File    =          @File

,           @Switch           =          1

 

            if @rtn != 0

                        begin

                        goto QuitWithErrLog

                        end

 

            --Assign the full path and filename of the ci.DTS package

            select @DTSPath = 'c:\DTS\Load.DTS'

 

            --Does CI.DTS exist on the file server?

            exec

            @rtn     =          sp_FolderFileExist

            @File    =          @DTSPath

,           @Switch           =          1

 

            if @rtn != 0

                        begin

                        goto QuitWithErrLog

                        end

 

            --Clear the holding table

            truncate table Holding

 

            --Load the file.

            select @cmd = 'Dtsrun /f "' + @DTSPath + '" /A File:8="' + @File + '" /A Server:8="' + @@servername + '" /A Database:8="' + db_name() + '" /W TRUE'

            exec master..xp_cmdshell @cmd

 

            --Validate the data

--          exec

--          @rtn     =          sp_Validate

--

--          if @rtn != 0

--                      begin

--                      goto QuitWithErrLog

--                      end

 

            --Load the data into live

            insert Live

            (

            [c1]

,           [c2]

,           [c3]

            )

            select

            [c1]

,           [c2]

,           [c3]

            from

            Holding

           

Quit:

 

            set nocount off

 

            return @rtn

 

QuitWithErrLog:

 

            --*** Log errors here ***

 

            goto Quit

 

 

GO

SET QUOTED_IDENTIFIER OFF

GO

SET ANSI_NULLS ON

GO

 

Below is an overview of the DTS package used to load the data. There is a database connection and a flat file connection, a dynamic properties task assigns the global variables (file connection and the server and database for the database connection)  then a data pump task loads the data.

 

 

 

The stored procedure sp_GetFiles accepts a directory as a parameter and then populates the temporary table ##Files with all the files within that directory via the extended stored procedure xp_DirTree.

 

 

 

SET QUOTED_IDENTIFIER OFF

GO

SET ANSI_NULLS OFF

GO

 

if exists (select * from dbo.sysobjects where id = object_id(N'[dbo].[sp_GetFiles]') and OBJECTPROPERTY(id, N'IsProcedure') = 1)

drop procedure [dbo].[sp_GetFiles]

GO

 

-- ----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

--

-- Name :                       sp_GetFiles

--

-- Parameters:               @Folder

--

-- Returns :                    -1 No files to process.

--

-- Description :               Populates ##Files with all files in a specified directory

-- -------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

 

create procedure sp_GetFiles

 

            @Folder            varchar(1000)

 

AS

 

            --Variables

            declare @rtn     int

            ,           @rowcount        int

 

            select @rtn = 0              --Executed with no errors.

 

            --Does the folder exist

            exec

            @rtn     =          sp_FolderFileExist

            @File    =          @Folder

,           @Switch           =          0

 

            if @rtn != 0

                        begin

                        goto Quit

                        end

 

            --Drop the temp tables to remove table and data.

            if exists (select * from tempdb.dbo.sysobjects where name = '##DirTree')

                       

                        begin

                        drop table ##DirTree

                        end

 

            if exists (select * from tempdb.dbo.sysobjects where name = '##Files')

                       

                        begin

                        drop table ##Files

                        end

 

            --Create the temp tables.

            create table ##DirTree

            (

            [subdirectory]    varchar(255)

,           [depth]  int

,           [file]      int

            )

 

            create table ##Files

            (

            [File]     varchar(1000)

            )

 

            --Populate ##DirTree

            insert ##DirTree

            exec

            master..xp_dirtree

            @Folder

,           1

,           1

 

            --Populate ##Files         

            insert ##Files

            (

            [File]

            )

            select

            subdirectory

            from

            ##DirTree

            where

            [file]      =          1

 

            select @rowcount = @@rowcount

 

            if @rowcount = 0

 

                        begin

                        select @rtn = -1

                        end

 

            drop table ##DirTree

 

Quit:

 

            return @rtn

 

GO

SET QUOTED_IDENTIFIER OFF

GO

SET ANSI_NULLS ON

GO

 

 

The stored procedure sp_FolderFileExist validates if a folder or a file exists on the database server via the extended stored procedure xp_FileExist. There are two parameters, @File which is the folder or file to validate and @Switch, 0 for a folder, 1 for a file.

 

 

 

SET QUOTED_IDENTIFIER OFF

GO

SET ANSI_NULLS OFF

GO

 

if exists (select * from dbo.sysobjects where id = object_id(N'[dbo].[sp_FolderFileExist]') and OBJECTPROPERTY(id, N'IsProcedure') = 1)

drop procedure [dbo].[sp_FolderFileExist]

GO

 

-- ----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

-- Name :                       sp_FolderFileExist

--

-- Parameters :              @File, @Switch

--

-- Returns :                    -1 = Folder or file does not exist.

--

-- Description :               Does a folder or a file exist.

-- ----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

 

create procedure sp_FolderFileExist

 

            @File    varchar(1000)     --The folder or file name.

,           @Switch           bit         --0 = Folder; 1 = File

 

as

 

            --Variables

            declare @rtn      int

 

            select @rtn = 0              --Executed with no errors.

 

            --Create a temp table.

            create table #FolderFileExists

            (

            [File Exists]                   int

,           [File is a Directory]                     int

,           [Parent Directory Exists]            int

            )

           

            insert #FolderFileExists

exec

master..xp_FileExist

@File

 

            if not exists

            (

            select *

            from

            #FolderFileExists

            where

            case @Switch

                        when 0 then [File is a Directory]

                        when 1 then [File Exists]

            end = 1

            )

           

                        begin

                        select @rtn = -1

                        end

 

Quit:

 

            return @rtn

 

GO

SET QUOTED_IDENTIFIER OFF

GO

SET ANSI_NULLS ON

GO

 

 

Conclusion

 

There are a number of different ways to tackle this problem. A couple of which are: -

 

  • Iterate through the files using VB and the filesystemobject calling a DTS package via the COM interface to load the data.

  • Loop through the files within a DTS package calling another DTS package to load the data.

I liked this method because it keeps everything within the database (except for the DTS package)  making development and installation simpler. Has anyone tackled a similar problem in a different / better way, please let me know.

Total article views: 11044 | Views in the last 30 days: 14
 
Related Articles
FORUM

stored procedure is valid or not valid.

stored procedure is valid or not valid.

FORUM

File existance in folder

File existance in folder

FORUM

Check Files existence in a folder

Check Files existence in a folder

FORUM

Stored procedure- help with date parameter validation

Stored procedure- help with date parameter validation

FORUM

Using Script Task to check if Folder Exists

Using Script Task to check if Folder Exists

Tags
dts    
sql server 7    
stored procedures    
t-sql    
 
Contribute

Join the most active online SQL Server Community

SQL knowledge, delivered daily, free:

Email address:  

You make SSC a better place

As a member of SQLServerCentral, you get free access to loads of fresh content: thousands of articles and SQL scripts, a library of free eBooks, a weekly database news roundup, a great Q & A platform… And it’s our huge, buzzing community of SQL Server Professionals that makes it such a success.

Join us!

Steve Jones
Editor, SQLServerCentral.com

Already a member? Jump in:

Email address:   Password:   Remember me: Forgotten your password?
Steve Jones