Import Multiple txt files and append to 1 table

  • Hello All:

    I'm trying to find the best way to import multiple text files and append them to a table.  I know I can set up a DTS package to import one text file, but I'm not sure how to do it with multiple text files.

    The text files will always have the same names, or I could loop through them.  I would like to do this using DTS because it is a job I would like to schedule.

    Does anyone have a sample of code to perform this task?  Am I going in the right direction?

    Any help is greatly appreciated.

    Cleech

  • Sure thing.  Have a look at this article:

    http://www.sqldts.com/default.aspx?246

    (This is much, much easier in 2005 SSIS if you have access to it.)

  • Do all of the files have the same layout?  If so, set up one data transformation task and disable it (right-click on the transformation, select Workflow Properties, and check the "Disable this step" box on the Options tab).  Populate a recordset or array that contains all the file paths, then loop through it, updating the source file object's data source and executing the transformation.

     

    Here's an example of the ActiveX script task to do it.  My source file connection object is named "SourceFile" and I've populated a global variable named "FileList" with a list of source files:

     

    Function Main()
        Dim objPackage
        Dim objRS
        Dim objFSO
        Dim objFile
        Dim objSourceFile
        Dim objTransformation
        Dim strFileName

        Set objPackage = DTSGlobalVariables.Parent
        Set objSourceFile = objPackage.Connections("SourceFile")
        Set objTransformation = objPackage.Steps("DTSStep_DTSDataPumpTask_1")
        Set objFSO = CreateObject("Scripting.FileSystemObject")

        ' Loop through the FileList recordset, checking for each file's existence.
        ' If the file exists and contains data, point the source text file
        ' connection at the file's path and execute the transformation.
        Set objRS = DTSGlobalVariables("FileList").Value
        With objRS
            Do Until .EOF
                strFileName = .Fields(1).Value
                If objFSO.FileExists(strFileName) Then
                    Set objFile = objFSO.GetFile(strFileName)
                    If objFile.Size > 0 Then
                        objSourceFile.DataSource = strFileName
                        objTransformation.Execute
                    End If
                End If
                .MoveNext
            Loop
            .Close
        End With

        ' Reset the FileList global variable to "" to eliminate the ADODB error
        ' "Operation is not allowed when the object is closed" when saving the package.
        DTSGlobalVariables("FileList").Value = ""

        Set objTransformation = Nothing
        Set objSourceFile = Nothing
        Set objFile = Nothing
        Set objFSO = Nothing
        Set objRS = Nothing
        Set objPackage = Nothing

        Main = DTSTaskExecResult_Success
    End Function

  • Here's an example of how to do this in T-SQL:

    CREATE TABLE #Temp1 (
        Field1 Int NULL,
        Field2 VarChar(64) NULL)

    BULK INSERT #Temp1
    FROM '\\server\share\folder\filename1.csv'
    WITH
    (
        FIELDTERMINATOR = ',',
        FIRSTROW = 2
    )

    BULK INSERT #Temp1
    FROM '\\server\share\folder\filename2.csv'
    WITH
    (
        FIELDTERMINATOR = ',',
        FIRSTROW = 2
    )

    ... repeat the BULK INSERT for each file (or generate the statements dynamically; see the sketch below)

    This can go directly into your destination table too.
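    Since those BULK INSERT blocks differ only by file name, they can also be generated dynamically.  Here is a minimal sketch under the same assumptions as above (#Temp1 from the example, plus an illustrative #Files list; none of the names are from the original post):

    -- Sketch: drive the repeated BULK INSERTs from a table of file names.
    CREATE TABLE #Files (FileName VarChar(256) NOT NULL)

    INSERT INTO #Files VALUES ('\\server\share\folder\filename1.csv')
    INSERT INTO #Files VALUES ('\\server\share\folder\filename2.csv')

    DECLARE @FileName VarChar(256), @sql VarChar(1000)

    DECLARE file_cursor CURSOR FOR SELECT FileName FROM #Files
    OPEN file_cursor
    FETCH NEXT FROM file_cursor INTO @FileName
    WHILE @@FETCH_STATUS = 0
    BEGIN
        -- Build and run one BULK INSERT per file
        SET @sql = 'BULK INSERT #Temp1 FROM ''' + @FileName
                 + ''' WITH (FIELDTERMINATOR = '','', FIRSTROW = 2)'
        EXEC (@sql)
        FETCH NEXT FROM file_cursor INTO @FileName
    END
    CLOSE file_cursor
    DEALLOCATE file_cursor

    The dynamic EXEC is needed because BULK INSERT does not accept a variable as the file name.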

  • We've used the technique suggested by Pam and it works out well so long as the files are all in the same format. Claude's approach would work as well, again so long as the files are in the same format. If they are not in the same format, Erik's suggestion would work out, provided you always have the same files.

  • you can load data this way, then test to figure out the format, and then load it into the final table

    or, name the files to include some kind of format indicator, like filename + A, B, C, etc...

    then use xp_cmdshell to get a dir listing and load each file with whatever routine is needed for its format (see the sketch below)
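    As a minimal sketch of that dir-listing step (the share path and the A/B suffix convention are assumptions for illustration), the listing can be captured into a temp table and the files routed by their format indicator:

    -- Sketch: capture a bare dir listing via xp_cmdshell, then route files by name.
    CREATE TABLE #DirList (FileName VarChar(256) NULL)

    -- dir /b returns bare file names, one per row
    INSERT INTO #DirList
    EXEC master..xp_cmdshell 'dir /b \\server\share\folder\*.txt'

    -- xp_cmdshell output includes a trailing NULL row; remove it
    DELETE FROM #DirList WHERE FileName IS NULL

    -- Route each file to the loading routine that matches its format indicator
    SELECT FileName FROM #DirList WHERE FileName LIKE '%A.txt'   -- format A files
    SELECT FileName FROM #DirList WHERE FileName LIKE '%B.txt'   -- format B files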

  • Thank you all for the help so far.

    Pam:

    I am using the package that you posted.  The code runs, but it only imports one file from the directory.  The only error I get is in the loop-around step: "file already exists."

    I am a little lost on the full file name and how the loop starts, too.

    Thanks for the help.  Let me know if you would like me to email the DTS file.

    Thanks,

    Joe

  • I have a similar situation.  Several times a year I have been manually creating separate tables for each of my text files located in directories.  Each text file is created and sent in by our customers based on the software they are running, so I will need two separate DTS packages to automate this process.  The problem for me is that there are over 700 text files each time to import manually using the import/export wizard to create individual tables.  Each file contains a different number of records, which shouldn't cause an issue.  When I bring the files in manually, I import them using a File Type of ANSI, a Row Delimiter of {LF}, <none> for the Text Qualifier, and a TAB column delimiter.  Sometimes the files end up being imported with 5 columns and other times there end up being as many as 9 columns.  I truly only need the first 4 columns.

    I have been doing a lot of research on getting this automated because it takes roughly two days for me to individually import these files, and as our customer base continues growing, so too will the number of files that I will have to create.  In my research I have seen several different examples of looping through directories to import files into a table.  However, I need each file to be imported into its own table that is dropped and created during this process.  I have read a number of articles briefly explaining how to manipulate the DTS package to possibly accomplish this.  I am not too familiar with the actual tasks being used in the packages.

    Basically what I have gotten out of my research is this:

    I have an ActiveX script task that gets the first file.  Then I have an Execute SQL task that is a basic select statement using a parameter, which is the name of the first file.  Next I have another ActiveX script task that is used to reset the Execute SQL task to a different query.  I need to drop table ? if it exists.  As I read, there is a bug in SQL that will not allow you to use parameters as table names, etc., and this was the workaround given.  My fourth step is another Execute SQL task that, again, runs a basic select, and then the following step (another ActiveX script) resets this Execute SQL task to have a statement that creates table ? based on the file name/parameter.  I then have my source and destination connections linked via a Transform Data task property.  Finally, the last step is to loop to get the next file in the directory.

    The first two steps in this DTS package run.  The rest, although they do not error out, end with a status of Not Run.  If anybody has any thoughts as to whether I am on the right track, or what the problem may be that's causing the other steps not to run, that would be great.  Let me know if you need all of the code to be posted, as I can do that as well.  I just cannot see wasting days of time individually importing each of these files into a SQL database.  So, any suggestions would be greatly appreciated!

  • Sarah, it sounds like you are close to being on the right track; you have the right idea.  What I would suggest is setting the SQL for the ExecuteSQL tasks using an ActiveX script.  In addition, if the files are all in the same format, or in a known number of formats, you can use a format file and BULK INSERT to load them.

    Example:

    <snip>

    Dim sSQL
    sSQL = "DROP TABLE " & tbl

    '** Assign the SQL to the Execute SQL task
    Set oSQLTask = oPkg.Tasks("DTSTask_DTSExecuteSQLTask_1").CustomTask
    oSQLTask.SQLStatement = sSQL

    '** Set up the file location, format file location, destination table, and first/last rows
    Set oBulkInsertTask = oPkg.Tasks("DTSTask_DTSBulkInsertTask_1").CustomTask
    oBulkInsertTask.DataFile = DTSGlobalVariables("DataFile").Value
    oBulkInsertTask.DestinationTableName = sDestinationTable
    oBulkInsertTask.FirstRow = DTSGlobalVariables("FirstRow").Value
    oBulkInsertTask.LastRow = DTSGlobalVariables("LastRow").Value
    oBulkInsertTask.FormatFile = DTSGlobalVariables("FormatFile").Value

    </snip>

     

    This is a code snippet that I use in a production environment to handle multiple files. I have wanted to write an article on this system, but need company approval. =)
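    For reference, here is a minimal sketch of the format file plus BULK INSERT combination described above; the table name, paths, and four-column tab-delimited layout are assumptions for illustration:

    -- Sketch: load a tab-delimited file using a format file (SQL Server 2000 syntax).
    -- dbo.CustomerStage and the UNC paths are illustrative.
    --
    -- Contents of \\server\share\folder\customer.fmt:
    --   8.0
    --   4
    --   1  SQLCHAR  0  255  "\t"    1  Col1  ""
    --   2  SQLCHAR  0  255  "\t"    2  Col2  ""
    --   3  SQLCHAR  0  255  "\t"    3  Col3  ""
    --   4  SQLCHAR  0  255  "\r\n"  4  Col4  ""
    BULK INSERT dbo.CustomerStage
    FROM '\\server\share\folder\customer01.txt'
    WITH (FORMATFILE = '\\server\share\folder\customer.fmt')

    A format file can also skip a field by giving it a server column number of 0, which may help when you only need the first few columns of each file.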

  • Thanks for the help.  I would like a little clarification, if possible. 

    "What I would suggest is setting the SQL for the ExecuteSQL tasks using an ActiveX script."

    I am currently using ActiveX scripts to set the SQL for the ExecuteSQL tasks.  (I left out a few of the steps for simplicity's sake.)

    ExecuteSQL Task1 --> Drop Table ActiveX (reset SQL in ExecuteSQL1) --> ExecuteSQL Task2 --> Create Table ActiveX (reset SQL in ExecuteSQL2)

    Is this still correct?  Or should the ActiveX scripts run prior to the ExecuteSQL tasks?  Or should the ActiveX scripts be combined into one step?  I assume I still need to have separate ExecuteSQL tasks, too.

    Thanks for clarifying for me!  I am so glad to hear that I am close as I was starting to wonder if it was all possible.

     

  • You should run the ActiveX tasks first, then the ExecuteSQL tasks.  Generally, what I like to do to keep things organized is to have an ActiveX task that sets the SQL just before each ExecuteSQL task, so it is easy to identify what is setting the code.  The other option is to have a single ActiveX task that sets all your steps as the first task of the job, then flow through as you normally would.

  • I am attempting to modify the looping and archiving code to import several Excel files, each with a different name.  One file goes in just fine, then it hangs on the archive (FSO) code.

    So I disabled that line, moved the file manually, and tried to continue looping through.  The global variable is picked up by the source (Excel file), but the transformation is still looking for the file that I have moved and fails as a result.  I have replaced the source, the transformation, and the destination; perhaps I have messed something up in doing this.

    Anyone have any suggestions? Big time rookie on this end so use little words and talk slow :).

  • Can you post some code samples for us?

  • The code below is my first issue; please see the comments for how I have picked at it.

    ' 246 (Loop Around)
    Option Explicit

    Function Main()
        Dim pkg
        Dim stpbegin
        Dim fso

        Set pkg = DTSGlobalVariables.Parent
        Set stpbegin = pkg.Steps("DTSStep_DTSActiveScriptTask_3")
        Set fso = CreateObject("Scripting.FileSystemObject")

        ' Set the step at the start of the loop to an execution status of waiting
        stpbegin.ExecutionStatus = DTSStepExecStat_Waiting

        ' Archiving: use the FileSystemObject to move the file to another directory
        ' Debug: good to here
        ' Added the message boxes below to check the variables; they seem to be OK
        MsgBox DTSGlobalVariables("gv_FileFullName").Value
        MsgBox DTSGlobalVariables("gv_ArchiveLocation").Value

        fso.MoveFile DTSGlobalVariables("gv_FileFullName").Value, DTSGlobalVariables("gv_ArchiveLocation").Value

        ' The message box below never shows; it hangs on the line above.
        MsgBox "test"
        ' At first I thought the file was tied up, but I released the connection to
        ' the file by checking "Close connection on completion" under the workflow
        ' properties.

        Main = DTSTaskExecResult_Success
    End Function

    The code below changes the connection properties of the data source (Excel file), but the data transformation is still looking for the previous file; since I have moved it to the archive manually, the step fails.

    ' 246 (Begin Loop)
    Option Explicit

    Function Main()
        Dim pkg
        Dim conXLFile
        Dim stpEnterLoop
        Dim stpFinished

        Set pkg = DTSGlobalVariables.Parent
        Set stpEnterLoop = pkg.Steps("DTSStep_DTSDataPumpTask_1")
        Set stpFinished = pkg.Steps("DTSStep_DTSActiveScriptTask_5")
        Set conXLFile = pkg.Connections("Microsoft Excel 97-2000")

        ' We want to continue with the loop only if there are files left in the
        ' directory. If the function ShouldILoop returns True, we disable the
        ' step that takes us out of the package and continue processing.
        If ShouldILoop = True Then
            stpEnterLoop.DisableStep = False
            stpFinished.DisableStep = True
            conXLFile.DataSource = DTSGlobalVariables("gv_FileFullName").Value
            stpEnterLoop.ExecutionStatus = DTSStepExecStat_Waiting
        Else
            stpEnterLoop.DisableStep = True
            stpFinished.DisableStep = False
            stpFinished.ExecutionStatus = DTSStepExecStat_Waiting
        End If

        Main = DTSTaskExecResult_Success
    End Function

    Function ShouldILoop
        Dim fso
        Dim fil
        Dim fold
        Dim counter

        Set fso = CreateObject("Scripting.FileSystemObject")
        Set fold = fso.GetFolder(DTSGlobalVariables("gv_FileLocation").Value)
        counter = fold.Files.Count

        ' As long as there is at least one file, carry on
        If counter >= 1 Then
            For Each fil In fold.Files
                DTSGlobalVariables("gv_FileFullName").Value = fil.Path
                ShouldILoop = CBool(True)
            Next
        Else
            ShouldILoop = CBool(False)
        End If
    End Function

    Hope this was not too much.  Thanks for taking a look at it!

  • When you get the pop-up, are you getting the expected file name?  (The one before the move?)

    Is your ShouldILoop function getting called properly?

    Rather than this:

        For Each fil In fold.Files
            DTSGlobalVariables("gv_FileFullName").Value = fil.Path
            ShouldILoop = CBool(True)
        Next

    I would do this:

        For Each fil In fold.Files
            DTSGlobalVariables("gv_FileFullName").Value = fil.Path
            ShouldILoop = CBool(True)
            Exit For
        Next

    (The FileSystemObject's Files collection cannot be indexed by position, so Exit For is the way to stop at the first file.)  The difference is that the first method loops through every file in the directory and ends up with the last one, while the second explicitly says "give me the first file in the list" and will save time if you have a large directory.

    Are you sure you are getting a True response from the ShouldILoop function?
