parse files in folder

  • Using Invoke-SqlCmd you specify the server instance, the database, the authentication (it defaults to Windows authentication), the query, and the variables.  Fixing up your SQL code, here is how that would look:

    # The backtick before each $( ) keeps the sqlcmd scripting variable from
    # being expanded as a PowerShell subexpression before it reaches SQL
    $queryResults = Invoke-SqlCmd `
        -ServerInstance YourSQLInstance `
        -Database YourDatabase `
        -Query "Select Distinct
            da.Doc_BU
            , fl.FileLocation
            , fl.file_name
        From tblFileLocations fl
        Inner Join dbo.Document_Archive da On da.Doc_BU = fl.Bu
            And da.doc_parsed_filename = fl.file_name
        Where Doc_TP = '`$(Sender)'
            And Doc_Date = '`$(FileDate)'
            And Doc_Type = '`$(RecordType)';" `
        -Variable "Sender=$Sender", "FileDate=$FileDate", "RecordType=$RecordType";

    Now - if you want to adjust the query, create a variable that holds the query and append only those checks that are needed:

    $sqlQuery = "Select Distinct
    da.Doc_BU
    , fl.FileLocation
    , fl.file_name
    From tblFileLocations fl
    Inner Join dbo.Document_Archive da On da.Doc_BU = fl.Bu
    And da.doc_parsed_filename = fl.file_name
    Where 1 = 1";

    if ($Sender -ne "") {
    $sqlQuery += "
    And Doc_TP = '$(Sender)'";
    }

    if ($FileDate -ne "") {
    $sqlQuery += "
    And Doc_Date = '$(FileDate)'";
    }

    if ($RecordType -ne "") {
    $sqlQuery += "
    And Doc_Type = '$(RecordType)'"
    }

    $sqlQuery += ";";

    $queryResults = Invoke-SqlCmd `
        -ServerInstance YourSQLInstance `
        -Database YourDatabase `
        -Query $sqlQuery `
        -Variable "Sender=$Sender", "FileDate=$FileDate", "RecordType=$RecordType";

    There are other methods as well - for example, you could use a stored procedure, passing in the variables as-is and adding a check in the procedure like: And (Doc_TP = @parm1 Or @parm1 = '').
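
    A minimal sketch of that procedure-based approach - the procedure and parameter names here are hypothetical, and the table/column names come from the query above:

    Create Procedure dbo.usp_get_filenames_sketch
        @parm1 varchar(30) = ''  -- hypothetical optional parameter; '' disables the filter
    As
    Begin
        Select Distinct da.Doc_BU, fl.FileLocation, fl.file_name
        From tblFileLocations fl
        Inner Join dbo.Document_Archive da On da.Doc_BU = fl.Bu
            And da.doc_parsed_filename = fl.file_name
        Where (Doc_TP = @parm1 Or @parm1 = '');
    End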

    Jeffrey Williams

  • I used a stored procedure and called it from a PowerShell script.

    $TP = Read-Host "Enter Trading Partner"
    $Date = Read-Host "Enter Date (YYMMDD)"
    $Doc = Read-Host "Enter Doc Type (2-Digits)"

    $QueryText = "exec dbo.usp_sp_get_filenames '88899', '200115','Sh'"

    $SqlConnection = new-object System.Data.SqlClient.SqlConnection
    $SqlConnection.ConnectionString = $connString
    $SqlCommand = $SqlConnection.CreateCommand()
    $SqlCommand.CommandText = "EXEC dbo.usp_sp_get_filenames $TP, $Date,$Doc"

    # Add parameters to pass values to the stored procedure
    $SqlCommand.Parameters.AddWithValue("@TP", $TP) | Out-Null
    $SqlCommand.Parameters.AddWithValue("@$Date", $Date) | Out-Null
    $SqlCommand.Parameters.AddWithValue("@$Doc", $Doc) | Out-Null

    $DataAdapter = new-object System.Data.SqlClient.SqlDataAdapter $SqlCommand
    $dataset = new-object System.Data.Dataset

    Write-Host $DataAdapter.Fill($dataset) ' records have been found.'

    The problem I found is that it looks like I need to quote $TP, because when I entered 00077I it gave me an error.

    ALTER PROCEDURE [dbo].[usp_sp_get_filenames]
        @TP nvarchar(30) = NULL,
        @Date nvarchar(30) = NULL,
        @DocType nvarchar(30) = NULL
    AS
    BEGIN
        -- SET NOCOUNT ON added to prevent extra result sets from
        -- interfering with SELECT statements.
        SET NOCOUNT ON;

        Select distinct
            Document_Archive.Doc_BU,
            FileLocation,
            tblFileLocations.file_name
        from
            tblFileLocations,
            dbo.Document_Archive
        where
            tblFileLocations.Bu = Document_Archive.Doc_BU and
            rtrim(tblFileLocations.file_name) = rtrim(Document_Archive.doc_parsed_filename) and
            Doc_TP = @TP and Doc_Date = @Date and Doc_Type = @DocType
    END
  • Not sure where you got that code - but it is not something I would recommend using.  There is no reason to build up the SQL connection manually when we have Invoke-SqlCmd available (if you don't have it, you can install it with: Install-Module SqlServer).

    Included in that code is a command text which, if you are going to set up the connection that way, is incorrect.  You want a command type of StoredProcedure, and you specify the stored procedure name (not a string with EXEC).
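
    For example - a sketch of that setup, reusing the $SqlConnection from your script:

    # Call the procedure by name with a CommandType of StoredProcedure,
    # rather than building an EXEC string with the values embedded in it
    $SqlCommand = $SqlConnection.CreateCommand();
    $SqlCommand.CommandType = [System.Data.CommandType]::StoredProcedure;
    $SqlCommand.CommandText = "dbo.usp_sp_get_filenames";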

    The next problem is the use of .AddWithValue - which has many problems associated with it.  If you are going to use that functionality, you want to add the parameter and specify its data type explicitly.  By default, .AddWithValue sends a Unicode string (nvarchar), and if your columns are varchar you will not get any use out of your indexes - a full table scan will be performed.
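
    A sketch of adding one parameter with an explicit type - this assumes the Doc_TP column is varchar(30), so adjust the type and length to match your actual columns:

    # Declaring the type avoids the implicit nvarchar conversion that
    # .AddWithValue would otherwise force against a varchar column
    $SqlCommand.Parameters.Add("@TP", [System.Data.SqlDbType]::VarChar, 30).Value = $TP;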

    Your stored procedure also doesn't handle blank parameters.  To do that you would need:

    where
    tblFileLocations.Bu = Document_Archive.Doc_BU and
    rtrim(tblFileLocations.file_name) = rtrim(Document_Archive.doc_parsed_filename) and
    (Doc_TP = @TP Or @TP = '') and (Doc_Date = @Date Or @Date = '') and (Doc_Type = @DocType Or @DocType = '')

    I prefer the approach where we build the query string in PowerShell to include only the parameters selected by the user.  This ensures the best execution plan for each version of the query that is executed.

    Note: you really need to change your query - remove the comma-join syntax and join using the now-standard [INNER] JOIN syntax.  You also don't need the RTRIM, as SQL Server ignores trailing spaces in string comparisons unless your collation and data type are ones where that isn't done.  If you must use RTRIM, understand that wrapping a column in a function prevents any indexes on it from being used, and the query will take longer to execute.

    Also - you should get in the habit of aliasing your tables and using the table alias to reference *all* columns used in the query.
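
    Putting those notes together - a sketch of the reworked SELECT for the procedure, assuming Doc_TP, Doc_Date and Doc_Type all live in Document_Archive:

    Select Distinct
        da.Doc_BU
        , fl.FileLocation
        , fl.file_name
    From tblFileLocations fl
    Inner Join dbo.Document_Archive da On da.Doc_BU = fl.Bu
        And da.doc_parsed_filename = fl.file_name  -- no RTRIM needed
    Where (da.Doc_TP = @TP Or @TP = '')
        And (da.Doc_Date = @Date Or @Date = '')
        And (da.Doc_Type = @DocType Or @DocType = '');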

    Jeffrey Williams

  • Jeffrey,

    You seem to know each component I've been asking questions about (SQL, EDI) and you're very good with PowerShell. I have a question: since the Interchange table in EDI basically holds the information about the files' delimiters and the name of the restored file, is it possible to import the .int files into SQL using PowerShell? I'm curious whether it can be done - that way you would have all the data in SQL to do lookups, rather than having to parse all the files on the file shares.

    Thanks.

  • I cannot answer that question - you already seem to have what is needed in SQL, broken out by field/column.  It appears you have a process using an EDI translator that is loading the files - and if so, then you already have everything you need.

    Jeffrey Williams

  • I was running into an issue where the original PS script was pulling back Sender, Date and RecordType (all working well), but in comparing to the database records there was an issue with the dates. I wanted to modify the PS script to capture the Sender, Date and RecordType along with the filename and path, and save those values into a SQL table. I can't figure out how to add the SQL INSERT pieces to the script. I thought maybe write the values to a CSV and then import it, but I can't get either approach to work.

    # Encoding check arrays
    [byte[]]$utf7 = 43,45;
    [byte[]]$unicode = 255,254;
    [byte[]]$utf8 = 239,187,191;

    # Get list of files
    $fileList = Get-ChildItem -Path e:\com_edi_archive_files\IntIn -Recurse | Where-Object {$_ -like "*.int"};

    $fileList | ForEach-Object {
        $offset = 0;
        $fileName = $_.FullName;

        # Get the first 500 bytes from the file
        $bytes = Get-Content $fileName -Encoding byte -TotalCount 500 -ReadCount 500;

        # Check the encoding of the file - get $fileData based on encoding
        if (-not (Compare-Object $bytes[0..1] $unicode)) {
            $offset = 1
            #Write-Host 'Unicode encoded file identified';
            $fileData = [System.Text.Encoding]::Unicode.GetString($bytes);
        }
        elseif (-not (Compare-Object $bytes[0..2] $utf8)) {
            $offset = 1
            #Write-Host 'UTF-8 encoded file identified.';
            $fileData = [System.Text.Encoding]::Utf8.GetString($bytes);
        }
        # Note: this assumes the file is an EDI file where the first 3 characters are ISA
        # and the 4th character is a + and the 7th is a - (+ACo- = *, +AHw- = |, +AH4- = ~, ...)
        elseif (-not (Compare-Object $bytes[3] $utf7[0]) -and -not (Compare-Object $bytes[7] $utf7[1])) {
            $offset = 0
            #Write-Host 'UTF-7 encoded file identified.'
            $fileData = [System.Text.Encoding]::Utf7.GetString($bytes);
        }
        else {
            #Write-Host 'No encoding identified - using default Ascii';
            $fileData = [System.Text.Encoding]::Ascii.GetString($bytes);
        }

        # Validate this is an EDI file
        if ($fileData.Substring($offset,3) -eq "ISA") {

            # Get the data element separator and segment element separator
            $dataElement = $fileData.Substring(3+$offset,1);
            $segmentElement = $fileData.Substring(105+$offset,1);

            # Split first row based on segment and data element separators - Index = 0
            $firstRow = $fileData.Split($segmentElement)[0].Split($dataElement);

            # If we match the sender and the date - get the second row and check the record type
            #if (($firstRow[6].Trim() -eq $Sender -or $Sender -eq "") -and ($firstRow[9] -eq $FileDate -or $FileDate -eq "")) {

            # Get the second row based on the segment and data element separators - Index = 1
            $secondRow = $fileData.Split($segmentElement)[1].Split($dataElement);

            #if ($secondRow[1] -eq $RecordType -or $RecordType -eq "") {

            # Make a list of saved variables
            $concatString = [System.String]::Concat($firstRow[6], $firstRow[9], $secondRow[1], $fileName)
            $concatString | Export-CSV "e:\temp\test.csv" -noType -append
        }
    }
  • This might get you started:

    # Encoding check arrays
    [byte[]]$utf7 = 43,45;
    [byte[]]$unicode = 255,254;
    [byte[]]$utf8 = 239,187,191;

    # Get list of files
    $fileList = Get-ChildItem -Path C:\Temp | Where-Object {$_ -like "*temp_edi_*.txt"};

    $fileList | ForEach-Object {
        $offset = 0;
        $fileName = $_.FullName; $fileName;

        # Get the first 500 bytes from the file
        $bytes = Get-Content $fileName -Encoding byte -TotalCount 500 -ReadCount 500;

        # Check the encoding of the file - get $fileData based on encoding
        if (-not (Compare-Object $bytes[0..1] $unicode)) {
            $offset = 1
            Write-Host 'Unicode encoded file identified';
            $fileData = [System.Text.Encoding]::Unicode.GetString($bytes);
        }
        elseif (-not (Compare-Object $bytes[0..2] $utf8)) {
            $offset = 1
            Write-Host 'UTF-8 encoded file identified.';
            $fileData = [System.Text.Encoding]::Utf8.GetString($bytes);
        }
        # Note: this assumes the file is an EDI file where the first 3 characters are ISA
        # and the 4th character is a + and the 7th is a - (+ACo- = *, +AHw- = |, +AH4- = ~, ...)
        elseif (-not (Compare-Object $bytes[3] $utf7[0]) -and -not (Compare-Object $bytes[7] $utf7[1])) {
            $offset = 0
            Write-Host 'UTF-7 encoded file identified.'
            $fileData = [System.Text.Encoding]::Utf7.GetString($bytes);
        }
        else {
            Write-Host 'No encoding identified - using default Ascii';
            $fileData = [System.Text.Encoding]::Ascii.GetString($bytes);
        }

        # Validate this is an EDI file
        if ($fileData.Substring($offset,3) -eq "ISA") {

            # Get the data element separator and segment element separator
            $dataElement = $fileData.Substring(3+$offset,1);
            $segmentElement = $fileData.Substring(105+$offset,1);

            # Split first row based on segment and data element separators - Index = 0
            $firstRow = $fileData.Split($segmentElement)[0].Split($dataElement);

            # Get the second row based on the segment and data element separators - Index = 1
            $secondRow = $fileData.Split($segmentElement)[1].Split($dataElement);

            $Sender = $firstRow[6].Trim();
            $FileDate = $firstRow[9];
            $RecordType = $secondRow[1];

            # Insert into database directly - the backticks keep $(Sender) etc. as
            # sqlcmd scripting variables, supplied through -Variable
            Invoke-Sqlcmd -ServerInstance DEMDBSVP-002.stjoe.org `
                -Database Reports `
                -Query "Insert Into test.MyTable (Sender, FileDate, RecordType)
                    Values ('`$(Sender)', '`$(FileDate)', '`$(RecordType)');" `
                -Variable "Sender=$Sender", "FileDate=$FileDate", "RecordType=$RecordType";

            # Create a CSV file for later insertion
            New-Object -TypeName PSCustomObject -Property @{
                Sender = $Sender
                FileDate = $FileDate
                RecordType = $RecordType
            } | Export-Csv -Path C:\Temp\OutputTest.csv -NoTypeInformation -Append;
        }
    }

    Jeffrey Williams

  • Cool.

    That inserts as it reads the files, correct?

    Thanks.

  • I ran into another file format for the EDIFACT In files... They all appear to be ASCII files.

    Sample:

    UNB+UNOA:1+UCH:ZZ+XXX:ZZ+YYYYYY:2200+12194++ZZZZZZ'UNH

    XXX - is the Sender

    YYYYYY - Date

    ZZZZZZ - RecordType

    The first record is always the UNB envelope:

    the field after the 2nd plus sign contains the sending partner ID, terminated by a colon.

    the field after the 4th plus sign contains the received date, format YYMMDD, terminated by a colon.

    the field after the 7th plus sign contains the document type.

    I tried to modify the code but am not having much luck trying to get these records into the SQL table.

    MANY THANKS FOR ALL YOUR HELP and REPLIES!!!

  • Bruin wrote:

    Cool.

    That inserts as it reads the files, correct? Thanks.

    I provided both - either load directly into SQL Server file by file, or create a CSV file and load that file later (the later load is not included - the CSV could be loaded using BCP or some other utility).
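
    As one sketch of that later load - assuming the SqlServer module is installed, and using the database and table names from the script above:

    # Load the accumulated CSV in one pass; Write-SqlTableData maps the
    # CSV columns to the table columns by name
    Import-Csv -Path C:\Temp\OutputTest.csv |
        Write-SqlTableData -ServerInstance YourSQLInstance `
            -DatabaseName Reports -SchemaName test -TableName MyTable;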

    Jeffrey Williams

  • Bruin wrote:

    I ran into another file format for the EDIFACT In files... They all appear to be ASCII files.

    Sample: UNB+UNOA:1+UCH:ZZ+XXX:ZZ+YYYYYY:2200+12194++ZZZZZZ'UNH

    XXX - is the Sender, YYYYYY - Date, ZZZZZZ - RecordType

    The first record is always the UNB envelope: the field after the 2nd plus sign contains the sending partner ID, terminated by a colon; the field after the 4th plus sign contains the received date, format YYMMDD, terminated by a colon; the field after the 7th plus sign contains the document type.

    I tried to modify the code but am not having much luck trying to get these records into the SQL table.

    MANY THANKS FOR ALL YOUR HELP and REPLIES!!!

    EDIFACT files have a different structure than the (X12) EDI files handled above.  And - to make matters worse - it appears that your example file is formatted as UTF-7 (maybe?).  Before you can even begin to parse the file you need to determine the encoding - then you can check the type of file - and finally you can parse the records based on position or delimiter.
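
    A sketch of that file-type check, reusing the $fileData and $offset values from the earlier script:

    # Once the encoding is known, branch on the envelope type: X12
    # interchanges start with ISA, EDIFACT interchanges with UNB (or UNA
    # when a service string advice segment overrides the default separators)
    $envelope = $fileData.Substring($offset, 3);
    if ($envelope -eq "ISA") { <# X12 parsing from the earlier script #> }
    elseif ($envelope -eq "UNB") { <# EDIFACT parsing #> }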

    Jeffrey Williams

  • I checked the encoding and all are ASCII.

    The sample I posted was just a snippet of a file to show its characteristics:

    UNB+UNOA:1+UCH:ZZ+XXX:ZZ+YYYYYY:2200+12194++ZZZZZZ'UNH

    XXX - is the Sender

    YYYYYY - Date

    ZZZZZZ - RecordType

    The first record is always the UNB envelope:

    the field after the 2nd plus sign contains the sending partner ID, terminated by a colon.

    the field after the 4th plus sign contains the received date, format YYMMDD, terminated by a colon.

    the field after the 7th plus sign contains the document type.

    The + sign seems to be the delimiter.

  • parse the records based on position:

    First record is always the UNB envelope:

    the field after the 2nd plus sign contains the sending partner ID, terminated by a colon.

    the field after the 4th plus sign contains the received date, format YYMMDD, terminated by a colon.

    the field after the 7th plus sign contains the document type.

    How do you check for the + signs in the script?

    Thanks.

  • You will need a separate script for EDIFACT files - but you can use the ideas from the original script.  In the original script we parse the first and second rows based on the row terminator (the segment element separator), and then parse each row based on the delimiter (the data element separator).

    In that script - how did we define the segment separator and the data element separator?  What function was used to create the variables $firstRow and $secondRow from those definitions?

    For EDIFACT - if the data element separator (the + sign here) is always the separator, and the row terminator is always the same character (a CR/LF, or the ' in your sample), then you can read the first two rows of the file and hardcode the separators.  If those values are variable - based on an external definition, or defined by position in the header - then you have to figure out how to parse that information from the file.
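
    To make that concrete - a sketch of the EDIFACT version of those two steps, hardcoding the ' segment terminator and the + data element separator from your sample.  Note that your description says the sender follows the 2nd plus sign, but the XXX value in your sample follows the 3rd - use index 3 if that is the correct field:

    # Split the first segment on the ' terminator, then on the + separator
    $firstRow = $fileData.Split("'")[0].Split('+');

    # "Field after the Nth plus sign" = index N; the colon-terminated
    # fields need a second split to drop their qualifiers
    $Sender     = $firstRow[2].Split(':')[0];
    $FileDate   = $firstRow[4].Split(':')[0];
    $RecordType = $firstRow[7];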

    Jeffrey Williams
