parse files in folder

  • I have a project where I need to search a folder and, based upon parms passed in to the script, do a lookup in each file for those values and copy any matching file to a new location.  I used the following to search a folder:

    Get-ChildItem -Path '\\srv1\archive\' | Select-String -Pattern "793383355"

    Now my requirements have changed.

    1) Check whether the 1st record in the file starts with ISA. If so, use the 1st value passed in to the script to look up the TP, then use the 2nd value to look up the Date.

    -- If not found, go get the next file in the folder.

    -- If found, then read the 2nd record in the file and look for the "record type", which will be the 3rd value passed in to the script.

    -- If found, then copy the file to another folder: \\srv1\search\captures

    -- If all 3 parms are not located in the file, skip to the next file in the folder.

    Example script call:

    xxsrc_folder_search.ps1  -hdr "ISA" -date "21/03/10" -rectype "PO"

    It would be great if the script prompted the user for those 3 input values.  The -hdr and -date will always be on the first record in the file,

    if found. The -rectype will always be on the 2nd record in the file, if found.

    Any help would be greatly appreciated.

    Thanks,


  • In terms of getting parameters, I'd add help text to the script to note this, and then add parameters that return an error message if they aren't passed in.

    For searching the file, Get-Content works. Then you can use a foreach to process the file and check lines. I've done something like this:

    foreach ($XmlFile in Get-ChildItem "$XmlFolder\*.xml") {
        $Xml = [xml](Get-Content $XmlFile -Encoding UTF8)
        $EventTitle = $Xml.GuideBookXML.guide.name
        Write-Host "Title: $EventTitle"
    }
  • To set up parameters in a script, you can use something like this:

    param (
        [parameter(Mandatory = $true, ValueFromPipelineByPropertyName = $true)][string]$hdr,
        [parameter(Mandatory = $true, ValueFromPipelineByPropertyName = $true)][datetime]$date,
        [parameter(Mandatory = $true, ValueFromPipelineByPropertyName = $true)][string]$recordtype
    )

    For the loop, you can put the results of Get-ChildItem into a variable and loop over that variable.  Then use the file name with Get-Content to get the first and second rows (index 0 and 1):

    $fileList = Get-ChildItem ...;

    $fileList | ForEach-Object {
        $firstRow = Get-Content $_.FullName | Select-Object -Index 0;

        if ($firstRow.Substring(0,3) -eq "ISA") {
            if ($firstRow -match $date.ToString('yyyy-MM-dd')) {
                $secondRow = Get-Content $_.FullName | Select-Object -Index 1;

                if ($secondRow -match $recordtype) {
                    Copy-Item ...;
                }
            }
        }
    }

    You will probably need to add a format to the $date to search for the date in a specific format - for example: $date.ToString('yyyy-MM-dd').

    I would see how the records are structured - are they delimited or fixed width and do the values you want to search for exist in a specific field (delimited) or location (fixed width)?  Based on your search criteria you could get invalid results unless you specifically search a field or location.
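    As a quick sketch of the date-formatting point, a [datetime] parameter can be rendered in whatever text pattern the records use before searching (the yyMMdd pattern here is just an illustration, not confirmed from the files):

```powershell
# Format a [datetime] for searching inside the record text.
# The yyMMdd pattern is an assumption about the file's date format.
$date = Get-Date '2021-03-10'
$searchDate = $date.ToString('yyMMdd')
$searchDate
```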

     

    Jeffrey Williams
    “We are all faced with a series of great opportunities brilliantly disguised as impossible situations.”

    ― Charles R. Swindoll

    How to post questions to get better answers faster
    Managing Transaction Logs

  • My data is just text and the date format is yymmdd (210310, for example). I don't have any delimiters in the file to delineate

    data records other than the markers that start the data rows (ISA). How can it scan the row looking for that date format?

    How do I prompt the user to input the parms?

    Many thanks for comments and code samples...

  • You can use Read-Host, like this:

    $First = Read-Host -Prompt 'Input the TP lookup'

    For the rest of the question, you really need to provide some text input or some samples. We don't know what the file looks like, so trying to parse it is just a guess. You can use Substring, as Jeffrey showed, to test for something on a line. If you are looking further inside the same line, you need a way to separate and read one value from another: either a delimiter, fixed sizes, or something else.
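    Putting the prompt and the Substring check together, a minimal sketch might look like this (the sample record and the date value are assumptions for illustration; in a real run the values would come from Read-Host):

```powershell
# Prompt for the lookup values (Read-Host writes the prompt and returns
# the typed string):
#   $hdr  = Read-Host -Prompt 'Input the TP lookup'
#   $date = Read-Host -Prompt 'Input the date (yymmdd)'
# Hard-coded here so the example is non-interactive:
$hdr  = 'ISA'
$date = '210309'

# Sample first record (an assumption, matching the ISA layout shown
# elsewhere in this thread) plus the Substring test:
$firstRow = 'ISA*00* *00* *01*038962429 *01*153370077SCC *210309*1553*'
$isMatch = ($firstRow.Substring(0,3) -eq $hdr) -and ($firstRow -match $date)
$isMatch
```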

  • Setting up the parameters as mandatory will also prompt the user.  Try it.

     


  • I found more info on the file. For an inbound record it will have this format, so maybe the delimiter is *.

    I highlighted the 1st and 2nd parm lookups -- $hdr and $date:

    ISA*00* *00* *01*038962429 *01*153370077SCC *210309*1553*

    This is what the 2nd record for the lookup looks like; it always starts with GS, and the 3rd parm lookup is highlighted -- $recordtype:

    GS*PO*038962429CSC*153370077SCC*

    Thanks.

     

  • The files appear to be EDI files, which might lead to a problem.  EDI files may not have CR/LF as the record terminator; the EDI documentation will show the defined record terminator.  But if these files are as stated and you will always have CR/LF as a record terminator, then here is an example:

    First - I will create a temp file for testing:

    # Example of the contents of an EDI file
    $file = @(
        "ISA*00* *00* *01*038962429 *01*153370077SCC *210309*1553*",
        "GS*PO*038962429CSC*153370077SCC*",
        "ST*850*1097~",
        "SE*14*1097~",
        "GE*1*1097~",
        "IEA*1*100000097~"
    );

    # Create a temp file
    $file | Out-File C:\Temp\temp_edi.txt;

    Now - we can use the following code to get the list of files and parse the results:

    param (
        [parameter(Mandatory = $true, ValueFromPipelineByPropertyName = $true)][string]$hdr,
        [parameter(Mandatory = $true, ValueFromPipelineByPropertyName = $true)][string]$date,
        [parameter(Mandatory = $true, ValueFromPipelineByPropertyName = $true)][string]$recordtype
    )

    # Get list of files
    $fileList = Get-ChildItem -Path C:\Temp | Where-Object { $_.Name -like "*edi*.txt" };

    $fileList | ForEach-Object {
        $fileName = $_.FullName;
        $firstRow = (Get-Content $fileName | Select-Object -Index 0).Split("*");

        if ($firstRow[0] -eq "ISA" -and $firstRow[6].Trim() -eq $hdr) {
            $secondRow = (Get-Content $fileName | Select-Object -Index 1).Split("*");

            if ($secondRow[1] -eq $recordtype) {
                Copy-Item -Path $fileName -Destination "C:\Temp\Archive\$($_.Name)" -WhatIf;
            }
        }
    }

    This is just a basic outline - but should get you started.
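    One gap in the outline above is the date check. Assuming the yymmdd date lands in element 9 of the '*'-split ISA record, as it does in the sample line used in this thread (this index is an assumption about the layout), it could be added like this:

```powershell
# Sketch: adding the date check that the outline above skips.
# Assumption: the yymmdd date is element 9 of the '*'-split ISA record.
$date = '210309'
$firstRow = 'ISA*00* *00* *01*038962429 *01*153370077SCC *210309*1553*'.Split('*')

$dateMatches = ($firstRow[0] -eq 'ISA' -and $firstRow[9].Trim() -eq $date)
if ($dateMatches) {
    Write-Host 'Header and date match - go on to check the GS record'
}
```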


  • Thanks, you are correct, these are EDI-related files... I'm trying to have a search script find files in folders based upon the parms passed...

    I'll give it a go with the examples you provided and see if I can get something working.

    I may need to hit you back up if I run into some bumps.

    Thanks again for the help.


  • So there is a default format for inbound files:

    Position 36 of the record contains the 1st parm and has a length of 15.

    Position 71 of the record contains the 2nd parm, in format YYMMDD.

    The next record is the group envelope; this record has variable-length fields and is designated by the first "GS" after position 104.

    Position 2 after the "GS" contains the document type: the 3rd parm.
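    If those fixed positions really hold for the header record, a Substring-based sketch could pull the first two values out directly (a sketch only: .NET string offsets are zero-based, so "position 36" becomes offset 35, and $record is a hypothetical padded sample built to match the stated layout):

```powershell
# Sketch using the fixed positions described above; valid only if the
# header record is truly fixed-width. Offsets assume zero-based indexing.
$record = 'ISA*00*          *00*          *01*038962429      *01*153370077SCC   *210309*1553*'

$tp      = $record.Substring(35, 15).Trim()   # 1st parm: 15 chars at position 36
$ediDate = $record.Substring(70, 6)           # 2nd parm: YYMMDD at position 71
"$tp $ediDate"
```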

     

  • The above lookups are in the 1st record in the file, using the criteria above. I need some help...

    Thanks.

  • Bruin wrote:

    The above lookups are in the 1st record in file using criteria above. I need some help ...

    THanks.

    What's stopping you from reading the first two rows using Bulk Insert and simply using substring to do the check?

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Jeff Moden wrote:

    Bruin wrote:

    The above lookups are in the 1st record in file using criteria above. I need some help ...

    THanks.

    What's stopping you from reading the first two rows using Bulk Insert and simply using substring to the check?

    Why would you go to SQL Server to manipulate files?  These are not being loaded into SQL Server - the OP is just copying the files from one location to another based on the data in the files.


  • Bruin wrote:

    SO there is a default format for inbound files

    Position 36 of the record contains 1st parm and has a length of 15 Position 71 of record contains 2nd parm and in format YYMMDD

    The next record is the group envelope – this record has variable length fields and is designated by the first “GS” after position 104 Position 2 after the “GS” contains the document type: The 3rd parm.

    EDI files are delimited - the code I provided breaks out each record into an array based on the delimiter being a '*'.  The array is zero-based - so the first field will be array[0] and the 7th field would be array[6].

    Is there a problem with the code I provided?

    The only thing I can think of is that the records don't have a CR or LF or CR/LF as the record terminator.  If that is the case - then we need to know the record terminator so we can parse the file based on that record terminator.  Get-Content has a -Delimiter parameter that can be used to identify the record terminator and return a separate row for each record.

    For an EDI file - I would not try substring as the length of the fields is not fixed.  For example, field 3 could be a space - or no space - or have multiple characters.  The header field (field 7) may or may not be 10 characters (with a space at the end).
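    For instance, if the files turned out to be one long string with '~' as the record terminator (the '~' and the sample content here are assumptions for illustration), Get-Content -Delimiter would split the stream into one chunk per record:

```powershell
# Sketch: split a single-line EDI stream into records on a '~' terminator.
# Get-Content -Delimiter keeps the delimiter at the end of each chunk,
# so it is trimmed off afterward.
$sample = 'ISA*01*038962429*210309~GS*PO*038962429CSC~ST*850*1097~'
$tmp = Join-Path ([IO.Path]::GetTempPath()) 'edi_delim_sample.txt'
Set-Content -Path $tmp -Value $sample -NoNewline

$records = Get-Content -Path $tmp -Delimiter '~' | ForEach-Object { $_.TrimEnd('~') }
$records[0]   # the ISA record
$records[1]   # the GS record
```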


  • Jeffrey Williams wrote:

    Jeff Moden wrote:

    Bruin wrote:

    The above lookups are in the 1st record in file using criteria above. I need some help ...

    THanks.

    What's stopping you from reading the first two rows using Bulk Insert and simply using substring to the check?

    Why would you go to SQL Server to manipulate files?  These are not being loaded into SQL Server - the OP is just copying the files from one location to another based on the data in the files.

    Let me answer your question with other question... What are you doing for logging of the file receipt and copy/move (whatever) and what are you using to schedule this with and what privs does a particular user require to not only execute the power shell but have the necessary privs to directories that they probably shouldn't have individual or group access to?

