parse files in folder

  • As I pointed out in my other posts, checking only for 'unicode' is not the best option. The better approach is to check for the specific BOM bytes of each encoding, and then use GetString to convert the retrieved bytes.

    To compare the bytes, use Compare-Object as I outlined above and specifically look for those encodings that have a defined BOM. At a minimum, check for Unicode and UTF-8 and default to ASCII - that will be much safer in the long run.

    Jeffrey Williams
    “We are all faced with a series of great opportunities brilliantly disguised as impossible situations.”

    ― Charles R. Swindoll

    How to post questions to get better answers faster
    Managing Transaction Logs

  • The records I have in Inbound won't change, and the script made it all the way through the sub-folders without hitting any errors. I'm going to try running it with a search against the Outbound folder now and see if it has any trouble. The X12 standard files seem okay with the current script, but maybe EDIFACT is another story... How can I add a file counter and maybe a "Searching" message to let the user know something is happening and the script isn't stalled?

    Thanks for your posts and suggestions..

  • I provided examples of how to work with the BOM - I strongly recommend that you use that approach instead of the earlier hack I provided that just checks for Unicode.

    You can add anything you need...it is simply a matter of writing output to the host or to the currently defined output.  I already showed how to output to the host using Write-Host in this line:

    # Check for at least one parameter selected
    if ($Sender -eq "" -and $FileDate -eq "" -and $RecordType -eq "") {
        Write-Host -ForegroundColor Yellow "At least one parameter must be selected. Please try again.";
        Exit;
    }

    You can also use Write-Output instead, which writes to the currently defined output (e.g. stdout). To form the message you can use concatenation, or my preference: "Searching Message: $($fileName)"

    The $( ) subexpression syntax tells PowerShell to evaluate the expression inside the parentheses and substitute that value into the string.
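    Putting that together with the file counter and "Searching" message asked about above, a minimal sketch could look like the following; the folder path and the $files/$count variable names are illustrative assumptions, not part of the original script:

    ```powershell
    # Illustrative sketch - the path and variable names are assumptions.
    $files = Get-ChildItem -Path 'C:\Temp\EDI' -File -Recurse;
    $count = 0;

    foreach ($file in $files) {
        $count++;
        # Status line so the user can see the script is not stalled
        Write-Host "Searching: $($file.Name) ($count of $($files.Count))";
        # ... existing per-file parsing logic goes here ...
    }

    Write-Host "Done - searched $count files.";
    ```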

     

    Jeffrey Williams

  • You were correct about the encoding of the files - it hit other files in the folders and gave error messages. I know you gave examples of how to check, but I'm not sure I know how to insert that logic into the code. Will checking each file really slow down the process?

    Thanks...

  • If it hits a file that doesn't match the criteria check, instead of dumping a message out to ISE, could it just log that file and continue searching without throwing errors to the screen?

    I really appreciate your responses and comments...
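    One possible way to log skipped files rather than printing errors to the screen is a try/catch around the per-file logic; in this sketch the log path and the $fileName variable are illustrative assumptions:

    ```powershell
    # Illustrative sketch - the log path and $fileName are assumptions.
    $logPath = 'C:\Temp\EDI\skipped-files.log';

    try {
        # ... existing per-file parsing logic goes here ...
    }
    catch {
        # Record the file and the reason, then continue with the next file
        Add-Content -Path $logPath -Value "$fileName : $($_.Exception.Message)";
    }
    ```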

  • I am not sure what you expect - I have provided more than enough examples to get you what you need.

    From my previous post:

    Jeffrey Williams wrote:

    Here are some hints:

    # Encoding check arrays
    [byte[]]$utf7 = 43,45;
    [byte[]]$unicode = 255,254;
    [byte[]]$utf8 = 239,187,191;

    if (-not (Compare-Object $bytes[0..1] $unicode)) {
        $offset = 1
        Write-Host 'Unicode encoded file identified';
        $fileData = [System.Text.Encoding]::Unicode.GetString($bytes);
    }

    This can be extended using:

    if (condition) {
        <code here>
    }
    elseif (condition) {
        <code here>
    }
    elseif (condition) {
        <code here>
    }
    else {
        <default code here>
    }

    If you want to write to the host running the process, use Write-Host. If you want to write to an output file, there are several methods available: Out-File, Export-Csv, or redirecting stdout (then use Write-Output).

    Note: to check for UTF-7 files we need to assume the file is an EDI file and that the first 3 characters are ISA. If we assume that, then we know what the 4th and 7th characters will be if the file is encoded with UTF-7 (see previous posts). To perform the check you need two Compare-Object statements - one for the first character check and one for the second character check.

    Try to put this together - and if you run into problems, post the code you are running and where you are having issues.

    Jeffrey Williams

  • Okay, so I dumped my search folders to a file using Out-File, and found that I have 3 different types of encoding: most are ASCII, some are UTF-8, and there are some that are blank (corrupted files that can be skipped).

    Now I need some help to call this function and, based upon the output (encoding), go search the file for the parms entered.

    function Get-FileEncodingv2($Path) {
        $bytes = [byte[]](Get-Content $Path -Encoding byte -ReadCount 300 -TotalCount 300)

        if (!$bytes) { return 'utf8' }

        switch -regex ('{0:x2}{1:x2}{2:x2}{3:x2}' -f $bytes[0],$bytes[1],$bytes[2],$bytes[3]) {
            '^efbbbf'   { return 'utf8' }
            '^2b2f76'   { return 'utf7' }
            '^fffe'     { return 'unicode' }
            '^feff'     { return 'bigendianunicode' }
            '^0000feff' { return 'utf32' }
            default     { return 'ascii' }
        }
    }

    dir -Path 'C:\temp\EDI\Temp' -File |
        select Name, @{Name='Encoding'; Expression={Get-FileEncodingv2 $_.FullName}} |
        ft -AutoSize

    Thanks.

     

  • In PowerShell you can create a function at the top of the script and call it later - or you can put a function in a separate file and dot-source that file in your script - or you can create it as a module and import the module... or other methods. However, that isn't needed here - you just need to check the BOM after reading the first 500 bytes of data:

    First, define the byte arrays for the BOMs:

    # Encoding check arrays
    [byte[]]$utf7 = 43,45;
    [byte[]]$unicode = 255,254;
    [byte[]]$utf8 = 239,187,191;

    Then - read the first 500 bytes, determine the encoding and decode based on that encoding:

    # Get the first 500 bytes from the file
    $bytes = Get-Content $fileName -Encoding byte -TotalCount 500 -ReadCount 500;

    # Check the encoding of the file - get $fileData based on encoding
    if (-not (Compare-Object $bytes[0..1] $unicode)) {
        $offset = 1
        Write-Host 'Unicode encoded file identified';
        $fileData = [System.Text.Encoding]::Unicode.GetString($bytes);
    }
    elseif (-not (Compare-Object $bytes[0..2] $utf8)) {
        $offset = 1
        Write-Host 'UTF-8 encoded file identified.';
        $fileData = [System.Text.Encoding]::UTF8.GetString($bytes);
    }
    # Note: this assumes the file is an EDI file where the first 3 characters are ISA
    # and the 4th character is a + and the 7th is a - (+ACo- = *, +AHw- = |, +AH4- = ~, ...)
    elseif (-not (Compare-Object $bytes[3] $utf7[0]) -and -not (Compare-Object $bytes[7] $utf7[1])) {
        $offset = 0
        Write-Host 'UTF-7 encoded file identified.'
        $fileData = [System.Text.Encoding]::UTF7.GetString($bytes);
    }
    else {
        Write-Host 'No encoding identified - using default ASCII';
        $fileData = [System.Text.Encoding]::ASCII.GetString($bytes);
    }

    And finally - parse the records:

    # Validate this is an EDI file
    if ($fileData.Substring($offset, 3) -eq "ISA") {

        # Get the data element separator and segment element separator
        $dataElement = $fileData.Substring(3 + $offset, 1);
        $segmentElement = $fileData.Substring(105 + $offset, 1);

        # Split first row based on segment and data element separators - Index = 0
        $firstRow = $fileData.Split($segmentElement)[0].Split($dataElement);

        # If we match the sender and the date - get the second row and check the record type
        if (($firstRow[6].Trim() -eq $Sender -or $Sender -eq "") -and ($firstRow[9] -eq $FileDate -or $FileDate -eq "")) {

            # Get the second row based on the segment and data element separators - Index = 1
            $secondRow = $fileData.Split($segmentElement)[1].Split($dataElement);

            if ($secondRow[1] -eq $RecordType -or $RecordType -eq "") {

                # Copy the file to the new location
                Copy-Item -Path $fileName -Destination "C:\Temp\Archive\$($fileName)" -WhatIf;
            }
        }
    }

     

    Jeffrey Williams

  • Very cool!!!  I'm going to run it thru the folders and see how it works.

    Thanks again, and will report back results...

  • I just needed to include an exclusion for the corrupt files, but other than that it worked great.

    I'm going to add a progress bar to keep the user informed of the search... and maybe a counter for files found.

    Thanks for ALL the help and suggestions - it's great when someone extends their scripting skills to someone trying to learn.

  • Happy to help - glad to see you have something that is working now.

    I know this won't be extremely fast but it is workable.  To get something much faster you would need to change the approach - but that wouldn't be too difficult.  You could use this script as a starting point and instead of copying the files at this point, update a table in a database with the key elements - schedule the script to run once a day (for example) - and make sure you have indexes on the key columns.

    A second script could then be created to execute a query based on the user parameters - which returns a list of matching values from the table and that script would copy the files.

    A possible third script would be a cleanup script - something that runs (as needed or scheduled) that validates all entries in the database.  If the entry in the database no longer exists in the file system - delete from the database.

    The first script would then search the folders - filtered by last write time - and just add new entries.  Or - the first script could rebuild the table each time it runs (eliminating the requirement for a third script).

    Many options - but at least you now have something that meets the requirements.

    Jeffrey Williams

  • So I created a table in SQL that has many of the key fields:

    doc_bu
    doc_tp
    doc_filename
    doc_date
    doc_type

    The doc_filename column has the file name that the PS script found, and the pointer on disk. How can I strip out just the file name and do a folder search for that specific file?

    example:

    restored\or1998873csh.int

    I just want the or1998873csh.int to pass to a search - it will always be .int as the 2nd part, and I need it to search backwards until it finds the slash (\):

    restored\or1998873csh.int

    How could I use the results of that to find the real folder on disk? The example I'm using is a pointer, not the true directory\drive where the file resides.

    I was thinking that might save opening up each file for READ.

    Thanks.

     

  • Bruin wrote:

    SO I created a table in SQL that has many of the key fields

    doc_bu doc_tp doc_filename doc_date doc_type

    The doc_filename has the file name that the PS script found, and the pointer on disk. How can I strip out just the file name and do a folder search for that specific file.

    example: restored\or1998873csh.int

    I just want the or1998873csh.int to pass to a search it will always be .int as the 2nd part and I need it to search backwards until it finds the slash(\) restored\or1998873csh.int

    How could I use the results of that to find the real folder on disk. The example I'm using is a pointer not the true directory\drive where the file resides.

    I was thinking that might save opening up each file for READ..

    Thanks.

    How are you updating the table? In PowerShell you can get just the file name without the extension using $_.BaseName. Combine that with $_.Extension (which already includes the leading dot) to get the file name and extension. Ex: "$($_.BaseName)$($_.Extension)" - or simply use $_.Name.

    You can get the folder using $_.DirectoryName or $_.Directory - store that in the table also...or, store $_.FullName to get the full path and name where the file exists.

    Once you have everything in a table - use Invoke-SqlCmd to execute a query and return the results into a variable.  As long as you have the path and file name as a column being returned you can reference that value in a foreach using $_.ColumnNameFromSql
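    As a minimal sketch of that flow - the server, database, table and column names, and the search path below are illustrative assumptions, not your actual environment:

    ```powershell
    # Illustrative sketch: query the table, strip the stored pointer down to
    # the file name, then locate the real file on disk and copy it.
    $rows = Invoke-SqlCmd -ServerInstance 'MyServer' -Database 'MyDb' `
        -Query "SELECT doc_filename FROM dbo.Document_Archive";

    foreach ($row in $rows) {
        # 'restored\or1998873csh.int' -> 'or1998873csh.int'
        $name = Split-Path -Path $row.doc_filename -Leaf;

        # Search the real folder tree for that specific file and copy it
        Get-ChildItem -Path 'D:\EDI' -Filter $name -File -Recurse |
            Copy-Item -Destination 'C:\Temp\Archive' -WhatIf;
    }
    ```

    Split-Path -Leaf handles the "search backwards until it finds the slash" part for you, so no manual string scanning is needed.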

     

    Jeffrey Williams

  • The table was updated from a SQL script that read the interchange table to pull information. Now I want to strip the filename from the table and pass that to the PS script to go search folders and copy the file to an archive.

    Thanks

  • How can I take the parms from the script, connect to SQL, and run a query using the parms from above?

     

    # Check for at least one parameter selected
    if ($Sender -eq "" -and $FileDate -eq "" -and $RecordType -eq "") {
        Write-Host -ForegroundColor Yellow "At least one parameter must be selected. Please try again.";
        Exit;
    }

     

    Doc_TP = $Sender, Doc_Date = $FileDate, and Doc_Type = $RecordType... and in the query, make sure at least 1 of the 3 parms is populated.

    Thanks.

    SQL Query:

    SELECT DISTINCT
        Document_Archive.Doc_BU,
        FileLocation,
        tblFileLocations.file_name
    FROM
        tblFileLocations,
        dbo.Document_Archive
    WHERE
        tblFileLocations.Bu = Document_Archive.Doc_BU AND
        RTRIM(tblFileLocations.file_name) = RTRIM(Document_Archive.doc_parsed_filename) AND
        Doc_TP = Parm AND Doc_Date = Parm AND Doc_Type = Parm
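    One hedged way to wire the script's parms into that query from PowerShell is shown below; the server and database names are illustrative assumptions. Each predicate lets an empty parm match everything, mirroring the "at least one parameter" pattern used earlier in the script:

    ```powershell
    # Illustrative sketch - server/database names are assumptions; the table
    # and column names come from the query above.
    $query = @"
    SELECT DISTINCT da.Doc_BU, fl.FileLocation, fl.file_name
    FROM tblFileLocations fl
    JOIN dbo.Document_Archive da
      ON fl.Bu = da.Doc_BU
     AND RTRIM(fl.file_name) = RTRIM(da.doc_parsed_filename)
    WHERE ('$Sender'     = '' OR da.Doc_TP   = '$Sender')
      AND ('$FileDate'   = '' OR da.Doc_Date = '$FileDate')
      AND ('$RecordType' = '' OR da.Doc_Type = '$RecordType')
    "@

    $rows = Invoke-SqlCmd -ServerInstance 'MyServer' -Database 'MyDb' -Query $query;
    ```

    Interpolating the parms straight into the query text is the simplest sketch, but it is injection-prone; for anything beyond a quick script, Invoke-SqlCmd's -Variable parameter (SQLCMD variables) is the safer way to pass values in.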

Viewing 15 posts - 61 through 75 (of 88 total)
