parse files in folder

  • The Move\Copy file is going to be used as research for customer-related issues. I would like to wrap the script in a .cmd or .bat file. That's where I was thinking to have the script prompt the user for the 3 search values...

    so I can use parts of the script you sent to do the following:

    There is a default format for inbound files

    Position 36 of the record contains the 1st parm and has a length of 15

    Position 71 of the record contains the 2nd parm and is in format YYMMDD

    The next record is the group envelope – this record has variable length fields and is designated by the first “GS” after position 104

    Position 2 after the “GS” contains the document type: The 3rd parm.
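
    As an illustrative sketch of the positional extraction described above (Python for brevity; extract_parms is a made-up name, the 1-based positions come from the spec as written, and the 2-character document-type length is an assumption - the replies below argue delimiter-based parsing is safer):

```python
def extract_parms(record: str):
    """Pull the three search values out of a fixed-format record (1-based positions per the spec)."""
    parm1 = record[35:50].strip()      # position 36, length 15
    parm2 = record[70:76]              # position 71, format YYMMDD
    gs_idx = record.find("GS", 104)    # first "GS" after position 104
    # Assumption: a 2-character document type starting 2 positions after "GS"
    parm3 = record[gs_idx + 3:gs_idx + 5] if gs_idx != -1 else None
    return parm1, parm2, parm3
```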

    Thanks.

    You don't need a cmd or bat file - that is what PowerShell replaces - and you don't need to look to a position in the file, because the record is delimited.

    Not sure why you keep looking to make it harder...but oh well, good luck.

    Jeffrey Williams
    “We are all faced with a series of great opportunities brilliantly disguised as impossible situations.”

    ― Charles R. Swindoll

    How to post questions to get better answers faster
    Managing Transaction Logs

  • I have to agree with Jeffrey Williams here.  The files are EDI files and the "*"s are actually delimiters.  If you use positional notation instead of the element ordinal after a split of a given line, you could actually end up reading the wrong data in the future if one of the preceding "fields" gets larger.  And, he's right... you don't need a BAT or CMD file for this... you can just execute the PowerShell file.
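
    A quick sketch of that point (Python; the sample segment is made up): splitting on the "*" delimiter keeps element ordinals stable even if a preceding field gets wider, whereas fixed character positions would shift:

```python
# Made-up ISA-style segment; real EDI content differs
line = "ISA*00*          *00*          *ZZ*SENDERID*ZZ*RECEIVERID"
fields = line.split("*")
# fields[0] is always the segment ID and fields[6] the 7th element,
# regardless of how wide the earlier fields are
```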

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Jeff Moden wrote:

    Let me answer your question with another question... What are you doing for logging of the file receipt and copy/move (whatever), what are you using to schedule this with, and what privs does a particular user require to not only execute the PowerShell but also have the necessary privs to directories that they probably shouldn't have individual or group access to?

    These are all good questions - and should be reviewed for any solution regardless of the technology being utilized.

    1. Logging - for a pure OS-level solution you are limited to logging that information in a log file.  However, it is easy enough to create a table and log that information from PS to that table using Invoke-SqlCmd.
    2. Normally we use Task Scheduler to schedule things; however, the OP's request is to have the user initiate the process by 'clicking' a link - so, a shortcut on the desktop.
    3. Shares - I normally set up shared folders on a server (somewhere) or a NAS share.  Privileges are then managed in AD using a security group and granting the necessary rights to that security group.  Some users would be added to an RO (read-only) group, some to an RW (read-write) group, based on their roles.
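
    As a minimal sketch of the log-file option in point 1 (Python; the function name and tab-separated layout are made up for illustration):

```python
from datetime import datetime, timezone

def log_file_action(log_path: str, file_name: str, action: str) -> None:
    """Append a timestamped, tab-separated line recording what was done with a file."""
    stamp = datetime.now(timezone.utc).strftime("%Y-%m-%d %H:%M:%S")
    with open(log_path, "a", encoding="utf-8") as log:
        log.write(f"{stamp}\t{action}\t{file_name}\n")
```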

    If this process was initiated from SQL - then it *might* make sense to load the file into SQL Server and parse it there, but that really depends on the end goal.  I wouldn't load files into SQL Server just to parse the records and then copy/move the files around in a file share.

    If the end goal is to parse that data into discrete data elements and load into other tables - then obviously it would be needed in the database.  However, I would not recommend even attempting to parse an EDI file in SQL Server - these files are way too complex and it would be less expensive in time and money to purchase an EDI translator to perform the work.

    Jeffrey Williams

  • Thanks for the feedback, Jeffrey...

    I totally agree that trying to fully parse EDI files in SQL Server is a bit crazy and even I probably wouldn't go there, especially since there are packages that will do such a thing for us that are usually way cheaper than either Developer time or DBA time.

    As for all the other good points you make, I agree especially since you also imply the "It Depends" caveat (although I really hate using Task Scheduler for more reasons than one, but that's a personal choice). 😀

    Understood on the file shares and privs.  For me, though, most of the processes don't even need an individual to have privs of any sort... not even RO, especially after the process has been designed, tested, and deployed to the production environment.  While the OP has asked for the ability to run this through a "click" on a link, you have to ask "Why" an automatic process wouldn't fit the bill a whole lot better, and ask other questions like "What happens if that person is on vacation, gets hit by a proverbial bus, or their machine has a melt-down?"

    And all of that was the purpose of my original question on this thread.  It would be the first step to building such an automated process, and it would always be in SQL Server as a stored procedure, with change control, automatic backups/restores, and a whole bunch more.  People tend to forget those types of things for PoSh, BAT, and CMD files, never mind a link on someone's desktop.

    --Jeff Moden



  • Jeff Moden wrote:

    Thanks for the feedback, Jeffrey...

    I totally agree that trying to fully parse EDI files in SQL Server is a bit crazy and even I probably wouldn't go there, especially since there are packages that will do such a thing for us that are usually way cheaper than either Developer time or DBA time.

    As for all the other good points you make, I agree especially since you also imply the "It Depends" caveat (although I really hate using Task Scheduler for more reasons than one, but that's a personal choice). 😀

    Understood on the file shares and privs.  For me, though, most of the processes don't even need an individual to have privs of any sort... not even RO, especially after the process has been designed, tested, and deployed to the production environment.  While the OP has asked for the ability to run this through a "click" on a link, you have to ask "Why" an automatic process wouldn't fit the bill a whole lot better, and ask other questions like "What happens if that person is on vacation, gets hit by a proverbial bus, or their machine has a melt-down?"

    And all of that was the purpose of my original question on this thread.  It would be the first step to building such an automated process, and it would always be in SQL Server as a stored procedure, with change control, automatic backups/restores, and a whole bunch more.  People tend to forget those types of things for PoSh, BAT, and CMD files, never mind a link on someone's desktop.

    Absolutely agree on automation - if it can be automated it should.  Most often I find the users need a place to drop files for processing - generally because the vendor only allows a single individual access to the FTP site to pull down the files.  I always ask for an account so I can automate getting the file(s) - but that usually doesn't happen, hence the requirement to provide a location where the users can drop the files.

    As for task scheduler - I am not a fan either...it works but it could be so much better if Microsoft would just spend some time building a good interface.  Compared to SQL Server Agent (which has its own problems) it barely qualifies 😉

    PoSH, BAT, CMD, VB Script, Perl, etc. are code and should be managed as code.  But as you say - they are often overlooked as such and not included in a code repository, or even copied off to a 'safe' location for easy recovery.

    Jeffrey Williams

  • I was thinking of using a bat or cmd file because, to use a posh file, the user who is requesting the lookup would have to know how to launch the script from PS.

    I'm using fixed-length fields because, in conversations with the EU, they said partners in EDI can use different delimiters, and they said the Inbound records can use these specs:

    I didn't want to have to determine the delimiter of the record read on the fly.

    Position 36 of the record contains the 1st parm and has a length of 15

    Position 71 of the record contains the 2nd parm and is in format YYMMDD

    The next record is the group envelope – this record has variable length fields and is designated by the first “GS” after position 104

    Position 2 after the “GS” contains the document type: The 3rd parm.

  • I don't know of any EDI spec that doesn't use an "*" as a delimiter, but people are known to make up stuff as they go along, so understood on the reasoning there.  If you have an actual written spec that says positions 36 and 71 will always be what they are and will have the fixed lengths that you say, then I hope they actually follow their own spec.  Personally, I'd at least put in a check to ensure they were always following that format, or reject the file, to prevent any "silent" errors that could make a big difference.

    On that same note, do you have a fixed length for the "GS" (3rd) parameter?  If not, how do you expect to find the "end" of the field if not with the use of a given delimiter?

    Understood on the reasoning for the BAT or CMD file.  The issue is all the other things that I mentioned.  For example, what are your plans for source control of all of this and what measures have you taken to prevent an unauthorized user from simply changing the file?

    I'll also state that I'm a serious rookie concerning PowerShell when compared to the others on this thread, so I'll have to leave this in their capable hands.  Sorry I can't help there.

    --Jeff Moden



  • param (
        [parameter(mandatory = $true, ValueFromPipelineByPropertyName = $true)][string]$hdr,
        [parameter(mandatory = $true, ValueFromPipelineByPropertyName = $true)][string]$date,
        [parameter(mandatory = $true, ValueFromPipelineByPropertyName = $true)][string]$recordtype
    )

    # Get list of files
    $fileList = Get-ChildItem -Path c:\ftp_in | Where-Object {$_.Name -like "*.INT"};

    $fileList | ForEach-Object {
        $fileName = $_.FullName;
        $firstRow = (Get-Content $fileName | Select-Object -Index 0).Split("*");
        $firstRow;   # echo the parsed elements for debugging

        if ($firstRow[0] -eq "ISA" -and $firstRow[6].Trim() -eq $hdr -and $firstRow[9].Trim() -eq $date -and $firstRow[17].Trim() -eq $recordtype)
        {
            # $_.Name (not $fileName) - the destination needs just the file name, not the full path
            Copy-Item -Path $fileName -Destination "C:\ftp_in\archive\$($_.Name)" -WhatIf;
        }
    }

    This works, but how can I add some logic for $date so it's validated as YYMMDD?  I would also like it to accept any or all of the params.  If nothing is found, issue a message:

    For $hdr ... and $date ... and $recordtype ...   No files found in search.

    Thanks again for replies and help figure this out.


  • You could change the $date parameter to a datetime data type - then create a separate variable for the string format:

    [parameter(mandatory = $true, ValueFromPipelineByPropertyName = $true)][datetime]$date,

    $stringDate = $date.ToString("yyMMdd");
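
    If the parameter needs to stay a string, a strict YYMMDD check is another option - sketched here in Python with is_yymmdd as a made-up helper name (in PowerShell the equivalent would be [datetime]::ParseExact($date, 'yyMMdd', $null) inside a try/catch):

```python
from datetime import datetime

def is_yymmdd(s: str) -> bool:
    """True only for 6-character strings that parse as a real YYMMDD date."""
    if len(s) != 6:
        return False
    try:
        datetime.strptime(s, "%y%m%d")
        return True
    except ValueError:
        return False
```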

    It looks like you changed the fields and are now looking at only the first record in the file.  Based on the original specification - you need to look in the second record for the $recordtype - it does not exist in $firstRow[17].

    As I stated before - if your files are not using CR or LF or CR/LF as the record (segment) terminator, then we have to change how we are processing the files.  To do that we need to get the ISA record (first record), which is a fixed length of 106 characters.  Then we need to parse the defined delimiters out of that record:

    • Character #83 = Repetition Element Separator
    • Character #105 = Composite Element Separator
    • Character #106 = Segment Separator (we need this one)
    • Character #4 = Data Element Separator

    It is much easier to parse if we know beforehand that the segment separator is the default LF, CR/LF, or just a CR.  Using one of these forces a new record in the file, so we can pull from the file by index or number of records.

    If the sender is not using the default and instead is sending a single stream where - for example - they use ~ as the segment terminator we then need to account for that different terminator.  If we are parsing files from multiple vendors, and one uses the default - the other uses ~ and a third uses something different, the only option is to parse the ISA for those delimiters and then parse the results.

    Of course, if we have different vendors who have also decided to use different separators for repetition elements, composite elements and data elements then you *must* parse the ISA record to determine how to parse the other records.
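
    The four separators listed above can be read straight out of the fixed-length ISA record; a sketch under those assumptions (Python; isa_delimiters is a made-up name, with the 1-based character numbers converted to 0-based indexes):

```python
def isa_delimiters(isa: str) -> dict:
    """Extract the four separators from a 106-character ISA record."""
    if not isa.startswith("ISA") or len(isa) < 106:
        raise ValueError("not a full ISA record")
    return {
        "element": isa[3],       # character #4
        "repetition": isa[82],   # character #83
        "component": isa[104],   # character #105
        "segment": isa[105],     # character #106
    }
```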

    If the files are large files - then you would not want to use Get-Content to read in the full file and then parse.  Using Index, First, Tail of Get-Content allows us to read the file for only those records we want - but if the file does not have a known segment separator and we have to parse the first 106 characters to determine that value, then we have to use a different method - which might be something like this:

    $bytes = Get-Content $fileName -Encoding Byte -TotalCount 106
    $isaRecord = [System.Text.Encoding]::ASCII.GetString($bytes)   # ASCII, not Unicode - EDI interchanges are single-byte text

    From that - we can then parse the ISA record to determine the delimiters - then use Get-Content with the appropriate segment separator to get the first and second records, then use the data element separator to parse each record.
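
    Putting those steps together - take the segment separator from character #106 of the ISA, then split the single-stream file into segments (a sketch under the same assumptions; split_segments is a made-up name and real files would need error handling):

```python
def split_segments(stream: str) -> list:
    """Split a single-stream EDI interchange into segments using the separator found in the ISA."""
    seg_sep = stream[105]  # character #106 of the fixed-length ISA record
    return [s for s in stream.split(seg_sep) if s]
```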

    Jeffrey Williams

  • When they first sent me a record layout it was individual records - or that's what they sent.  Not until I opened one of the files did I notice it was a long string with delimiters and field values.  Definitely looks like what you have outlined would be the correct approach to read and process the files, but I'm not sure my PS skills can get me there.  I haven't dealt with EDI files before and they seem somewhat complicated.

    Thanks for the responses

  • Can you provide some sample data?  The first 3 or 4 segments from several files - without any protected/private information?


    Jeffrey Williams

  • Character #83 = Repetition Element Separator ==> U

    Character #105 = Composite Element Separator ==> >

    Character #106 = Segment Separator (we need this one) ==> {

    Character #4 = Data Element Separator ==> *

    I added the separator values from the files... does that help?

    Thanks.

  • Not really - if you can provide sample data from several different files, including the first 3 or 4 segments for each file, I can put something together to parse the files and test/validate the code.

    Jeffrey Williams

  • Would a record up to character 106 work?
