Fly Girl (9/17/2012)
Windows Server 2008 R2
I'm totally new to PowerShell but was trying to use it to solve an issue with data files that come with the last row being a) short and b) containing the row count for the file. For example, a file with 77,571 rows with 34 columns per row ends with a row (row 77,572) that contains just the text 'Total Exported:77571' plus row end chars.
I can get PowerShell to give me the content of the last row using:
Get-Content filepath.txt | Select-Object -last 1
However, I haven't figured out how to write that to a new file... Oh, and I'd like to remove that row from the file when I'm done. I can use a script that will parse all files in a directory.
Anyone have time to help with this? Many Thanks!
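You could try something like this, which parses the row count out of the last row (e.g. 'Total Exported:77571') and then keeps only that many rows:
PS C:\> $rows = [int](((Get-Content C:\data.txt | Select-Object -Last 1) -split ':')[1])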
PS C:\> Get-Content C:\data.txt | Select-Object -First $rows | Out-File -FilePath C:\data_lines.txt -Encoding ASCII
A couple of things you should know:
- Select-Object -Last 1 cannot return anything until Get-Content has streamed the whole file through the pipeline. This means the code above reads the entire file line by line just to get the last row, then reads it again to copy everything except the last row to a new file. That's three passes over the data (two reads and a write), which is not very efficient.
- You'll get an ASCII file out the other side of this process. Change the -Encoding option if your input file is in Unicode or some other encoding.
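If the second read bothers you and the file fits comfortably in memory, here is a rough sketch that reads it only once. The function name Split-TrailerRow and its parameters are made up for illustration, and it assumes the last row always looks like 'Total Exported:<number>'.

```powershell
# Hypothetical helper: reads the file once, writes the data rows to
# -Destination, and returns the trailer row so the caller can save or log it.
function Split-TrailerRow {
    param([string]$Path, [string]$Destination)

    # @() keeps this an array even when the file has a single line.
    $lines = @(Get-Content -Path $Path)
    if ($lines.Count -eq 0) { throw "File is empty: $Path" }

    # Assumed trailer format: 'Total Exported:77571'.
    $trailer  = $lines[-1]
    $expected = [int](($trailer -split ':')[1])

    # Everything except the trailer row.
    $data = @($lines | Select-Object -First ($lines.Count - 1))
    if ($data.Count -ne $expected) {
        Write-Warning "Trailer in $Path says $expected rows, found $($data.Count)."
    }

    $data | Out-File -FilePath $Destination -Encoding ASCII
    $trailer
}
```

To cover a whole directory you could pipe Get-ChildItem into ForEach-Object and call the function with $_.FullName for each file, sending the returned trailer to whatever file you want with Out-File.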
For a file with only 77K rows this may be fine, but the technique may not perform well enough on files with millions of rows. In that case you have options.
You could set up a loop that reads through the file keeping two lines in memory at all times, writing out the previous line only when a newer line arrives, i.e. it writes the second-to-last line but never the last line in the file. This would allow you to validate the file immediately after writing the new file with only the data lines, i.e. after a single read through the file.
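A minimal sketch of that two-lines-in-memory loop, using the .NET StreamReader and StreamWriter classes rather than Get-Content. The function name Copy-WithoutLastLine is hypothetical. It also assumes you pass full paths (.NET resolves relative paths against the process directory, not the PowerShell location) and that the default UTF-8 handling of these classes suits your data, which it does for plain ASCII.

```powershell
# Hypothetical helper: copies every line except the last to -Destination
# in a single pass, and returns the held-back last line plus a row count.
function Copy-WithoutLastLine {
    param([string]$Path, [string]$Destination)

    $reader = $null
    $writer = $null
    $count  = 0
    try {
        $reader = New-Object System.IO.StreamReader($Path)
        $writer = New-Object System.IO.StreamWriter($Destination)

        $previous = $reader.ReadLine()
        while ($previous -ne $null) {
            $current = $reader.ReadLine()
            # No newer line: $previous is the last row, so hold it back.
            if ($current -eq $null) { break }
            $writer.WriteLine($previous)
            $count++
            $previous = $current
        }
    }
    finally {
        if ($writer) { $writer.Close() }
        if ($reader) { $reader.Close() }
    }

    New-Object PSObject -Property @{ Trailer = $previous; Rows = $count }
}

# Example use (paths are placeholders):
#   $r = Copy-WithoutLastLine -Path C:\data.txt -Destination C:\data_lines.txt
#   $expected = [int](($r.Trailer -split ':')[1])
#   if ($r.Rows -ne $expected) { Write-Warning 'Row count does not match trailer.' }
```

Only two lines are ever held in memory, so this should behave the same on a 77K-row file and on one with millions of rows.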
For maximum efficiency you would seek to the end of the file, step backwards byte by byte to find the beginning of the last line, store its value, then truncate the file at that point. C++ is very good at doing work like this. You can likely do the same with PowerShell, but chances are you'll need to code directly against some of the .NET objects that help us work with files at a low level. This approach never reads the entire file even once and never writes the bulk of the file to a new location, but it has the downside of being destructive to your incoming file.
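For what it's worth, here is a rough sketch of the seek-from-the-end idea using System.IO.FileStream. The function name Remove-LastLineInPlace is made up, and the code assumes a single-byte, ASCII-compatible encoding with LF or CRLF row ends. It modifies the file in place, so try it on a copy first.

```powershell
# Hypothetical helper: returns the text of the last line and truncates the
# file so that it ends just after the previous line's row-end characters.
function Remove-LastLineInPlace {
    param([string]$Path)

    $fs = New-Object System.IO.FileStream($Path, [System.IO.FileMode]::Open, [System.IO.FileAccess]::ReadWrite)
    try {
        $pos = $fs.Length

        # Skip the row-end characters after the last line (CR = 13, LF = 10).
        while ($pos -gt 0) {
            $fs.Position = $pos - 1
            $b = $fs.ReadByte()
            if ($b -ne 10 -and $b -ne 13) { break }
            $pos--
        }
        $lineEnd = $pos

        # Walk backwards until the byte before $pos is the LF that ends
        # the previous line (or until the start of the file).
        while ($pos -gt 0) {
            $fs.Position = $pos - 1
            if ($fs.ReadByte() -eq 10) { break }
            $pos--
        }

        # $pos is now the offset of the first byte of the last line.
        $length = [int]($lineEnd - $pos)
        $bytes  = New-Object byte[] $length
        $fs.Position = $pos
        $read = 0
        while ($read -lt $length) {
            $n = $fs.Read($bytes, $read, $length - $read)
            if ($n -le 0) { break }
            $read += $n
        }

        # Destructive step: drop the last line from the file.
        $fs.SetLength($pos)

        [System.Text.Encoding]::ASCII.GetString($bytes, 0, $read)
    }
    finally {
        $fs.Close()
    }
}
```

Only the bytes of the last row are touched, so the cost does not depend on how large the file is. A UTF-16 file would need a different byte scan.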