To T-SQL or not to T-SQL?

Posted Saturday, February 15, 2014 6:56 AM


SSC-Addicted

Hi,

I have about 300 MB of text files. Each one is named like XXXXXX_XXX.txt, where the six-character part is a location code and the three-character part is the report code.

The hitch is that I have to append all of the like reports together for loading, but tack on the location code as a new column. I have written PowerShell to do it, but on the larger files it runs like a dead pig. A long-dead pig. In fact, I think even its grand-pigs are dead.

Has anyone ever had to do something similar? Any advice? The PS code is posted below (using the vb markup because all the other ones made it look weirder than it is), so if there's a better way to get where I'm going with it, I'm all ears.

Thanks

# Append all of the matching "_101_" report files together, tagging each row
# with the six-character location code taken from the file name.
Get-ChildItem C:\Users\$env:username\Desktop\$mydir |
    Where-Object { $_.Name -match "^[^8]*_101_.*\.txt$" -and $_.Length -gt 0 } |
    ForEach-Object {
        $dbn = $_.Name.Substring(0,6)
        Get-Content -Path C:\Users\$env:username\Desktop\$mydir\$_ |
            Select-Object -Skip 1 |
            ForEach-Object { "$_`t${dbn}" | Out-File C:\Users\$env:username\Desktop\$mydir\101Master.txt -Append }
    }

# Take the header row from the first matching file and scrub it into clean,
# tab-separated column names.
$f = Get-ChildItem C:\Users\$env:username\Desktop\Test\ |
    Where-Object { $_.Name -match "^[^8]*_101_.*\.txt$" -and $_.Length -gt 0 } |
    Select-Object -First 1
$b = Get-Content C:\Users\$env:username\Desktop\$mydir\$f -TotalCount 1
$b = $b -replace '&','_amp'
$b = $b -replace '\$',"SS"
$b = $b -replace "[^a-zA-Z0-9`t]",""
$b = $b -replace "^\d",""
$b = $b -replace "`t\d","`t"
$b = $b -replace "\W\d+","`t"
$b = $b + "`tdbn"
Set-Content C:\Users\$env:username\Desktop\$mydir\Headers\Master101headers_cleaned.txt -Value $b

# Drop blank lines from the combined file and write it back out with the
# cleaned header row on top.
$a = (Get-Content C:\Users\$env:username\Desktop\$mydir\101Master.txt) | Where-Object { $_.Trim() -ne "" }
Set-Content C:\Users\$env:username\Desktop\$mydir\Cleaned\Master101_cleaned.txt -Value $b, $a
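
For what it's worth, the likely bottleneck is the per-line Out-File -Append, which reopens 101Master.txt for every single row it writes. Below is a rough, untested sketch of that first step using a .NET StreamReader/StreamWriter instead, so the output file is opened once per run; it assumes the same folder layout and file-name pattern as above.

# Sketch: same append-with-location-code step, but streamed, with one writer
# kept open for the whole run instead of Out-File -Append per line.
$src    = "C:\Users\$env:username\Desktop\$mydir"
$writer = New-Object System.IO.StreamWriter -ArgumentList "$src\101Master.txt", $true   # $true = append
try {
    Get-ChildItem $src |
        Where-Object { $_.Name -match "^[^8]*_101_.*\.txt$" -and $_.Length -gt 0 } |
        ForEach-Object {
            $dbn    = $_.Name.Substring(0,6)
            $reader = New-Object System.IO.StreamReader -ArgumentList $_.FullName
            try {
                $null = $reader.ReadLine()                          # skip the header row
                while (-not $reader.EndOfStream) {
                    $writer.WriteLine("$($reader.ReadLine())`t$dbn")
                }
            }
            finally { $reader.Close() }
        }
}
finally { $writer.Close() }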

Post #1541844
Posted Saturday, February 15, 2014 7:57 AM


SSCrazy

Something similar I once worked on was a migration process.
I had to change the recovery model to BULK_LOGGED and expand the TLog file enough up front so that no new VLFs were added.
You can also disable triggers and constraints for a faster load.
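
Not the exact commands used there, just a rough sketch of what those settings look like, driven from PowerShell since that's what is already in play. It assumes the SqlServer/SQLPS module so Invoke-Sqlcmd is available, and MyServer, StagingDB, StagingDB_log, dbo.Staging101 and the 10 GB size are all placeholders.

# Prep the database and staging table before the load.
$prepSql = @"
-- Minimally logged bulk loads instead of full logging.
ALTER DATABASE StagingDB SET RECOVERY BULK_LOGGED;

-- Pre-grow the log once so the load doesn't keep adding VLFs.
ALTER DATABASE StagingDB
    MODIFY FILE (NAME = StagingDB_log, SIZE = 10GB);

-- Disable constraints and triggers on the staging table for the load.
ALTER TABLE dbo.Staging101 NOCHECK CONSTRAINT ALL;
ALTER TABLE dbo.Staging101 DISABLE TRIGGER ALL;
"@
Invoke-Sqlcmd -ServerInstance "MyServer" -Database "StagingDB" -Query $prepSql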

Regards,
Igor






Igor Micev,
SQL Server developer at Seavus
www.seavus.com
Post #1541847
Posted Saturday, February 15, 2014 8:02 AM


SSC-Addicted

Hi Igor,

Loading the files isn't the slow part; processing them for loading is. Mainly, I think, adding the location code to each file for tracking.

Thanks
Post #1541848
Posted Saturday, February 15, 2014 8:19 AM


SSCrazy

It's easy to give advice, but the hard part falls to you. You've already put some effort into a PS script to do it.
SSIS is designed for exactly this, and I think you'd get through it painlessly. Can you switch your approach, then? Because adding a column to (all in all) 300 MB of txt files is really a painful operation.

Regards,
Igor




Igor Micev,
SQL Server developer at Seavus
www.seavus.com
Post #1541850
Posted Saturday, February 15, 2014 8:43 AM


SSC-Addicted

I really hate SSIS and was hoping to avoid it. I guess I could try doing this all in another language that handles large file operations better. Of course, since these are all going to garbage staging tables, I could always generate dynamic import statements with extra variables. Hm.

Who cares about filling up transaction logs, anyway?!
Post #1541853
Posted Saturday, February 15, 2014 8:49 AM


SSCrazy

Try to avoid adding the column to the files themselves. Whatever approach you apply to that operation, it will be slow.
Doing the same thing in the db engine is much smoother.
Idea: after inserting the data from a text file into a table, run another operation that adds/updates the column you're currently adding to the text files.
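
A rough sketch of that pattern for a single file, assuming Invoke-Sqlcmd is available and that a staging table dbo.Staging101 already exists with a nullable dbn column; the file name, server and database names are made-up placeholders:

# Load the file as-is, then tag the new rows with the location code in one
# set-based UPDATE instead of rewriting the text file.
$file = "C:\Users\$env:username\Desktop\$mydir\ABC123_101_report.txt"
$dbn  = (Split-Path $file -Leaf).Substring(0,6)
$sql  = @"
BULK INSERT dbo.Staging101
FROM '$file'
WITH (FIELDTERMINATOR = '\t', ROWTERMINATOR = '\n', FIRSTROW = 2);

UPDATE dbo.Staging101
SET    dbn = '$dbn'
WHERE  dbn IS NULL;
"@
Invoke-Sqlcmd -ServerInstance "MyServer" -Database "StagingDB" -Query $sql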

Regards,
Igor




Igor Micev,
SQL Server developer at Seavus
www.seavus.com
Post #1541855
Posted Saturday, February 15, 2014 8:54 AM


SSCrazy

sqldriver (2/15/2014)
I really hate SSIS and was hoping to avoid it. I guess I could try doing this all in another language that handles large file operations better. Of course, since these are all going to garbage staging tables, I could always generate dynamic import statements with extra variables. Hm.

Who cares about filling up transaction logs, anyway?!


The db engine does. I don't know how it's set up there, but having a high number of small growth increments may slow down the insertions.
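
If small growth increments are the concern, one knob is the log file's autogrowth setting (alongside pre-growing it as mentioned earlier). A minimal sketch with placeholder server, database, logical file name and size, again assuming Invoke-Sqlcmd:

# Fewer, larger growth events instead of many tiny ones during the load.
$growSql = @"
ALTER DATABASE StagingDB
    MODIFY FILE (NAME = StagingDB_log, FILEGROWTH = 512MB);
"@
Invoke-Sqlcmd -ServerInstance "MyServer" -Database "master" -Query $growSql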




Igor Micev,
SQL Server developer at Seavus
www.seavus.com
Post #1541857
Posted Saturday, February 15, 2014 8:55 AM


SSC-Addicted

Yeah, I think my plan is going to be using PS to generate a bulk insert for every file, naming the tables after the files before the .txt, and immediately afterwards an alter table add column with a default of a substring of the file name. That sounds cleaner to me.
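
A rough, untested sketch of that shape: MyServer and StagingDB are placeholders, and the per-file staging tables are assumed to already exist with columns matching the file layout, since BULK INSERT won't create them.

# For every matching file: BULK INSERT into a table named after the file,
# then add a dbn column defaulted to the location code from the file name.
$dir = "C:\Users\$env:username\Desktop\$mydir"
Get-ChildItem $dir |
    Where-Object { $_.Name -match "^[^8]*_101_.*\.txt$" -and $_.Length -gt 0 } |
    ForEach-Object {
        $table = $_.BaseName            # table named after the file, minus .txt
        $dbn   = $_.Name.Substring(0,6) # six-character location code
        $sql   = @"
BULK INSERT dbo.[$table]
FROM '$($_.FullName)'
WITH (FIELDTERMINATOR = '\t', ROWTERMINATOR = '\n', FIRSTROW = 2);

ALTER TABLE dbo.[$table]
    ADD dbn char(6) NOT NULL DEFAULT '$dbn' WITH VALUES;
"@
        Invoke-Sqlcmd -ServerInstance "MyServer" -Database "StagingDB" -Query $sql
    }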

Thanks
Post #1541858
Posted Saturday, February 15, 2014 8:57 AM


SSCrazy

sqldriver (2/15/2014)
Yeah, I think my plan is going to be using PS to generate a bulk insert for every file, naming the tables after the files before the .txt, and immediately afterwards an alter table add column with a default of a substring of the file name. That sounds cleaner to me.

Thanks


I'll gladly be waiting for the improved results :)

Regards,
Igor




Igor Micev,
SQL Server developer at Seavus
www.seavus.com
Post #1541859
Posted Sunday, February 16, 2014 6:03 PM
Ten Centuries

Hi

I've never compared my processes against a PowerShell equivalent (or anything else, for that matter), but I have some large text files (around 1.5 GB) that I pre-process before loading them. I use Perl for this. It handles regular expressions and large files without any real issues. Essentially my process does something similar to yours: read in a line (or more), run some replaces, then write to a new file.
I would give you some timings for my conversions, but I don't time them individually, and they also work on zipped files, which adds a bit more overhead.
I can't say Perl would be quicker, but it could be worth looking at.
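
For comparison, the same read-a-line / run-replaces / write-to-a-new-file loop can be sketched in PowerShell with Get-Content -ReadCount, which hands lines through in batches instead of one object at a time; the file names and the single replace below are only placeholders:

# Batched line-by-line pre-processing: read blocks of 5000 lines, run the
# replaces, append the result to a new file.
Get-Content C:\Users\$env:username\Desktop\$mydir\bigfile.txt -ReadCount 5000 |
    ForEach-Object { $_ -replace '&', '_amp' } |
    Add-Content C:\Users\$env:username\Desktop\$mydir\bigfile_clean.txt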
Post #1541949