To T-SQL or not to T-SQL?


sqldriver
SSCrazy

Group: General Forum Members
Points: 2196 Visits: 2536
Hi,

I have about 300 MB of text files. Each one is named like XXXXXX_XXX.txt, where the six-character prefix is a location code and the three-character suffix is the report code.

The hitch is that I have to append all of the like reports together for loading, but tack on the location code as a new column. I have written PowerShell to do it, but on the larger files it runs like a dead pig. A long dead pig. In fact, I think even its grand-pigs are dead.

Has anyone ever had to do something similar? Any advice? The PS code is posted below using the VB markup because all the other ones made it look weirder than it is, so if there's a better way to get where I'm going with it, I'm all ears.

Thanks


# Append every matching report file (skipping the header row) into one master
# file, tagging each line with the six-character location code from its name
Get-ChildItem C:\Users\$env:username\Desktop\$mydir |
    Where-Object { $_.Name -match "^[^8]*_101_.*\.txt$" -and $_.Length -gt 0 } |
    ForEach-Object {
        $dbn = $_.Name.Substring(0,6)
        Get-Content -Path C:\Users\$env:username\Desktop\$mydir\$_ |
            Select-Object -Skip 1 |
            ForEach-Object { "$_`t${dbn}" |
                Out-File C:\Users\$env:username\Desktop\$mydir\101Master.txt -Append }
    }

# Grab the header row from the first matching file and scrub it into clean,
# tab-separated column names (note this one path points at Desktop\Test\
# rather than Desktop\$mydir)
$f = Get-ChildItem C:\Users\$env:username\Desktop\Test\ |
    Where-Object { $_.Name -match "^[^8]*_101_.*\.txt$" -and $_.Length -gt 0 } |
    Select-Object -First 1
$b = Get-Content C:\Users\$env:username\Desktop\$mydir\$f -TotalCount 1
$b = $b -replace '&','_amp'
$b = $b -replace '\$','SS'
$b = $b -replace "[^a-zA-Z0-9`t]",''
$b = $b -replace "^\d",''
$b = $b -replace "`t\d","`t"
$b = $b -replace "\W\d+","`t"
$b = $b + "`tdbn"
Set-Content C:\Users\$env:username\Desktop\$mydir\Headers\Master101headers_cleaned.txt -Value $b

# Drop blank lines from the master file and write it back out with the cleaned
# header row on top
$a = (Get-Content C:\Users\$env:username\Desktop\$mydir\101Master.txt) | Where-Object { $_.Trim() -ne '' }
Set-Content C:\Users\$env:username\Desktop\$mydir\Cleaned\Master101_cleaned.txt -Value $b, $a
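
The slow part above is almost certainly the per-line Out-File -Append, which reopens the master file for every row it writes. A minimal sketch of the same append step with one write per input file instead, reusing the paths and the _101_ pattern from the script above:

$src = "C:\Users\$env:username\Desktop\$mydir"
$out = Join-Path $src '101Master.txt'
Get-ChildItem $src |
    Where-Object { $_.Name -match "^[^8]*_101_.*\.txt$" -and $_.Length -gt 0 } |
    ForEach-Object {
        $dbn = $_.Name.Substring(0,6)
        # Build the tagged lines in memory, then append them in one call, so the
        # master file is opened once per input file rather than once per line
        $lines = Get-Content $_.FullName | Select-Object -Skip 1 |
            ForEach-Object { "$_`t$dbn" }
        Add-Content -Path $out -Value $lines
    }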


Igor Micev
SSChampion

Group: General Forum Members
Points: 10340 Visits: 5157
I once worked on something similar, as part of a migration process.
I had to change the recovery model to BULK_LOGGED and expand the TLog file enough that no new VLFs were added.
You can also disable triggers and constraints for a faster load.

Regards,
Igor

Igor Micev,
My blog: www.igormicev.com
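
A rough sketch of those settings as T-SQL, here wrapped in PowerShell's Invoke-Sqlcmd (SqlServer/sqlps module) to match the rest of the thread; the database name, logical log file name, table, and size are all made-up placeholders:

Invoke-Sqlcmd -ServerInstance 'localhost' -Query @"
ALTER DATABASE StagingDB SET RECOVERY BULK_LOGGED;      -- minimally log the bulk load
ALTER DATABASE StagingDB
    MODIFY FILE (NAME = StagingDB_log, SIZE = 4096MB);  -- pre-grow the log so no new VLFs get added
ALTER TABLE dbo.Report101 NOCHECK CONSTRAINT ALL;       -- disable constraints for the load
ALTER TABLE dbo.Report101 DISABLE TRIGGER ALL;          -- and triggers
"@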
sqldriver
SSCrazy

Group: General Forum Members
Points: 2196 Visits: 2536
Hi Igor,

Loading the files isn't the slow part; processing them for loading is. Mainly, I think, it's adding the location code to each file for tracking.

Thanks
Igor Micev
SSChampion

Group: General Forum Members
Points: 10340 Visits: 5157
It's easy to give advice, but the hard part falls to you. You've already put some effort into a PS script to do it.
SSIS is designed for exactly this, and I think you'd get through it painlessly. Can you switch your approach, then? Adding a column to (all in all) 300 MB of txt files is really a sick operation.

Regards,
Igor

Igor Micev,
My blog: www.igormicev.com
sqldriver
SSCrazy

Group: General Forum Members
Points: 2196 Visits: 2536
I really hate SSIS and was hoping to avoid it. I guess I could try doing this all in another language that handles large file operations better. Of course, since these are all going to garbage staging tables, I could always generate dynamic import statements with extra variables. Hm.

Who cares about filling up transaction logs, anyway?! w00t
Igor Micev
SSChampion

Group: General Forum Members
Points: 10340 Visits: 5157
Try to avoid adding the column to the files themselves. Whatever approach you apply to that operation, it will be slow.
Doing the same thing in the db engine is much smoother.
Idea: after inserting the data from a text file into a table, could you run another operation that adds/updates the column you've been adding to the text files?

Regards,
Igor

Igor Micev,
My blog: www.igormicev.com
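
A sketch of that load-then-stamp idea, with hypothetical names throughout; it assumes dbo.Staging101 already exists with columns matching the file, and that the files are tab-delimited with one header row. The ALTER and the UPDATE run as separate batches because a new column can't be referenced in the same batch that adds it:

Invoke-Sqlcmd -ServerInstance 'localhost' -Query @"
BULK INSERT dbo.Staging101
FROM 'C:\Users\me\Desktop\files\ABC123_101.txt'
WITH (FIRSTROW = 2, FIELDTERMINATOR = '\t', ROWTERMINATOR = '\n');
"@
Invoke-Sqlcmd -ServerInstance 'localhost' -Query @"
ALTER TABLE dbo.Staging101 ADD dbn char(6) NULL;
"@
Invoke-Sqlcmd -ServerInstance 'localhost' -Query @"
UPDATE dbo.Staging101 SET dbn = 'ABC123' WHERE dbn IS NULL;
"@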
Igor Micev
SSChampion

Group: General Forum Members
Points: 10340 Visits: 5157
sqldriver (2/15/2014)
I really hate SSIS and was hoping to avoid it. I guess I could try doing this all in another language that handles large file operations better. Of course, since these are all going to garbage staging tables, I could always generate dynamic import statements with extra variables. Hm.

Who cares about filling up transaction logs, anyway?! w00t


The db engine does. I don't know how it's set up there, but a high number of small log growth increments may slow down the insertions.

Igor Micev,
My blog: www.igormicev.com
sqldriver
SSCrazy

Group: General Forum Members
Points: 2196 Visits: 2536
Yeah, I think my plan is going to be to use PS to generate a bulk insert for every file, naming the tables after the files (minus the .txt), and immediately afterwards running an alter table to add a column defaulted to a substring of the file name. That sounds cleaner to me.

Thanks
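
A rough sketch of that plan, reusing the Desktop path from earlier and assuming tab-delimited files with one header row and staging tables that already exist with matching columns; it writes one BULK INSERT plus ALTER TABLE pair per file into a load_all.sql script:

$dir = "C:\Users\$env:username\Desktop\$mydir"
Get-ChildItem $dir -Filter *.txt | ForEach-Object {
    $table = $_.BaseName                  # table named after the file, minus .txt
    $dbn   = $_.BaseName.Substring(0,6)   # location code from the file name
    @"
BULK INSERT dbo.[$table]
FROM '$($_.FullName)'
WITH (FIRSTROW = 2, FIELDTERMINATOR = '\t', ROWTERMINATOR = '\n');

ALTER TABLE dbo.[$table] ADD dbn char(6) NOT NULL DEFAULT '$dbn';
GO
"@
} | Set-Content (Join-Path $dir 'load_all.sql')

Because the added column is NOT NULL with a default, SQL Server fills it in for every row already loaded, which covers the "default of a substring of the file name" part.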
Igor Micev
SSChampion

Group: General Forum Members
Points: 10340 Visits: 5157
sqldriver (2/15/2014)
Yeah, I think my plan is going to be to use PS to generate a bulk insert for every file, naming the tables after the files (minus the .txt), and immediately afterwards running an alter table to add a column defaulted to a substring of the file name. That sounds cleaner to me.

Thanks


I'll gladly be waiting for the improved results :-)

Regards,
Igor

Igor Micev,
My blog: www.igormicev.com
mickyT
SSCrazy

Group: General Forum Members
Points: 2842 Visits: 3318
Hi

I've never compared my processes against a PowerShell equivalent (or anything else, for that matter :-)), but I have some large text files (around 1.5 GB) that I pre-process before loading them. I use Perl for this. It handles regular expressions and large files without any real issues. Essentially my process does something similar to yours: read in a line (or more), run some replaces, then write to a new file.
I would give you some timings for my conversions, but I don't time them individually, and they also work on zipped files, which adds a bit more overhead.
I can't say Perl would be quicker, but it could be worth looking at.
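
For comparison, that same read-a-line/run-replaces/write pattern can be kept in PowerShell using .NET stream classes, which hold memory flat no matter the file size; the paths and the single replace here are placeholders:

$reader = New-Object System.IO.StreamReader -ArgumentList 'C:\big\input.txt'
$writer = New-Object System.IO.StreamWriter -ArgumentList 'C:\big\output.txt'
try {
    while ($null -ne ($line = $reader.ReadLine())) {
        # run whatever replaces are needed, then write to the new file
        $writer.WriteLine(($line -replace '&','_amp'))
    }
}
finally {
    $reader.Close()
    $writer.Close()
}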