SQL Clone
SQLServerCentral is supported by Redgate
 
Log in  ::  Register  ::  Not logged in
 
 
 


To T-SQL or not to T-SQL?


To T-SQL or not to T-SQL?

Author
Message
Jeff Moden
Jeff Moden
SSC Guru
SSC Guru (205K reputation)SSC Guru (205K reputation)SSC Guru (205K reputation)SSC Guru (205K reputation)SSC Guru (205K reputation)SSC Guru (205K reputation)SSC Guru (205K reputation)SSC Guru (205K reputation)

Group: General Forum Members
Points: 205781 Visits: 41952
sqldriver (2/15/2014)
I really hate SSIS and was hoping to avoid it. I guess I could try doing this all in another language that handles large file operations better. Of course, since these are all going to garbage staging tables, I could always generate dynamic import statements with extra variables. Hm.

Who cares about filling up transaction logs, anyway?! w00t


Heh... I'm right there with you on the subject of SSIS especially if all of these files have the same basic format. Even if they're different by report type, it's still not difficult to do with the likes of BULK INSERT written as a bit of dynamic SQL. You can use xp_Dirtree 'drive-or-unc-path',1,1 to get a list of files and store them in a table.

There's just no need for PoSH for this type of stuff and BULK INSERT is nasty fast.

How many different report codes do you have and how many files do you have for each report code? Is is safe to assume that each file that has the same report code will all be in the same format?

--Jeff Moden

RBAR is pronounced ree-bar and is a Modenism for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
Stop thinking about what you want to do to a row... think, instead, of what you want to do to a column.
If you think its expensive to hire a professional to do the job, wait until you hire an amateur. -- Red Adair

Helpful Links:
How to post code problems
How to post performance problems
Forum FAQs
joseph.caldwell
joseph.caldwell
Valued Member
Valued Member (53 reputation)Valued Member (53 reputation)Valued Member (53 reputation)Valued Member (53 reputation)Valued Member (53 reputation)Valued Member (53 reputation)Valued Member (53 reputation)Valued Member (53 reputation)

Group: General Forum Members
Points: 53 Visits: 70
sqldriver (2/15/2014)
I really hate SSIS and was hoping to avoid it. I guess I could try doing this all in another language that handles large file operations better. Of course, since these are all going to garbage staging tables, I could always generate dynamic import statements with extra variables. Hm.

Who cares about filling up transaction logs, anyway?! w00t


SSIS is perfect for this using a derived column that looks at the file name. I could have set it up in about the time it took you to post.
sqldriver
sqldriver
SSCrazy
SSCrazy (2.1K reputation)SSCrazy (2.1K reputation)SSCrazy (2.1K reputation)SSCrazy (2.1K reputation)SSCrazy (2.1K reputation)SSCrazy (2.1K reputation)SSCrazy (2.1K reputation)SSCrazy (2.1K reputation)

Group: General Forum Members
Points: 2098 Visits: 2536
Jeff Moden (2/16/2014)
sqldriver (2/15/2014)
I really hate SSIS and was hoping to avoid it. I guess I could try doing this all in another language that handles large file operations better. Of course, since these are all going to garbage staging tables, I could always generate dynamic import statements with extra variables. Hm.

Who cares about filling up transaction logs, anyway?! w00t


Heh... I'm right there with you on the subject of SSIS especially if all of these files have the same basic format. Even if they're different by report type, it's still not difficult to do with the likes of BULK INSERT written as a bit of dynamic SQL. You can use xp_Dirtree 'drive-or-unc-path',1,1 to get a list of files and store them in a table.

There's just no need for PoSH for this type of stuff and BULK INSERT is nasty fast.

How many different report codes do you have and how many files do you have for each report code? Is is safe to assume that each file that has the same report code will all be in the same format?


There are six report types for ~75 locations. All of the report types have the same layout, which makes this easier. The only issue is that the headers have wonky characters in them that I'm going to have to deal with at some point before loading. I think doing that bit of pre-processing with PS should be fine; it's just one line, even if it is ~75 times.

The thing that makes this situation a little weird is that the instance I'm loading them to lives on AWS Doze which is why I started down the PS road in the first place: it has a cmdlet to upload files right to S3. It seemed like an awesome idea to append all the files together with some bells and whistles to skip lines and select only one line and etc. and so forth rather than load each file to a staging table and then run a larger insert to a combined staging table for each report type.

I wrote a poem about my experience:

Here I sit, broken hearted
Ran my PowerShell script
Hours later, had to abort it
sqldriver
sqldriver
SSCrazy
SSCrazy (2.1K reputation)SSCrazy (2.1K reputation)SSCrazy (2.1K reputation)SSCrazy (2.1K reputation)SSCrazy (2.1K reputation)SSCrazy (2.1K reputation)SSCrazy (2.1K reputation)SSCrazy (2.1K reputation)

Group: General Forum Members
Points: 2098 Visits: 2536
mickyT (2/16/2014)
Hi

I've never compared my processes against a power shell equivalent (or anything else for that matter:-)), but I have some large text files (around 1.5 GB) that I pre-process before I load them. I use PERL for this. It handles regular expressions and large files without any real issues. Essentially my process does something similar to yours. Read in a line (or more), run some replaces then write to a new file.
I would give you some timings for my conversions, but I don't time them individually and they are also working on zipped files adding a bit more overhead.
I can't say PERL would be quicker, but it could be worth looking at.


I prefer to look at PERL through a telescope. With my eyes closed and my fingers jammed in my ears :-D
mickyT
mickyT
SSCrazy
SSCrazy (2.7K reputation)SSCrazy (2.7K reputation)SSCrazy (2.7K reputation)SSCrazy (2.7K reputation)SSCrazy (2.7K reputation)SSCrazy (2.7K reputation)SSCrazy (2.7K reputation)SSCrazy (2.7K reputation)

Group: General Forum Members
Points: 2738 Visits: 3318
sqldriver (2/17/2014)
mickyT (2/16/2014)
Hi

I've never compared my processes against a power shell equivalent (or anything else for that matter:-)), but I have some large text files (around 1.5 GB) that I pre-process before I load them. I use PERL for this. It handles regular expressions and large files without any real issues. Essentially my process does something similar to yours. Read in a line (or more), run some replaces then write to a new file.
I would give you some timings for my conversions, but I don't time them individually and they are also working on zipped files adding a bit more overhead.
I can't say PERL would be quicker, but it could be worth looking at.


I prefer to look at PERL through a telescope. With my eyes closed and my fingers jammed in my ears :-D

Fair enough :-)
Jeff Moden
Jeff Moden
SSC Guru
SSC Guru (205K reputation)SSC Guru (205K reputation)SSC Guru (205K reputation)SSC Guru (205K reputation)SSC Guru (205K reputation)SSC Guru (205K reputation)SSC Guru (205K reputation)SSC Guru (205K reputation)

Group: General Forum Members
Points: 205781 Visits: 41952
sqldriver (2/17/2014)
I think doing that bit of pre-processing with PS should be fine; it's just one line, even if it is ~75 times.



It would be interesting to see which approach was faster. Preprocess the files to a single file per report type or import each file separately into a common table.

Is the pre-processing route the way you're going to go on this?

--Jeff Moden

RBAR is pronounced ree-bar and is a Modenism for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
Stop thinking about what you want to do to a row... think, instead, of what you want to do to a column.
If you think its expensive to hire a professional to do the job, wait until you hire an amateur. -- Red Adair

Helpful Links:
How to post code problems
How to post performance problems
Forum FAQs
Evil Kraig F
Evil Kraig F
SSCoach
SSCoach (19K reputation)SSCoach (19K reputation)SSCoach (19K reputation)SSCoach (19K reputation)SSCoach (19K reputation)SSCoach (19K reputation)SSCoach (19K reputation)SSCoach (19K reputation)

Group: General Forum Members
Points: 19653 Visits: 7660
sqldriver (2/15/2014)
I really hate SSIS and was hoping to avoid it. I guess I could try doing this all in another language that handles large file operations better. Of course, since these are all going to garbage staging tables, I could always generate dynamic import statements with extra variables. Hm.

Who cares about filling up transaction logs, anyway?! w00t


Heh, sorry to hear, but this is precisely why I use things like SSIS. A for Each loop over a data flow with a derived column to add the source filename in. You may not like SSIS, but to me using Powershell or another language for this is like going to Canada via Mexico.


- Craig Farrell

Never stop learning, even if it hurts. Ego bruises are practically mandatory as you learn unless you've never risked enough to make a mistake.

For better assistance in answering your questions | Forum Netiquette
For index/tuning help, follow these directions. |Tally Tables

Twitter: @AnyWayDBA
sqldriver
sqldriver
SSCrazy
SSCrazy (2.1K reputation)SSCrazy (2.1K reputation)SSCrazy (2.1K reputation)SSCrazy (2.1K reputation)SSCrazy (2.1K reputation)SSCrazy (2.1K reputation)SSCrazy (2.1K reputation)SSCrazy (2.1K reputation)

Group: General Forum Members
Points: 2098 Visits: 2536
I appreciate that SSIS is a good tool for this situation, but I'm in kind of a weird environment where it's not accessible, on top of me not liking it, so it's not just a matter of preference.

I can use PowerShell to kick out some SQL based on file names, which is essentially what I've done:


Get-ChildItem C:\Users\$env:username\Desktop\$mydir |
Where-Object { $_.Name -match "^\w+.txt$" } |
Foreach-Object { $dbn = $_.Name.Substring(0,6)
$table = $_.Name -replace '.txt', ''
$altr = "alter table $schema.s_$table add column dbn varchar (100) default '$dbn';`n`n" |
Out-File -FilePath C:\Users\$env:username\Desktop\SQL\$schema\Alter.sql -Append -Encoding default }



So after I load all my files, I run this and it fills in the information for me. When I add it before loading, it does not populate.

Of course, now I'm running into an issue where SQL thinks some reports have more columns in them than they actually do. Ho-hum. :-P
Evil Kraig F
Evil Kraig F
SSCoach
SSCoach (19K reputation)SSCoach (19K reputation)SSCoach (19K reputation)SSCoach (19K reputation)SSCoach (19K reputation)SSCoach (19K reputation)SSCoach (19K reputation)SSCoach (19K reputation)

Group: General Forum Members
Points: 19653 Visits: 7660
sqldriver (2/18/2014)
I appreciate that SSIS is a good tool for this situation, but I'm in kind of a weird environment where it's not accessible, on top of me not liking it, so it's not just a matter of preference.


Ick.


- Craig Farrell

Never stop learning, even if it hurts. Ego bruises are practically mandatory as you learn unless you've never risked enough to make a mistake.

For better assistance in answering your questions | Forum Netiquette
For index/tuning help, follow these directions. |Tally Tables

Twitter: @AnyWayDBA
sqldriver
sqldriver
SSCrazy
SSCrazy (2.1K reputation)SSCrazy (2.1K reputation)SSCrazy (2.1K reputation)SSCrazy (2.1K reputation)SSCrazy (2.1K reputation)SSCrazy (2.1K reputation)SSCrazy (2.1K reputation)SSCrazy (2.1K reputation)

Group: General Forum Members
Points: 2098 Visits: 2536
Evil Kraig F (2/18/2014)
sqldriver (2/18/2014)
I appreciate that SSIS is a good tool for this situation, but I'm in kind of a weird environment where it's not accessible, on top of me not liking it, so it's not just a matter of preference.


Ick.


You don't know the half of it.
Go


Permissions

You can't post new topics.
You can't post topic replies.
You can't post new polls.
You can't post replies to polls.
You can't edit your own topics.
You can't delete your own topics.
You can't edit other topics.
You can't delete other topics.
You can't edit your own posts.
You can't edit other posts.
You can't delete your own posts.
You can't delete other posts.
You can't post events.
You can't edit your own events.
You can't edit other events.
You can't delete your own events.
You can't delete other events.
You can't send private messages.
You can't send emails.
You can read topics.
You can't vote in polls.
You can't upload attachments.
You can download attachments.
You can't post HTML code.
You can't edit HTML code.
You can't post IFCode.
You can't post JavaScript.
You can post emoticons.
You can't post or upload images.

Select a forum

































































































































































SQLServerCentral


Search