To T-SQL or not to T-SQL?
Posted Sunday, February 16, 2014 6:44 PM


sqldriver (2/15/2014)
I really hate SSIS and was hoping to avoid it. I guess I could try doing this all in another language that handles large file operations better. Of course, since these are all going to garbage staging tables, I could always generate dynamic import statements with extra variables. Hm.

Who cares about filling up transaction logs, anyway?!


Heh... I'm right there with you on the subject of SSIS, especially if all of these files have the same basic format. Even if they're different by report type, it's still not difficult to do with the likes of BULK INSERT written as a bit of dynamic SQL. You can use xp_DirTree 'drive-or-unc-path',1,1 to get a list of files and store them in a table.

There's just no need for PoSH for this type of stuff and BULK INSERT is nasty fast.
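For what it's worth, a minimal sketch of that pattern follows; the UNC path, staging table name, and WITH options are hypothetical placeholders to adjust to the actual file format:

-- xp_DirTree with depth = 1 and the file flag = 1 returns one row per
-- file or subdirectory one level down.
IF OBJECT_ID('tempdb..#FileList') IS NOT NULL DROP TABLE #FileList;
CREATE TABLE #FileList (SubDirectory sysname, Depth int, IsFile bit);

INSERT INTO #FileList (SubDirectory, Depth, IsFile)
 EXEC xp_DirTree '\\SomeServer\SomeShare', 1, 1; -- hypothetical share

-- Build one BULK INSERT statement per file and execute the whole batch.
DECLARE @SQL nvarchar(MAX) = N'';

SELECT @SQL = @SQL
            + N'BULK INSERT dbo.GarbageStaging FROM ''\\SomeServer\SomeShare\'
            + SubDirectory
            + N''' WITH (FIELDTERMINATOR = '','', ROWTERMINATOR = ''\n'', FIRSTROW = 2);'
            + NCHAR(10)
  FROM #FileList
 WHERE IsFile = 1;

EXEC (@SQL);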

How many different report codes do you have, and how many files do you have for each report code? Is it safe to assume that all files with the same report code will be in the same format?


--Jeff Moden
"RBAR is pronounced "ree-bar" and is a "Modenism" for "Row-By-Agonizing-Row".

First step towards the paradigm shift of writing Set Based code:
Stop thinking about what you want to do to a row... think, instead, of what you want to do to a column."

(play on words) "Just because you CAN do something in T-SQL, doesn't mean you SHOULDN'T." --22 Aug 2013

Helpful Links:
How to post code problems
How to post performance problems
Post #1541958
Posted Monday, February 17, 2014 7:36 AM
sqldriver (2/15/2014)
I really hate SSIS and was hoping to avoid it. I guess I could try doing this all in another language that handles large file operations better. Of course, since these are all going to garbage staging tables, I could always generate dynamic import statements with extra variables. Hm.

Who cares about filling up transaction logs, anyway?!


SSIS is perfect for this, using a derived column that looks at the file name. I could have set it up in about the time it took you to post.
Post #1542129
Posted Monday, February 17, 2014 8:22 AM


Jeff Moden (2/16/2014)
sqldriver (2/15/2014)
I really hate SSIS and was hoping to avoid it. I guess I could try doing this all in another language that handles large file operations better. Of course, since these are all going to garbage staging tables, I could always generate dynamic import statements with extra variables. Hm.

Who cares about filling up transaction logs, anyway?!


Heh... I'm right there with you on the subject of SSIS, especially if all of these files have the same basic format. Even if they're different by report type, it's still not difficult to do with the likes of BULK INSERT written as a bit of dynamic SQL. You can use xp_DirTree 'drive-or-unc-path',1,1 to get a list of files and store them in a table.

There's just no need for PoSH for this type of stuff and BULK INSERT is nasty fast.

How many different report codes do you have, and how many files do you have for each report code? Is it safe to assume that all files with the same report code will be in the same format?


There are six report types for ~75 locations. All of the report types have the same layout, which makes this easier. The only issue is that the headers have wonky characters in them that I'm going to have to deal with at some point before loading. I think doing that bit of pre-processing with PS should be fine; it's just one line, even if it is ~75 times.

The thing that makes this situation a little weird is that the instance I'm loading them to lives on AWS, which is why I started down the PS road in the first place: it has a cmdlet to upload files right to S3. It seemed like an awesome idea to append all the files together, with some bells and whistles to skip lines, select only certain lines, and so forth, rather than load each file to a staging table and then run a larger insert into a combined staging table for each report type.

I wrote a poem about my experience:

Here I sit, broken hearted
Ran my PowerShell script
Hours later, had to abort it

Post #1542151
Posted Monday, February 17, 2014 8:32 AM


mickyT (2/16/2014)
Hi

I've never compared my processes against a PowerShell equivalent (or anything else, for that matter), but I have some large text files (around 1.5 GB) that I pre-process before I load them. I use PERL for this. It handles regular expressions and large files without any real issues. Essentially my process does something similar to yours: read in a line (or more), run some replaces, then write to a new file.
I would give you some timings for my conversions, but I don't time them individually and they are also working on zipped files adding a bit more overhead.
I can't say PERL would be quicker, but it could be worth looking at.


I prefer to look at PERL through a telescope. With my eyes closed and my fingers jammed in my ears.
Post #1542163
Posted Monday, February 17, 2014 10:48 AM
sqldriver (2/17/2014)
mickyT (2/16/2014)
Hi

I've never compared my processes against a PowerShell equivalent (or anything else, for that matter), but I have some large text files (around 1.5 GB) that I pre-process before I load them. I use PERL for this. It handles regular expressions and large files without any real issues. Essentially my process does something similar to yours: read in a line (or more), run some replaces, then write to a new file.
I would give you some timings for my conversions, but I don't time them individually and they are also working on zipped files adding a bit more overhead.
I can't say PERL would be quicker, but it could be worth looking at.


I prefer to look at PERL through a telescope. With my eyes closed and my fingers jammed in my ears.

Fair enough
Post #1542228
Posted Monday, February 17, 2014 9:25 PM


sqldriver (2/17/2014)
I think doing that bit of pre-processing with PS should be fine; it's just one line, even if it is ~75 times.



It would be interesting to see which approach is faster: pre-processing the files into a single file per report type, or importing each file separately into a common table.

Is the pre-processing route the way you're going to go on this?


--Jeff Moden
"RBAR is pronounced "ree-bar" and is a "Modenism" for "Row-By-Agonizing-Row".

First step towards the paradigm shift of writing Set Based code:
Stop thinking about what you want to do to a row... think, instead, of what you want to do to a column."

(play on words) "Just because you CAN do something in T-SQL, doesn't mean you SHOULDN'T." --22 Aug 2013

Helpful Links:
How to post code problems
How to post performance problems
Post #1542351
Posted Tuesday, February 18, 2014 12:23 PM


sqldriver (2/15/2014)
I really hate SSIS and was hoping to avoid it. I guess I could try doing this all in another language that handles large file operations better. Of course, since these are all going to garbage staging tables, I could always generate dynamic import statements with extra variables. Hm.

Who cares about filling up transaction logs, anyway?!


Heh, sorry to hear it, but this is precisely why I use things like SSIS: a For Each loop over a data flow, with a derived column to add the source file name. You may not like SSIS, but to me, using PowerShell or another language for this is like going to Canada via Mexico.



- Craig Farrell

Never stop learning, even if it hurts. Ego bruises are practically mandatory as you learn unless you've never risked enough to make a mistake.

For better assistance in answering your questions | Forum Netiquette
For index/tuning help, follow these directions. |Tally Tables

Twitter: @AnyWayDBA
Post #1542706
Posted Tuesday, February 18, 2014 1:32 PM


I appreciate that SSIS is a good tool for this situation, but I'm in kind of a weird environment where it's not accessible, on top of me not liking it, so it's not just a matter of preference.

I can use PowerShell to kick out some SQL based on file names, which is essentially what I've done:

Get-ChildItem C:\Users\$env:username\Desktop\$mydir |
    Where-Object { $_.Name -match '^\w+\.txt$' } |    # escape the dot so it only matches ".txt"
    ForEach-Object {
        $dbn   = $_.Name.Substring(0, 6)              # first six characters of the file name
        $table = $_.Name -replace '\.txt$', ''
        # T-SQL syntax is ADD, not ADD COLUMN; pipe each statement straight to the script file
        "alter table $schema.s_$table add dbn varchar(100) default '$dbn';`n" |
            Out-File -FilePath C:\Users\$env:username\Desktop\SQL\$schema\Alter.sql -Append -Encoding default
    }

So after I load all my files, I run this and it fills in the information for me. When I add the column before loading, it does not populate.
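One note on the before-loading case, in case it helps: for a nullable column, SQL Server only applies a new default to rows inserted after the ALTER; to backfill rows already sitting in the table, specify WITH VALUES (or declare the column NOT NULL). A one-off sketch with hypothetical table and value names:

alter table dbo.s_SomeReport
    add dbn varchar(100)
    constraint DF_s_SomeReport_dbn default 'ABC123' with values; -- existing rows get 'ABC123' too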

Of course, now I'm running into an issue where SQL thinks some reports have more columns in them than they actually do. Ho-hum.



Post #1542724
Posted Tuesday, February 18, 2014 1:39 PM


sqldriver (2/18/2014)
I appreciate that SSIS is a good tool for this situation, but I'm in kind of a weird environment where it's not accessible, on top of me not liking it, so it's not just a matter of preference.


Ick.



- Craig Farrell

Never stop learning, even if it hurts. Ego bruises are practically mandatory as you learn unless you've never risked enough to make a mistake.

For better assistance in answering your questions | Forum Netiquette
For index/tuning help, follow these directions. |Tally Tables

Twitter: @AnyWayDBA
Post #1542728
Posted Tuesday, February 18, 2014 1:47 PM


Evil Kraig F (2/18/2014)
sqldriver (2/18/2014)
I appreciate that SSIS is a good tool for this situation, but I'm in kind of a weird environment where it's not accessible, on top of me not liking it, so it's not just a matter of preference.


Ick.


You don't know the half of it.
Post #1542733