April 22, 2011 at 2:36 pm
We are on about our 4th project using SSIS. Usually we're using it for transfers from the database to flat file.
I am getting progressively more frustrated with the many quirks of SSIS. Things often seem more difficult to figure out than I'd expect from a higher-level tool (higher level than writing code). Something I really don't like is that reuse is very limited, or at least difficult. There's not really any way I've seen to pull out reusable pieces and use (better yet, share) them in other projects.
With these issues in mind, I'm starting to seriously think it would be better to write these projects in C#. It would allow shared assemblies for reuse. And I wouldn't experience the many odd quirks that take hours to track down.
Thoughts? Do some of you feel similarly, or feel that it's really useful?
April 22, 2011 at 2:45 pm
I can see what you are saying about code reuse. For many of the cases where I may need to reuse code, I try to use stored procs.
Overall, there are many things that are done better in an OO language and many others that are done better in T-SQL. But there are still quite a few good uses for SSIS.
Jason...AKA CirqueDeSQLeil
_______________________________________________
I have given a name to my pain...MCM SQL Server, MVP
SQL RNNR
Posting Performance Based Questions - Gail Shaw
Learn Extended Events
April 22, 2011 at 3:24 pm
I can identify with your frustration. I avoid SSIS unless I truly want to do Transformation work that justifies the overhead, but that bar is moving higher and higher for me. SSIS is also good for DB-to-DB ETL work.
The Load portion of SSIS with respect to delimited flat files is limited (pun intended) in my opinion, so I usually opt for bcp or BULK INSERT. For flat files that SSIS simply will not parse (e.g. CSV files conforming to section 2 of RFC 4180, containing embedded row and column delimiters) I use FileHelpers (www.FileHelpers.com) in a C# console app to either transform the file into something SSIS will parse, or to skip SSIS completely and just bcp the file into the DB after changing it with FileHelpers.
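Once FileHelpers has rewritten the problem CSV, the load itself can be trivial. A hedged sketch only; the table name, path, and delimiter choice are all hypothetical, not from the post:

```sql
-- Assumes the C# FileHelpers step has rewritten the RFC 4180 CSV as a
-- pipe-delimited file with no embedded delimiters or quotes.
BULK INSERT dbo.ImportStaging
FROM 'C:\ETL\CustomerFeed_clean.txt'
WITH (
    FIELDTERMINATOR = '|',   -- safe terminator chosen by the C# step
    ROWTERMINATOR   = '\n',
    FIRSTROW        = 2,     -- skip the header row
    TABLOCK                  -- allow a minimally logged load where possible
);
```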
There are no special teachers of virtue, because virtue is taught by the whole community.
--Plato
April 22, 2011 at 4:02 pm
I'll flip your concerns. I know a smattering of .NET (VB, C#, or otherwise). Could I probably build a file looper or some kind of streaming data lookup? Yeah, given a few months to go figure out how.
It's sitting there, ready for me to use however. I'm just a SQL Dude. I consider Hello, World! a challenge these days half the time (okay, that might be excessive, but you get the idea). The whole concept of polymorphic inheritance is just not in my vocabulary these days.
Is it a bit glitchy and kind of a pain in the arse? Yeah, some days. But it's a damned handy tool for those of us not interested in learning how to go roll our own.
Never stop learning, even if it hurts. Ego bruises are practically mandatory as you learn unless you've never risked enough to make a mistake.
For better assistance in answering your questions | Forum Netiquette
For index/tuning help, follow these directions. | Tally Tables
Twitter: @AnyWayDBA
April 22, 2011 at 5:14 pm
I guess I'll have to flip them back, then. I use BCP, SQLCmd, T-SQL, and my old and dearly beloved ETL friend, xp_CmdShell, especially where flat files for import or export are concerned.
I do wish they'd make T-SQL as good as MySQL in this area. It would be so cool to just write ...
SELECT columnlist
FROM sometable
INTO OUTFILE 'fullpathname';
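The nearest T-SQL workaround today is shelling out to bcp with queryout. A sketch only; the server, database, and path names are made up, and it requires xp_CmdShell to be enabled:

```sql
DECLARE @Cmd VARCHAR(8000);
SELECT @Cmd = 'bcp "SELECT SomeColumn, OtherColumn FROM MyDb.dbo.SomeTable" '
            + 'queryout "C:\Exports\SomeTable.txt" -c -t"," -T -SMyServer';
EXEC xp_CmdShell @Cmd;  -- -c = character mode, -t = field terminator,
                        -- -T = trusted connection, -S = server name
```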
--Jeff Moden
Change is inevitable... Change for the better is not.
April 22, 2011 at 6:38 pm
You're thinking like an administrator, not a developer, Jeff.
Most places restrict, or outright deny, xp_cmdshell any access.
Now, to save you the argument, your SQL Agent probably isn't restricted that way, but it's still frowned upon. It can also run BCP, which isn't quite as disdained. It's also a bunch of command-line switches that I always have to look up to remember how to use properly. Again, that's more practice than anything.
However, as far as I know, if you set the job up to run BCP you have no variable or configuration controls; for example, outputting to an incrementing filename, or easily controlling file paths via environment. Also, format files are annoying at best. Learning this method: http://msdn.microsoft.com/en-us/library/ms191516.aspx is at best a lesson in aggravation. I will say, though, that unlike SSIS, it's easily reusable.
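For reference, a minimal non-XML format file looks something like this. The column names and widths are invented, and the version number on the first line must match your bcp version:

```text
10.0
3
1  SQLCHAR  0  12   ","      1  CustomerID    ""
2  SQLCHAR  0  50   ","      2  CustomerName  SQL_Latin1_General_CP1_CI_AS
3  SQLCHAR  0  24   "\r\n"   3  OrderDate     ""
```

Each row lists: host field order, data type, prefix length, maximum length, terminator, table column order, table column name, and collation.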
SQLCMD is merely command-line or batch-script access to the server. Useful for ad hoc reports, but not as powerful. I certainly wouldn't want to do data transformations in it. To be honest, I'd never really think to use it as an ETL method. It's a very different tool.
Bulk Import is another useful item. I've used it before, but again, I don't find it as powerful, especially since it has a number of restrictions. In particular:
Each field in the data file must be compatible with the corresponding column in the target table. For example, an int field cannot be loaded into a datetime column.
and
Comma-separated value (CSV) files are not supported by SQL Server bulk-import operations. However, in some cases, a CSV file can be used as the data file for a bulk import of data into SQL Server. Note that the field terminator of a CSV file does not have to be a comma. To be usable as a data file for bulk import, a CSV file must comply with the following restrictions:
Data fields never contain the field terminator.
Either none or all of the values in a data field are enclosed in quotation marks ("").
This means you can't transform on the fly, and you need heavy controls on the varchar data.
All of these things can be worked around, so I'll try to avoid belaboring any limitations.
However, none of these tools gives you an all-in-one ability to:
- Attach data during the reading of the file before writing (lookup component), requiring a second update once data is imported.
- Easily control environmental configurations.
- Allow for built-in file loop loading.
- Have an easy to use interface for file format setups
- Allow you to transform data mid-flight
- Run in a separate memory space from your server engine (BCP does, Bulk Import doesn't)
- Provide easy-to-follow decision trees or error handling.
There are a number of pieces of SSIS that I like, especially from a developer standpoint. The package is encapsulated, testable, and deployable through all environments and target servers without direct modification. It doesn't break command-line access protocols. It has a relatively robust calling mechanism within SQL Agent.
I agree with you, Jeff, that the old-school ways are useful and powerful; I just don't find them as complete, or as rich in available tools, as SSIS. I fought the conversion from DTS and similar tools to SSIS for a long time. Now I can't ever picture going back.
Never stop learning, even if it hurts. Ego bruises are practically mandatory as you learn unless you've never risked enough to make a mistake.
For better assistance in answering your questions | Forum Netiquette
For index/tuning help, follow these directions. | Tally Tables
Twitter: @AnyWayDBA
April 23, 2011 at 11:55 am
Understood. But the "old school" ways do work quite nicely, including such things as you mention like incrementing file names, etc. And there's usually no need for BCP format files even when you need to skip something like an IDENTITY column. On-the-fly processing for the casual use of quoted identifiers, only where there's a column delimiter in the data, is no problem with OPENROWSET and the ACE drivers. BCP and BULK INSERT also have some pretty good conditional processing built in, whereby bad rows can be routed to a separate file without stopping the process on the good rows.
Then there's the problem of doing things that SSIS simply can't do. What people normally resort to is "writing a script" of some sort. There's no difference between that and a properly setup proxy to run xp_CmdShell via a scheduled job. And, make no doubt about it, when folks see what I can do for ETL outside of SSIS, they very quickly see the benefit of setting up a proxy. Further, if the ETL is "import only", I don't even need xp_CmdShell to read file names and do a whole bunch of other stuff.
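One common pattern along these lines, sketched here under assumed paths and permissions rather than as Jeff's exact method, is to capture a directory listing into a table and drive the per-file loads from it:

```sql
IF OBJECT_ID('tempdb..#FileList') IS NOT NULL DROP TABLE #FileList;
CREATE TABLE #FileList (FileName NVARCHAR(512));

INSERT INTO #FileList (FileName)
EXEC xp_CmdShell 'dir /b "C:\ETL\Inbound\*.txt"';  -- bare file names only

DELETE FROM #FileList
WHERE FileName IS NULL                -- trailing NULL row from xp_CmdShell
   OR FileName = 'File Not Found';    -- dir's message when nothing matches

SELECT FileName FROM #FileList;       -- loop over these to BULK INSERT each file
```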
A lot of the things you call "restrictions", I call "features". Having a system that will automatically reject rows based on datatype is highly beneficial for the things I typically do.
I think that people have been brainwashed by MS and others into thinking that the use of xp_CmdShell is taboo. That notwithstanding, I have no problem with people wanting to use SSIS and can even help them in the form of writing certain stored procedures to augment SSIS. It will never be my first recommendation, though. 😀
What I'd really like to see is for MS to get off their butt and make file processing via T-SQL a whole lot easier like they've done in MySQL.
--Jeff Moden
Change is inevitable... Change for the better is not.
April 23, 2011 at 12:27 pm
Interesting thoughts.
I think SSIS is useful, but it has a high bar of effort to get things done. From what I've seen, the "code reuse" comes from cutting and pasting from one package into another. Not a lot of obvious reuse, and for some tasks it's downright annoying to reconfigure the whole thing.
I have seen some people that use scripts to build parts of packages dynamically from a known pattern. I think that if you do a lot of SSIS work, this pays off, but it's a high initial investment to figure out and gain some skill in the building of components and tasks from code.
However, I don't think that writing your own importer is necessarily better. SSIS brings lots of benefits (debugging, holding data in memory, streaming to multiple destinations, parallelism) that would be hard to write yourself. I know you could do them, but the framework provides a lot, assuming you buy into the way it works and change your development to fit that framework.
If you have specific problems, I'd love to see a thread on something that's really annoying or slow and see if there might be advice from SSIS experts that helps you solve the underlying issue.
April 23, 2011 at 12:29 pm
I think SSIS is "just" an alternative to some of the tools Jeff mentioned. Nothing more, but nothing less either.
There are two tasks for which I prefer using SSIS:
a) exporting to Excel with a dynamic file name and conditionally adding new worksheets if the number of rows exceeds 60k
b) when I can benefit from parallel processing of different tasks (e.g. loading data from various files into different tables, or writing data to various different files in parallel)
Most of the time I end up using SSIS to call T-SQL sprocs though...
The other "benefit" is maintainability by those folks used to drag & drop (aka "click here and there")... 😉
April 23, 2011 at 12:30 pm
Re: xp_cmdshell. It's not so much that people are brainwashed as that it's an open-ended shell, which can be a security risk. You can minimize it, and perhaps mitigate any major concerns, but it's not a simple process for most people to understand the implications of allowing shell access.
As a general rule, I don't allow it, especially for non-sysadmins. Just because I don't know how someone might take advantage of it, or more likely, screw something up by not understanding what they are running in a shell.
April 23, 2011 at 1:17 pm
Jeff Moden (4/23/2011)
Understood. But the "old school" ways do work quite nicely including such things as you mention like incrementing file names, etc, etc. And, there's usually no need for BCP Format Files even when you need to skip something like an IDENTITY column.
I was thinking more along the lines of fixed-width files for the format files than column skips.
On the fly processing for the casual use of quoted identifiers only if there's a column delimiter in the data is no problem with OPENROWSET and the ACE drivers. BCP and BULK INSERT also have some pretty good conditional processing built in whereby bad rows can be routed to a separate file without stopping the process on the the good rows.
Yes and no. For BCP, the -e switch does allow for an error file. Now it's been a bit since I've used BCP, mostly because of the xp_cmdshell thing with no proxy, so a question or two. Does that error file allow for customized failure reasons, such as foreignkey/lookup failure (ie dimension lookup) and data translation error (ie 1e-104 into an int), on a per failed row basis?
Then there's the problem of doing things that SSIS simply can't do. What people normally resort to is "writing a script" of some sort. There's no difference between that and a properly setup proxy to run xp_CmdShell via a scheduled job.
Except said script runs midstream, allowing for en-route changes rather than multiple passes on the data. Things like that in SSIS allow for faster data transfers, although it is more work up front.
And, make no doubt about it, when folks see what I can do for ETL outside of SSIS, they very quickly see the benefit of setting up a proxy. Further, if the ETL is "import only", I don't even need xp_CmdShell to read file names and do a whole bunch of other stuff.
Hm, interesting. You wouldn't by chance have, or know of, an article that I could read through that illustrates this? Even when I was banging my head on DTS doing vbscript and object activation manipulation I didn't find an 'easier' way via scripting, particularly for multiple files.
A lot of the things you call "restrictions", I call "features". Having a system that will automatically reject rows based on datatype is highly beneficial for the things I typically do.
SSIS can and does as well, it's more that you can transform these midflight if necessary, which that tool can't. A common case is varchar date to a datetime field, or swapping a business entry for your local surrogate key.
I think that people have been brainwashed by MS and others into thinking that the use of xp_CmdShell is taboo. That notwithstanding, I have no problem with people wanting to use SSIS and can even help them in the form of writing certain stored procedures to augment SSIS. It will never be my first recommendation, though. 😀
It's not just that xp_cmdshell is taboo, it's that any access to the server outside of the engine needs to be finely controlled. Many administrators handle too many dev teams to review every inch of code that comes across their desk, and the implications of a poorly written, or improperly restricted, xp_cmdshell usage can be staggering. Locking it down to administrative usage makes for less pain. It's less brainwashing and more self-defense.
Regarding first recommendation, we'll have to agree to disagree on that. For some tasks it's my go-to.
Edit: I should add, Jeff, I'm trying to prompt you into teacher mode a little on this. I'm not going to claim expertise on any of the older tools.. heck, on any tool. 🙂
Never stop learning, even if it hurts. Ego bruises are practically mandatory as you learn unless you've never risked enough to make a mistake.
For better assistance in answering your questions | Forum Netiquette
For index/tuning help, follow these directions. | Tally Tables
Twitter: @AnyWayDBA
April 23, 2011 at 1:21 pm
LutzM (4/23/2011)
Most of the time I end up using SSIS to call T-SQL sprocs though...The other "benefit" is maintainability by those folks used to drag & drop (aka "click here and there")... 😉
I will certainly use SSIS to call sprocs as well as other items, whatever gets the job done the fastest and easiest to maintain. The most powerful components in it are the dataflow transformations and error controls, and the environmental configurations.
The other item that is very useful, I won't lie, is the GUI. Being able to SEE the flow, components, and tasks in a logical ordering, and the resultant output, is very handy and quite useful in troubleshooting.
Never stop learning, even if it hurts. Ego bruises are practically mandatory as you learn unless you've never risked enough to make a mistake.
For better assistance in answering your questions | Forum Netiquette
For index/tuning help, follow these directions. | Tally Tables
Twitter: @AnyWayDBA
April 23, 2011 at 1:55 pm
Good topic... I have agreed and disagreed with points in everyone's posts so far. I say use what's secure, what performs well enough to support the goal in mind and allow for anticipated growth, and what allows you to get your job done fastest without leaving an unmaintainable mess... in that order. Choosing from a set of tools that can actually do the job in the first place goes without saying (see my earlier comment about CSV files... that really irks me about SSIS... I don't know if I will ever forgive the SSIS team for that one).
There are no special teachers of virtue, because virtue is taught by the whole community.
--Plato
April 23, 2011 at 2:14 pm
Craig Farrell (4/23/2011)
I was thinking more along the lines of fixed width files for the format files then column skips.
Files with fixed-width fields are no more difficult with BCP/BULK INSERT than they are with SSIS; you just define them differently. Either way, you have to assign the starting position and the length of each field. In SSIS, you use a GUI. With BCP/BULK INSERT, you use SUBSTRING.
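A hedged sketch of that technique, with an invented three-field layout (positions, types, and the path are all hypothetical):

```sql
-- Assumes the file contains no tab characters, so each whole line
-- arrives as a single value in the one-column staging table.
CREATE TABLE #RawLines (RawLine VARCHAR(100) NOT NULL);

BULK INSERT #RawLines
FROM 'C:\ETL\FixedWidth.txt'
WITH (ROWTERMINATOR = '\n');

-- Hypothetical layout: 1-10 AccountNo, 11-18 TradeDate (yyyymmdd), 19-30 Amount
SELECT  AccountNo = SUBSTRING(RawLine,  1, 10),
        TradeDate = CONVERT(DATETIME, SUBSTRING(RawLine, 11, 8)),
        Amount    = CONVERT(DECIMAL(12, 2), SUBSTRING(RawLine, 19, 12))
FROM    #RawLines;
```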
On the fly processing for the casual use of quoted identifiers only if there's a column delimiter in the data is no problem with OPENROWSET and the ACE drivers. BCP and BULK INSERT also have some pretty good conditional processing built in whereby bad rows can be routed to a separate file without stopping the process on the the good rows.
Yes and no. For BCP, the -e switch does allow for an error file. Now it's been a bit since I've used BCP, mostly because of the xp_cmdshell thing with no proxy, so a question or two. Does that error file allow for customized failure reasons, such as foreignkey/lookup failure (ie dimension lookup) and data translation error (ie 1e-104 into an int), on a per failed row basis?
I don't work with SSIS so forgive my ignorance of the subject. Are you saying that SSIS does that automatically or do you have to tell it to do so? If I put foreign keys on my staging table (and I ALWAYS use a staging table), then BCP/BULK INSERT will put such rows in the errata file and a "control" file which contains the diagnostics for why each row failed. From BOL...
ERRORFILE = 'file_name'
Specifies the file used to collect rows that have formatting errors and cannot be converted to an OLEDB rowset. These rows are copied into this error file from the data file "as is."
The error file is created when the command is executed. An error occurs if the file already exists. Additionally, a control file that has the extension .ERROR.txt is created. This references each row in the error file and provides error diagnostics. As soon as the errors have been corrected, the data can be loaded.
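A sketch of the behavior that BOL excerpt describes; the table and file names are hypothetical:

```sql
BULK INSERT dbo.Staging
FROM 'C:\ETL\Feed.txt'
WITH (
    FIELDTERMINATOR = '\t',
    ROWTERMINATOR   = '\n',
    ERRORFILE       = 'C:\ETL\Feed.err',  -- rejected rows land here "as is"
    MAXERRORS       = 1000                -- keep loading the good rows
);
-- Per-row diagnostics appear in the control file C:\ETL\Feed.err.ERROR.txt.
```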
Except said script runs midstream, allowing for en-route changes, rather then multiple passes on the data. Things like that in SSIS allow for faster data transfers, although it is more work up front.
That reminds me of some of the contests on some of the forums. Sometimes (a lot of times, actually), making more than one pass at the data is actually faster than trying to do everything to each row all at once. I'll be happy to agree that "It Depends". 😀
And, make no doubt about it, when folks see what I can do for ETL outside of SSIS, they very quickly see the benefit of setting up a proxy. Further, if the ETL is "import only", I don't even need xp_CmdShell to read file names and do a whole bunch of other stuff.
Hm, interesting. You wouldn't by chance have, or know of, an article that I could read through that illustrates this? Even when I was banging my head on DTS doing vbscript and object activation manipulation I didn't find an 'easier' way via scripting, particularly for multiple files.
I've not found a good one so I don't have a URL to post. Maybe I should write an article about it someday. 🙂
A lot of the things you call "restrictions", I call "features". Having a system that will automatically reject rows based on datatype is highly beneficial for the things I typically do.
SSIS can and does as well, it's more that you can transform these midflight if necessary, which that tool can't. A common case is varchar date to a datetime field, or swapping a business entry for your local surrogate key.
??? :blink: Gosh, Craig. What makes you think that converting a VARCHAR datetime to a DATETIME datatype is a problem for either BCP or BULK INSERT? It happens rather automatically for both. So far as swapping a business entry for a local surrogate key goes, that's easily accomplished using a computed column.
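Both points, sketched as hypothetical T-SQL (all names invented): the character date converts implicitly during the load into a typed staging table, and the surrogate-key swap happens on the way out of staging.

```sql
-- Staging table: BULK INSERT converts a value like '2011-04-23' in the
-- file to DATETIME implicitly because the column is typed.
CREATE TABLE dbo.SaleStaging (
    BusinessKey VARCHAR(20)   NOT NULL,
    SaleDate    DATETIME      NOT NULL,
    Amount      DECIMAL(12,2) NOT NULL
);

-- Surrogate-key swap via a simple join against the dimension.
INSERT INTO dbo.SaleFact (CustomerSK, SaleDate, Amount)
SELECT  d.CustomerSK,
        s.SaleDate,
        s.Amount
FROM    dbo.SaleStaging AS s
JOIN    dbo.DimCustomer AS d
  ON    d.BusinessKey = s.BusinessKey;
```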
I think that people have been brainwashed by MS and others into thinking that the use of xp_CmdShell is taboo. That notwithstanding, I have no problem with people wanting to use SSIS and can even help them in the form of writing certain stored procedures to augment SSIS. It will never be my first recommendation, though. 😀
It's not just that xp_cmdshell is taboo, it's that any access to the server outside of the engine needs to be finely controlled. Many administrators handle too many dev teams to review every inch of code that comes across their desk, and the implications of a poorly written, or properly restricted, xp_cmdshell usage can be staggering. Locking it down to administrative usage makes for less pain. It's less brainwashing and more self-defense.
Now I know how Adam Machanic feels when someone offers the same reasons for not allowing CLR on their boxes. 🙂 The Lead on each dev team and the DBAs should be a closely knit team of their own, and the Leads should be doing the code reviews under the tutelage of the DBAs. Further, considering the awesome and frequently unchecked power of SSIS packages, why would any DBA feel more comfortable with SSIS packages that may never be reviewed by a DBA?
Regarding first recommendation, we'll have to agree to disagree on that. For some tasks it's my go-to.
I can certainly understand and live with that! It all boils down to this... the OP asked the question of "is SSIS worth it?" Your answer is a flat "Yes" and my answer is, as usual, "It Depends." 🙂
--Jeff Moden
Change is inevitable... Change for the better is not.
April 23, 2011 at 2:36 pm
Edit: I should add, Jeff, I'm trying to prompt you into teacher mode a little on this. I'm not going to claim expertise on any of the older tools.. heck, on any tool.
BWAA-HAAA!!! You snuck that edit in on me. 😀 You should know me by now... I'm always in one of two modes and usually in those two modes at the same time... Teacher and Student.
No problem with the friendly banter between thee and me. If everyone agreed on everything, none of us would learn anything new. I'm learning things on this thread and I hope other folks are, as well. I just want to make sure that some of the methods other than SSIS are covered and the reasons why I think they're sometimes better. They do come out as a bit "adversarial" at times because I suck at making friendly posts when there's a disagreement (although I do try).
And you've succeeded, my good friend. You've turned the "teacher" mode on in me... As I previously stated, I've not seen very many good articles on the subject and I'm thinking of what I should cover in such an article.
--Jeff Moden
Change is inevitable... Change for the better is not.