How comma separated values are divided into rows?

  • Mea culpa, I overlooked one line in the first response. I'll just go shoot myself now.

    Now you're free to criticize Jeff Moden (for whom I have a tremendous amount of respect) for saying he wouldn't do this in SSIS at all.

  • Scott Coleman (2/6/2014)


    Mea culpa, I overlooked one line in the first response. I'll just go shoot myself now.

    One of the most important lines in the whole thread!

    Scott Coleman (2/6/2014)


    Now you're free to criticize Jeff Moden (for whom I have a tremendous amount of respect) for saying he wouldn't do this in SSIS at all.

    That would be like criticizing Picasso for not wanting to using crayons instead of a paint brush.

    Regarding your comment that a TSQL solution might not fit every source:

    that is certainly true, but you can always dump the data into a staging table and then use TSQL.

    Need an answer? No, you need a question
    My blog at https://sqlkover.com.
    MCSE Business Intelligence - Microsoft Data Platform MVP

  • Yes, you could import the data to a staging table. But if neither the source nor destination is SQL, the overhead of creating the table and cleaning up the table and possibly two extra hits to the network bandwidth to copy the data may not be justified for something even a lowly laptop could easily handle by itself.

    Let me try a counter example. A co-worker once showed me an SSIS package he was very proud of. It had a huge data flow with conditional split and derived column transformations. It was a work of art, but it could also have been done in the source SQL statement with some CASE statements and subqueries, and probably with less development time. The whole data flow could be reduced to an INSERT/SELECT statement in a SQL task, without dragging the data over the network twice so it can be handled one row at a time in an SSIS data flow. It was a valid solution, and might be good for some kinds of non-relational data sources, but in this case it was not the best solution.

    My whole point was that often the questions of data source, destination, and possibly volume should take precedence over what language the solution is implemented in. For all we know the data is coming from a web service and is to be reformatted in XML and sent to an FTP site. I would certainly go with the T-SQL string splitter if the data source is SQL Server, but I would want to get that pinned down before I started debating which T-SQL string splitter implementation to use.

  • Scott Coleman (2/6/2014)


    Since "I have the input:" doesn't specify where it's coming from, your T-SQL solutions seem to assume that the data is coming from a SQL database. If not, there are other required steps to insert it into a database, transform it, and clean up afterward.

    If this data is coming from outside SQL, say a text file, and your goal is to transform it in SSIS before further processing and/or database insertion, a script transform component might be the best answer. Parse the first two fixed columns of each input row, then use String.Split on the city list and an AddRow method in a loop over the results.

    I like T-SQL as much as the next guy, but when someone is talking about transforming data in SSIS I don't assume T-SQL is the only answer.

    PS I'm the guy with a couple of SSIS packages that have evolved into one script task labeled "Do everything". I think they're perfectly maintainable, but nobody else around here has actually tried it.

    I like SSIS as much as the next guy, but when someone is talking about transforming data in any environment, I don't assume that SSIS is the only answer. 😛

    Don't be offended. I just had to have some fun with that.

    You're correct that we don't know what the source of the data is. I'm not sure that makes a huge difference in this case, though because, whether it comes from a file or a table, the answer will be nearly identical whether it be in SSIS or T-SQL. That may be a part of why the answers were more limited than they normally are for other problems.

    Shifting gears to the idea that you came up with using "Split.String" and a loop in SSIS, I couldn't have come up with that partiicular answer because I really don't use SSIS for what seem to be trivial imports of such data and so I didn't even know that "Split.String" existed. That's good to know and thank you for posting it.

    Now you're free to criticize Jeff Moden (for whom I have a tremendous amount of respect) for saying he wouldn't do this in SSIS at all.

    Heh... I'm used to, welcome, and embrace criticism. It's one of the tools I use to learn new things. For example, if someone could prove to me that using SSIS is twice as fast or twice as easy over T-SQL related methods to import text files, I'd take that into serious consideration and would fire up SSIS to start teaching myself more about it. Watching people work with it in previous jobs leads me to believe that isn't the case though and so, much like people that say T-SQL isn't the only answer, I suggest that SSIS isn't the only answer, either. 😉

    That being said, I'd love to see the actual solution in SSIS for this problem if you'd like to post it. I'd also be interested in knowing how long it took from the git to actually have done a test run that works.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Jeff Moden (2/6/2014)

    Heh... I'm used to, welcome, and embrace criticism. It's one of the tools I use to learn new things. For example, if someone could prove to me that using SSIS is twice as fast or twice as easy over T-SQL related methods to import text files, I'd take that into serious consideration and would fire up SSIS to start teaching myself more about it. Watching people work with it in previous jobs leads me to believe that isn't the case though and so, much like people that say T-SQL isn't the only answer, I suggest that SSIS isn't the only answer, either. 😉

    Challenge accepted. 😎

    There are places where T-SQL definately outperforms SSIS. Anytime you have to coagulate the stream (I use that term loosely, but it's appropropriate) for aggregations or sorts, it dies a horrible death.

    There are transformations (particularly cascading updates and string manipulations) where SSIS rules.

    Jeff, do you have SSIS 2k8 available to you or do you need said examples in 2k5?


    - Craig Farrell

    Never stop learning, even if it hurts. Ego bruises are practically mandatory as you learn unless you've never risked enough to make a mistake.

    For better assistance in answering your questions[/url] | Forum Netiquette
    For index/tuning help, follow these directions.[/url] |Tally Tables[/url]

    Twitter: @AnyWayDBA

  • Evil Kraig F (2/6/2014)


    Jeff, do you have SSIS 2k8 available to you or do you need said examples in 2k5?

    That's just mean 😀

    Luis C.
    General Disclaimer:
    Are you seriously taking the advice and code from someone from the internet without testing it? Do you at least understand it? Or can it easily kill your server?

    How to post data/code on a forum to get the best help: Option 1 / Option 2
  • Luis Cazares (2/6/2014)


    Evil Kraig F (2/6/2014)


    Jeff, do you have SSIS 2k8 available to you or do you need said examples in 2k5?

    That's just mean 😀

    :Whistling:


    - Craig Farrell

    Never stop learning, even if it hurts. Ego bruises are practically mandatory as you learn unless you've never risked enough to make a mistake.

    For better assistance in answering your questions[/url] | Forum Netiquette
    For index/tuning help, follow these directions.[/url] |Tally Tables[/url]

    Twitter: @AnyWayDBA

  • Jeff Moden (2/6/2014)


    Shifting gears to the idea that you came up with using "Split.String" and a loop in SSIS, I couldn't have come up with that partiicular answer because I really don't use SSIS for what seem to be trivial imports of such data and so I didn't even know that "Split.String" existed. That's good to know and thank you for posting it.

    The Split.String is actually just .NET code embedded in a script component.

    Once things get too hard for SSIS, you can basically do two things:

    * do it in TSQL

    * do it in .NET

    A lot of the time the choice between them depends on your coding abilities.

    Need an answer? No, you need a question
    My blog at https://sqlkover.com.
    MCSE Business Intelligence - Microsoft Data Platform MVP

  • Evil Kraig F (2/6/2014)


    Jeff Moden (2/6/2014)

    Heh... I'm used to, welcome, and embrace criticism. It's one of the tools I use to learn new things. For example, if someone could prove to me that using SSIS is twice as fast or twice as easy over T-SQL related methods to import text files, I'd take that into serious consideration and would fire up SSIS to start teaching myself more about it. Watching people work with it in previous jobs leads me to believe that isn't the case though and so, much like people that say T-SQL isn't the only answer, I suggest that SSIS isn't the only answer, either. 😉

    Challenge accepted. 😎

    There are places where T-SQL definately outperforms SSIS. Anytime you have to coagulate the stream (I use that term loosely, but it's appropropriate) for aggregations or sorts, it dies a horrible death.

    There are transformations (particularly cascading updates and string manipulations) where SSIS rules.

    Jeff, do you have SSIS 2k8 available to you or do you need said examples in 2k5?

    2K8 is good. I might even install 2K12 in the next week or so. It's been sitting on my desk for a couple of months now.

    I have a better idea, though. Since you have SSIS installed already, you can do the testing on your box since (right now anyway) I consider loading SSIS onto my laptop similar to putting a cursor into production. 😛

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Koen Verbeeck (2/6/2014)


    Jeff Moden (2/6/2014)


    Shifting gears to the idea that you came up with using "Split.String" and a loop in SSIS, I couldn't have come up with that partiicular answer because I really don't use SSIS for what seem to be trivial imports of such data and so I didn't even know that "Split.String" existed. That's good to know and thank you for posting it.

    The Split.String is actually just .NET code embedded in a script component.

    Once things get too hard for SSIS, you can basically do two things:

    * do it in TSQL

    * do it in .NET

    A lot of the time the choice between them depends on your coding abilities.

    Ahhh... got it. I thought the name looked familiar. Thanks, Koen.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Luis Cazares (2/6/2014)


    Evil Kraig F (2/6/2014)


    Jeff, do you have SSIS 2k8 available to you or do you need said examples in 2k5?

    That's just mean 😀

    Nah... Craig knows me. I usually stay a rev behind because that's where a lot of people need the most help. Heh... It took me until the year 2008 to finally load SQL Server 2005. IIRC, I didn't load SQL Server 2008 (never did load R2) until 2011 (IIRC). I used to be right on top of all releases of code that I used but I'm getting too old to live on the bleeding edge. Now I let the youngsters shake out the rug before I bring it into my house. 😛

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • It's an interesting discussion this. I think most of the time we gravitate to what we are most comfortable/familiar with to solve a problem, rather than choosing the 'best' option (best in this context meaning fastest, using fewest resources). Often the pressure of needing to find a solution quickly means that there isn't time to try alternatives - it it works it's good enough. How many times do 'tactical' solutions end being 'strategic'?!

    That's why I love the forums; there are usually mutiple replies to a post advocating different ways to resolve a problem. Now, if only I had the time to try them.......;-)

    Regards

    Lempster

  • Well I second you on performance

    I guess because of QotD. I tried my hands on many of them recently.

  • @Phil Parkin - Thank you !! for the SQL code tip.

Viewing 14 posts - 16 through 28 (of 28 total)

You must be logged in to reply to this topic. Login to reply