'Comedy Limited' with SQL Server

  • HighPlainsDBA - Monday, August 27, 2018 12:07 PM

    The comma-separated value (CSV) file format ... When done properly, it is the best, probably the only, convention for representing tabular data in a text file. It works.

    Eh? What is inherently superior about the comma as a delimiter? 

    I completely agree that MS has been dropping the ball for years on this.

    The best delimiter in my experience is the one that isn't in any of the text you are trying to delimit.

    The biggest challenge is often figuring out if a column has a numeric or text value (i.e. zip codes and NDC codes)
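    The point about ZIP and NDC codes deserves a concrete illustration. A minimal sketch in Python (an editorial example, not from the thread) of why such columns must be imported as text rather than numbers:

    ```python
    # ZIP and NDC codes look numeric but are really text:
    # converting them to integers silently drops leading zeros.
    raw_zip = "02134"          # a Massachusetts ZIP code

    as_number = int(raw_zip)   # what a naive "numeric" import does
    as_text = raw_zip          # what the column actually needs

    print(as_number)           # 2134  -- leading zero lost
    print(as_text)             # 02134 -- intact
    ```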

    412-977-3526 call/text

  • robert.sterbal 56890 - Monday, August 27, 2018 12:16 PM

    HighPlainsDBA - Monday, August 27, 2018 12:07 PM

    The comma-separated value (CSV) file format ... When done properly, it is the best, probably the only, convention for representing tabular data in a text file. It works.

    Eh? What is inherently superior about the comma as a delimiter? 

    I completely agree that MS has been dropping the ball for years on this.

    The best delimiter in my experience is the one that isn't in any of the text you are trying to delimit.

    The biggest challenge is often figuring out if a column has a numeric or text value (i.e. zip codes and NDC codes)

    I remember using * in some X12 transactions years ago. Worked pretty well. Even better is to use more than one character, but don't get carried away (like using a GUID or something crazy like that). If I can be reasonably sure that it's a unique sequence, I'm fine with kicking out exceptions on the import and dealing with them on a case-by-case basis ...however, if I can scan the table for the delimiter sequence I want to use BEFORE I convert it, then I'm even better off and shouldn't have any exceptions. 

    I just want to make sure I'm reading Monsieur Factor's article correctly.

  • robert.sterbal 56890 - Monday, August 27, 2018 12:16 PM

    The best delimiter in my experience is the one that isn't in any of the text you are trying to delimit.

    That's why ASCII characters 28 through 31 (the file, group, record, and unit separators) work so very well.

    http://www.asciitable.com/
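    To illustrate the suggestion above, a short Python sketch (an editorial example, not from the thread) using the unit and record separators as delimiters; because these control characters essentially never occur in ordinary text, no quoting or escaping is needed:

    ```python
    # ASCII 28-31 (FS, GS, RS, US) are control characters reserved
    # for exactly this kind of delimiting.
    US = chr(31)  # unit separator   -> between fields
    RS = chr(30)  # record separator -> between rows

    rows = [["id", "comment"],
            ["1", "contains, a comma"],
            ["2", 'and a "quote"']]

    # Serialize without any quoting rules...
    blob = RS.join(US.join(fields) for fields in rows)

    # ...and parse it back with two plain splits.
    parsed = [record.split(US) for record in blob.split(RS)]
    print(parsed == rows)   # True
    ```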

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • You are absolutely correct. SSIS is a non-starter.

    From SSMS, I should be able to import and export CSV and Excel, simply and directly.

    The export is brainless for MS to do. And we need it both in SSMS and in T-SQL (so our PowerShell scripts can do it too).

    Import... if that stays in SSIS I understand (though I don't like it).

  • rafael.zimberoff - Monday, August 27, 2018 2:52 PM

    You are absolutely correct. SSIS is a non-starter.

    From SSMS, I should be able to import and export CSV and Excel, simply and directly.

    The export is brainless for MS to do. And we need it both in SSMS and in T-SQL (so our PowerShell scripts can do it too).

    Import... if that stays in SSIS I understand (though I don't like it).

    Funny, SSIS is my tool of choice for importing and exporting CSV or any other arbitrarily formatted text data. I've implemented many import and export projects using SSIS that were high-performing and robust.

  • jyoung 18482 - Monday, August 27, 2018 3:25 PM

    rafael.zimberoff - Monday, August 27, 2018 2:52 PM

    You are absolutely correct. SSIS is a non-starter.

    From SSMS, I should be able to import and export CSV and Excel, simply and directly.

    The export is brainless for MS to do. And we need it both in SSMS and in T-SQL (so our PowerShell scripts can do it too).

    Import... if that stays in SSIS I understand (though I don't like it).

    Funny, SSIS is my tool of choice for importing and exporting CSV or any other arbitrarily formatted text data. I've implemented many import and export projects using SSIS that were high-performing and robust.

    It DOES work, for sure.  I can't speak for Rafael (or anyone else, for that matter) but the reasons why it's a non-starter for me are that 1) I shouldn't have to use a separate application that's written in a different language to do such things, 2) I find it more difficult to deploy packages than procedures from Dev to Test to UAT to Prod, 3) there's a lot that it can't do and so people end up writing stored procedures anyway, 4) it typically doesn't automap and needs manual intervention.  That kills me because we frequently end up importing from files that have dozens to hundreds of columns (yeah... I know... but not something in my control) and 5) because there is a lot it can't do, it promotes the "Tower of Babel" for code.  The example I gave previously (get 1 file ready for import) used a combination of Perl, VB, ActiveX, and some other crazy thing (I don't recall the name of it) because multiple people worked on it over time and they didn't know how to do it in T-SQL nor did they want to touch the existing code written in a language they didn't actually know.

     If you're good at SSIS and you get the right kind of performance out of it, then I can't fault you one bit because, according to MS, that's what it's there for.  I just wish it wasn't needed. 😀

    --Jeff Moden



  • Jeff Moden - Monday, August 27, 2018 4:10 PM

    jyoung 18482 - Monday, August 27, 2018 3:25 PM

    rafael.zimberoff - Monday, August 27, 2018 2:52 PM

    You are absolutely correct. SSIS is a non-starter.

    From SSMS, I should be able to import and export CSV and Excel, simply and directly.

    The export is brainless for MS to do. And we need it both in SSMS and in T-SQL (so our PowerShell scripts can do it too).

    Import... if that stays in SSIS I understand (though I don't like it).

    Funny, SSIS is my tool of choice for importing and exporting CSV or any other arbitrarily formatted text data. I've implemented many import and export projects using SSIS that were high-performing and robust.

    It DOES work, for sure.  I can't speak for Rafael (or anyone else, for that matter) but the reasons why it's a non-starter for me are that 1) I shouldn't have to use a separate application that's written in a different language to do such things, 2) I find it more difficult to deploy packages than procedures from Dev to Test to UAT to Prod, 3) there's a lot that it can't do and so people end up writing stored procedures anyway, 4) it typically doesn't automap and needs manual intervention.  That kills me because we frequently end up importing from files that have dozens to hundreds of columns (yeah... I know... but not something in my control) and 5) because there is a lot it can't do, it promotes the "Tower of Babel" for code.  The example I gave previously (get 1 file ready for import) used a combination of Perl, VB, ActiveX, and some other crazy thing (I don't recall the name of it) because multiple people worked on it over time and they didn't know how to do it in T-SQL nor did they want to touch the existing code written in a language they didn't actually know.

     If you're good at SSIS and you get the right kind of performance out of it, then I can't fault you one bit because, according to MS, that's what it's there for.  I just wish it wasn't needed. 😀

    1) All I can say is "use the right tool for the job" 😀
    2) The deployment and repository model introduced in SQL 2012 adds considerable flexibility and features for deploying to the various environments. Versioning within the package as well as version control in the repository are huge, IMHO.
    3) I rarely have had to write a stored proc to support an SSIS package. I think the last time I did was some years ago when I had to support CDC for loading a data warehouse. Now that's a standard component in SSIS.
    4) Not sure what you're expecting for auto mapping, but if, in a dataflow, you name the output columns of a source to match your expected destination, auto mapping works quite well for me.
    5) I can do pretty much any processing I need to inside of SSIS using its built-in components or a script component, and I've worked with some rather scary flat files over the years.

    Of course, as they say, your mileage may vary 🙂

  • jyoung 18482 - Monday, August 27, 2018 5:24 PM

    1) All I can say is "use the right tool for the job" 😀

    Heh... precisely why I don't use SSIS. 😀

    --Jeff Moden



  • Jeff Moden - Monday, August 27, 2018 6:45 PM

    jyoung 18482 - Monday, August 27, 2018 5:24 PM

    1) All I can say is "use the right tool for the job" 😀

    Heh... precisely why I don't use SSIS. 😀

    That is also why I don't use SSIS. Just let me do what I need in T-SQL.

    On the export side, this should be a no-brainer.

    On the import side, trickier, but should be doable.

  • I think that CSV is important because almost the whole industry knows how it is done and offers it as the default way of transferring tabular data economically. There is nothing inherently superior in using the comma.  In the very early days, the tab was more widely used. The great thing for me about using the comma to separate columns and the CR/LF sequence for the record delimiter is that both these ASCII characters are used in text, and so you need to be able to 'escape' them when they occur in the values of a column. Once you understand the escaping rules, then anything can be imported or exported.
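    These escaping rules are exactly what standard CSV libraries implement. A small Python sketch (an editorial example, not from the thread) showing commas, quotes, and CR/LF surviving a round trip through the standard csv module:

    ```python
    import csv, io

    # Fields containing the delimiter, a quote, or a line break get
    # wrapped in double quotes; embedded quotes are doubled.
    rows = [["name", "note"],
            ["Phil", 'said "hello", then left'],
            ["Jeff", "line one\r\nline two"]]

    buf = io.StringIO()
    csv.writer(buf, lineterminator="\r\n").writerows(rows)
    text = buf.getvalue()

    # Reading it back recovers the original values exactly.
    parsed = list(csv.reader(io.StringIO(text)))
    print(parsed == rows)   # True
    ```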

    Best wishes,
    Phil Factor

  • rafael.zimberoff - Monday, August 27, 2018 7:17 PM

    Jeff Moden - Monday, August 27, 2018 6:45 PM

    jyoung 18482 - Monday, August 27, 2018 5:24 PM

    1) All I can say is "use the right tool for the job" 😀

    Heh... precisely why I don't use SSIS. 😀

    That is also why I don't use SSIS. Just let me do what I need in T-SQL.

    On the export side, this should be a no-brainer.

    On the import side, trickier, but should be doable.

    We've come around to a home-brewed solution due to the poorer aspects of SSIS. Poor error messages, for instance. Why should (for example) a package not only fail to import a file because it is encoded wrongly, but also give completely bizarre error messages about it ('<Column name> overflow')? This was something that cost 2 hours of my time last Friday to work out. Thanks!

  • HighPlainsDBA - Monday, August 27, 2018 1:03 PM

    robert.sterbal 56890 - Monday, August 27, 2018 12:16 PM

    HighPlainsDBA - Monday, August 27, 2018 12:07 PM

    The comma-separated value (CSV) file format ... When done properly, it is the best, probably the only, convention for representing tabular data in a text file. It works.

    Eh? What is inherently superior about the comma as a delimiter? 

    I completely agree that MS has been dropping the ball for years on this.

    The best delimiter in my experience is the one that isn't in any of the text you are trying to delimit.

    The biggest challenge is often figuring out if a column has a numeric or text value (i.e. zip codes and NDC codes)

    I remember using * in some X12 transactions years ago. Worked pretty well. Even better is to use more than one character, but don't get carried away (like using a GUID or something crazy like that). If I can be reasonably sure that it's a unique sequence, I'm fine with kicking out exceptions on the import and dealing with them on a case-by-case basis ...however, if I can scan the table for the delimiter sequence I want to use BEFORE I convert it, then I'm even better off and shouldn't have any exceptions. 

    I just want to make sure I'm reading Monsieur Factor's article correctly.

    Sorry for my poor grammar, English is my first language...  Here is a clearer sentence:

    If I can be reasonably sure that it's a unique sequence, I'm fine with kicking out exceptions on the import and dealing with them on a case-by-case basis, since I would expect very few to occur ...however, if I can, I prefer to scan the table (or whatever data source I'm using) for the delimiter sequence I'm thinking of using BEFORE I import, to ensure that I don't have any exceptions. You could write a little utility with PoSh, C, or T-SQL (extended sp's) ... whatever, to do this pre-check.
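    That pre-check is straightforward to sketch. A hypothetical Python version (the function name and sample data are editorial illustrations, not from the thread):

    ```python
    # Scan the data for a candidate delimiter sequence BEFORE exporting,
    # so the import should raise no exceptions at all.
    def find_collisions(rows, delimiter):
        """Return (row, column) positions where the delimiter already
        appears inside a value -- each one would break a naive split."""
        return [(r, c)
                for r, fields in enumerate(rows)
                for c, value in enumerate(fields)
                if delimiter in value]

    data = [["1", "plain text"],
            ["2", "has |~| inside"]]

    print(find_collisions(data, "|~|"))   # [(1, 1)] -> pick another sequence
    print(find_collisions(data, "\x1f"))  # []       -> safe to use
    ```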
