Taking a single 80-character piece of text and converting it to columns in a table.

  • Andrew Goffin-302397 wrote:

    Hi,

    I have an 80-character piece of text that I wish to convert to a table structure that represents that data.

    Is there a simple way to take those 80 characters and insert them into the table structure without doing multiple substring commands for the individual columns?
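
    To illustrate what I'm trying to avoid, the multiple-substring approach looks something like this (the table names and the 10/30/40 column widths are just examples):

    -- Hypothetical staging/target tables and column widths, purely for illustration.
    INSERT INTO dbo.Target (Col1, Col2, Col3)
    SELECT SUBSTRING(Line,  1, 10)
          ,SUBSTRING(Line, 11, 30)
          ,SUBSTRING(Line, 41, 40)
      FROM dbo.RawLines;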

    Thanks in advance

    Andrew

    Trying to get back on track for this thread...

    Andrew... are you getting this data from a file, or is it stored in the format you described in a table somewhere?  If it's in a file then, like I said previously in this thread, we can turn the import into a nasty fast, self-checking load that sequesters bad lines of data in a separate file auto-magically.  Let us know.
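
    Just to show what I mean by "self checking" - a minimal sketch, assuming the source is a file (the paths, table, and format file names are hypothetical):

    -- Hypothetical paths and table. ERRORFILE sends rows that fail
    -- conversion to a separate file instead of failing the whole load;
    -- MAXERRORS caps how many bad rows are tolerated.
    BULK INSERT dbo.Staging
      FROM 'C:\Import\data.txt'
      WITH (
            FORMATFILE = 'C:\Import\data.fmt' -- fixed-field BCP format file
           ,ERRORFILE  = 'C:\Import\data.err' -- bad lines are sequestered here
           ,MAXERRORS  = 100
           ,TABLOCK
           );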

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
        Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Jeff

    2nd row has data types? I've never seen that before... but you have no idea what dreams I will have tonight, followed by the nightmares of using the SSIS wizard and figuring out "first row contains data", "start at row x", and "how can I mess up everything you built" - all of these options are available in SSIS 🙂

    MVDBA

  • Jeff Moden wrote:

    To me, it doesn't matter what the data provider uses so long as it's correct and consistent.  Delimited data is cool, especially if they include column names on the first row (and I love it when they include the datatype on the second row) because I can auto-magically create a target staging table and, possibly, a BCP format file based solely on the content of the delimited file.  I also have a great love for fixed field formats simply because of their blinding speed when done properly.  "Done Properly", in such cases, does mean using a BCP format file instead of loading lines from the file as a blob and doing a bunch of substrings.
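
    As a rough sketch of that "auto-magic" staging table trick (the file path is hypothetical, and the ordinal argument to STRING_SPLIT needs SQL Server 2022):

    -- Hypothetical path; assumes a comma-delimited file whose first row
    -- holds the column names. STRING_SPLIT's third (ordinal) argument
    -- requires SQL Server 2022 / compatibility level 160.
    DECLARE @blob   varchar(MAX)
           ,@header varchar(MAX)
           ,@sql    nvarchar(MAX);

    SELECT @blob = BulkColumn
      FROM OPENROWSET(BULK 'C:\Import\data.csv', SINGLE_CLOB) AS f;

    -- The first line is the column-name row; strip any trailing CR.
    SELECT @header = REPLACE(LEFT(@blob, CHARINDEX(CHAR(10), @blob + CHAR(10)) - 1), CHAR(13), '');

    -- Build and run the CREATE TABLE, one varchar column per name.
    SELECT @sql = 'CREATE TABLE dbo.Staging ('
                + STRING_AGG(QUOTENAME(TRIM(value)) + ' varchar(8000) NULL', ',')
                    WITHIN GROUP (ORDER BY ordinal)
                + ');'
      FROM STRING_SPLIT(@header, ',', 1);

    EXEC sys.sp_executesql @sql;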

    The bottom line, though, is that regardless of the record layout, delimited or otherwise, the data provider actually needs to know what they're doing and then do it right very consistently or you're screwed.

    Shifting gears a bit, the reason for using CSV or TSV is that people wanted to be able to have variable-width data like names, addresses, etc., and they wanted to be able to manually check/troubleshoot the data using (bleah!) spreadsheets.  Of course, that morphed into using all manner of different delimiters because comma and tab characters can actually be a part of the data.  All of that sucks, IMHO.  So do XML, JSON, and whatever other markup language you want to consider.

    With that in mind, I'd love to see people go back to the ancient method for the transmission of non-graphical data using ASCII control characters.  In particular, I'd like to see people go back to using ASCII character #30 as a record separator and ASCII character #31 as a unit separator, also known as a "delimiter".  There are a few other control characters that are extremely useful in controlling data transmission but the use of at least the two I mentioned would greatly simplify life.  I also wouldn't mind it if they went back to sending integer data as Big Endian binary data and decimal data in a binary decimal manner.
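
    SQL Server can consume those control characters today, too. A quick sketch (hypothetical path and table), using dynamic SQL because the terminator options only take string literals:

    -- Hypothetical path/table. CHAR(31) is the ASCII unit separator and
    -- CHAR(30) the record separator; splice them into the BULK INSERT
    -- statement since the WITH options don't accept expressions.
    DECLARE @sql nvarchar(MAX) = N'
        BULK INSERT dbo.Staging
          FROM ''C:\Import\data.usv''
          WITH (FIELDTERMINATOR = ''' + CHAR(31) + N'''
               ,ROWTERMINATOR   = ''' + CHAR(30) + N''');';
    EXEC sys.sp_executesql @sql;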

    Of course, none of that will happen because most of the world still depends on spreadsheets for I/O and, for those that don't, they still have some overwhelming personal need to be able to view and edit data in one (or Notepad).  😀

    I really hope you are not referring to packed-decimal, zoned-decimal, or even densely-packed-decimal types that we see from the old EBCDIC IBM extracts.  These require unpacking the data, which cannot be done using BCP, so those fields have to be loaded as character and then unpacked in SQL Server - or you use something like SSIS with a script component/task to unpack the decimal value.

    Other than that - I absolutely agree... I used to work with the various NSF formats to send/receive data across systems.  We could send all claims data in either the HCFA or UB82/92 formatted files and read those same files extremely fast - and since they were standardized across the industry, we could rely on format changes being few and far between...

    Now we can get files in XML, CSV, TSV, ragged-right, pipe-delimited, or some custom header/detail layout (fixed-width header - delimited detail), or even a delimited header/detail where the header row is flagged and has values in the first 10-15 fields, while on the detail records the first 10-15 fields are blank but the other 30 fields carry the detail (and may or may not be populated)...

    It used to be easy to add a new provider... because all of the code was already built, we just had to define the vendor-specific data elements and we could read the files.  Now we have to build out a completely separate process for every provider... every time.

    Whatever happened to the 'supposed' standardization called EDI?

    Jeffrey Williams
    “We are all faced with a series of great opportunities brilliantly disguised as impossible situations.”

    ― Charles R. Swindoll

    How to post questions to get better answers faster
    Managing Transaction Logs

  • Personally, I'm REALLY happy that the whole world hasn't adopted EDI.  IMHO, it's another hierarchical nightmare of data that's even worse than things like "packed decimal".

    This is what I'm talking about though... people have a perception of what is good based on what they can see rather than what the computer can do very well.  Done correctly, integers that are (for example) converted to BINARY(4) are really easy to unpack back to integers with or without BCP/BULK INSERT and you don't have to use the SQL Native format to do it (although that format is wicked handy and fast because it's super similar to what the OP posted but even better).
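
    To make that concrete, the cast does the whole "unpack" because SQL Server treats binary-to-integer conversions as big endian:

    -- 0x0000012C is 300 as a big-endian 4-byte integer.
    SELECT  CAST(0x0000012C AS int)       AS Unpacked -- 300
           ,CAST(300        AS binary(4)) AS Packed;  -- 0x0000012C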

    --Jeff Moden



  • Jeff Moden wrote:

    Personally, I'm REALLY happy that the whole world hasn't adopted EDI.  IMHO, it's another hierarchical nightmare of data that's even worse than things like "packed decimal".

    This is what I'm talking about though... people have a perception of what is good based on what they can see rather than what the computer can do very well.  Done correctly, integers that are (for example) converted to BINARY(4) are really easy to unpack back to integers with or without BCP/BULK INSERT and you don't have to use the SQL Native format to do it (although that format is wicked handy and fast because it's super similar to what the OP posted but even better).

    I wasn't actually recommending EDI - and I am very glad it didn't go anywhere...but the point was that EDI was supposed to be a 'standard' format for sending and receiving data that everyone would utilize across all industries.  It is a nightmare bastardization of XML...but it was supposed to be a standard.

    Prior to the whole EDI/XML movement - we had standardized formats that worked very well (at least in Healthcare).  Now - we have different formats from every vendor and customized code required to send/receive data for each one.  Sometimes - we even have customized formats for the same vendor for different internal organizations.

    Instead of having a simplified library of code to import data, we now have to build a customized process...and god forbid someone in the organization decides they want to add additional data elements.  We are never included in that discussion and we only find out about the change when the import process fails - or when the end user complains because the data import process shifted the data when the vendor added the new field in the middle of the file.

    I think we went backwards...

    Jeffrey Williams

  • Jeffrey Williams wrote:

    I think we went backwards...

    I especially agree with that sentiment.

    --Jeff Moden



  • Jeff Moden wrote:

    Jonathan AC Roberts wrote:

    MVDBA (Mike Vessey) wrote:

    Jeff Moden wrote:

    MVDBA (Mike Vessey) wrote:

    my advice - fix the source file, put a delimiter in there and then have whoever designed that file format go and work for someone else.

    In my experience it's a lot easier if you get good data in (the GIGO principle).  Plus, it takes the liability for a failed import off your shoulders.

    Nah... don't change a thing.  The source is actually in a perfect format that requires no delimiters.  No wasted space, no wasted time figuring out where delimiters are.  Using a BCP format file and BULK INSERT would make this import/parse absolutely fly.  That's if the source is actually a file.

    Jeff, I think this is the first time I've disagreed with you 🙂 lol

    if some plonker decides to change a field length or a datetime to just a date, then the string length changes and you have to recode all of the substrings from scratch... delimiters all the way - just not CSV, I use ~

    It looks like the format of the files Cobol generates; fixed-width exports were/are the standard export format from mainframes.

    Heh... careful now... you'll piss off Lynn Pettis!

    Truth be told, it IS an ancient form but, oh my, is it ever useful and fast.  In fact, it used to be the way DB-II and DB-III structured their tables.  One file per table or index, structured in a very similar manner.  Variable-width stuff was treated as "out of row", which was great, and it all led to much smaller backups (don't back something up unless it actually changed) and made it possible to easily restore a single table as a copy of a table that got messed up.

    {EDIT}  I should have scrolled down.  I see Lynn Pettis picked up on the Cobol reference, as expected. 😀

    ROTFLMAO!!

  • Nothing wrong with COBOL!!!! I still use it in my shop.

    As for fixed-width files - in many cases they are better than delimited, mainly when the data can contain LF/CR and/or the delimiter itself - and they can, in some cases, be smaller than the same data would be if it were "delimited".

    As for the 2nd row with data types - FIS Prophet files (".pro") can contain such a line - and it's a bit harder to process when generated by their software, as the file will contain "bad" data at the end.

    but a typical file would be (line numbers are not in the file; a load sketch follows the list):

    1. Header description - what the file is and when it was created - free text
    2. Row count
    3. Model type
    4. Column headers
    5. Data types (without sizes)
    6. onwards - the data, space-delimited
    7. at the end - an optional CTRL-Z and some proprietary data after it
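
    The load sketch for that layout (path and table are hypothetical; FIRSTROW counts rows by the row terminator):

    -- Hypothetical path/table. FIRSTROW = 6 skips the five header rows
    -- described above; MAXERRORS absorbs the trailing CTRL-Z/proprietary
    -- junk instead of failing the load.
    BULK INSERT dbo.ProphetStaging
      FROM 'C:\Import\model.pro'
      WITH (
            FIELDTERMINATOR = ' '  -- space-delimited data rows
           ,FIRSTROW        = 6
           ,MAXERRORS       = 10
           );
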
  • frederico_fonseca wrote:

    Nothing wrong with COBOL!!!! and still use it on my shop.

    I had to convert a Cobol system (well, part of a Cobol system, as I didn't finish it) into .NET. I've got to say it is the worst language imaginable. If I ever see a project that needs any contact with Cobol, I will run a mile. The programs I was looking at were on average about 6,000 lines long. The top half of the code (over 2,000 lines) was the declaration of variables. The remaining 4,000 lines were a clunky kind of procedure (almost entirely controlled by gotos). The worst thing is that there are no local variables; everything in the code is a global variable, so you can't tell which bit of code has set the value in a variable as it could have been anywhere. Cobol is an absolute nightmare! I never ever want to see any Cobol again!

    The only positive thing I could say about Cobol is the way you can redefine variables. So, for example, you could have an 80-character line, and you could then redefine it as multiple variables whose lengths add up to 80 characters; then, by setting the variables, you are also setting the value of the 80-character line. Apart from that, there is nothing good about Cobol.
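
    The nearest T-SQL analogue I can think of is computed columns over the raw line (the 10/30/40 widths are made up), though unlike REDEFINES it only works one way - you can read the pieces, but setting a piece doesn't rewrite the line:

    -- Hypothetical split of the 80-character line. The computed columns
    -- "redefine" the line much as a COBOL REDEFINES clause would, without
    -- storing the pieces twice.
    CREATE TABLE dbo.RawLines
    (
         Line char(80) NOT NULL
        ,Col1 AS SUBSTRING(Line,  1, 10)
        ,Col2 AS SUBSTRING(Line, 11, 30)
        ,Col3 AS SUBSTRING(Line, 41, 40)
    );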

  • Jonathan AC Roberts wrote:

    frederico_fonseca wrote:

    Nothing wrong with COBOL!!!! I still use it in my shop.

    I had to convert a Cobol system (well, part of a Cobol system, as I didn't finish it) into .NET. I've got to say it is the worst language imaginable. [...] I never ever want to see any Cobol again!

    You can have bad COBOL and you can have good COBOL.  COBOL is a tool; it isn't bad or good in and of itself.

  • Lynn Pettis wrote:


    You can have bad COBOL and you can have good COBOL.  COBOL is a tool, it isn't bad or good in and of itself. 

    The use of COBOL cripples the mind; its teaching should, therefore, be regarded as a criminal offense.

    Edsger W. Dijkstra (1975)

  • Jonathan AC Roberts wrote:


    The use of COBOL cripples the mind; its teaching should, therefore, be regarded as a criminal offense.

    Edsger W. Dijkstra (1975)

    Sounds like a religious war against COBOL.  Sorry, I have to disagree with Dijkstra's opinion because, if you think about it, it is true of every language if it is taught incorrectly.

    It is possible to write elegant code in COBOL just as easily as it is possible to write garbage code in Java, C, C++, C#, Rust, Ruby, and so on.

  • BHWAAA-HAAAAAAA!!!! And... it doesn't use curly braces. 😀

    --Jeff Moden



  • @Andrew Goffin-302397,

    I realize that your post has been hijacked, so I'm still trying to get it back on track, but I need that bit of info I've been asking about to do so.

    Is the source of your data ultimately coming from a file?

    --Jeff Moden



  • Andrew Goffin-302397 wrote:

    Yes, I totally agree; unfortunately it is not possible to have the format changed ...

    Sometimes we have to work with what we are given ....

    🙂

    Actually, you can cheat a little. 🙂

    Nothing says you can't write a little program in a .NET language to translate the fixed format into a more SQL-friendly intermediate file format, and then import the intermediate file instead of the original. You can even automate it: drop the original file into a file folder, have a scheduler run the intermediate-file translator to create the intermediate file(s) in a different folder, and then have a scheduled T-SQL task that imports the intermediate files.
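
    The scheduled T-SQL end of that pipeline can be a simple sketch like this (folder and table names are hypothetical, and it assumes xp_cmdshell is enabled - an Agent CmdExec step works if it isn't):

    -- Hypothetical folder/table names. Lists the translator's output
    -- folder, then BULK INSERTs each intermediate file in turn.
    CREATE TABLE #Files (FileName nvarchar(260));

    INSERT INTO #Files (FileName)
    EXEC sys.xp_cmdshell 'dir /b C:\Import\Ready\*.csv';

    DECLARE @file nvarchar(260), @sql nvarchar(MAX);

    DECLARE cur CURSOR LOCAL FAST_FORWARD FOR
        SELECT FileName FROM #Files WHERE FileName LIKE '%.csv';
    OPEN cur;
    FETCH NEXT FROM cur INTO @file;

    WHILE @@FETCH_STATUS = 0
    BEGIN
        SET @sql = N'BULK INSERT dbo.Staging FROM ''C:\Import\Ready\'
                 + @file + N''' WITH (FIELDTERMINATOR = '','', FIRSTROW = 2);';
        EXEC sys.sp_executesql @sql;
        FETCH NEXT FROM cur INTO @file;
    END;

    CLOSE cur;
    DEALLOCATE cur;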

    My company takes orders in whatever format the customer wants to give them to us, and we then translate them to an intermediate format, kind of like how .NET translates source code into its own intermediate format.

    Works VERY well, especially since from a database POV you only need staging files for your own format!
