Need to use a script data source to pull in 10,000+ characters of data in one Excel column

  • Jeff...

    I did eventually find VB code via Google search that identified the method needed to add to a BLOB data type within the OutputBuffer object.   So far, it has been working just fine.   It seems likely that we won't be hitting 8,000 characters in an inbound file's single line of data, but that could change at any time, so I'm retaining the VARCHAR(MAX).   Also fyi. your DelimitedSplit8K function using the LEAD() improvement is a primary feature of my overall design, and I just had to make a version to handle VARCHAR(MAX) as the input string.   I've been running this function on inbound data now for more than a year and it's never been a performance issue at all.  It takes SSIS more time to load the VB-based script component and execute that than it does to run the 1st sproc that uses that split function multiple times to verify each row of data against a column configuration table that specifies column length, valid values if any, valid date formats if any, and possible table data verification or LIKE string verification.    It then updates each row's IsValid and IsValidated BIT columns with the appropriate status.

    Here's the line of VB that does the BLOB add:

    OutputBuffer.RowData.AddBlobData(System.Text.Encoding.ASCII.GetBytes(RowData))

     

    Steve (aka sgmunson) 🙂 🙂 🙂
    Rent Servers for Income (picks and shovels strategy)

  • That's some pretty awesome feedback, Steve.  Glad all of it worked out and thanks for posting what you did.  Glad the splitter and the modifications you made worked as well as you say it did.  As someone else once told me after they converted it to VARCHAR(MAX), "3 times slower than greased lightning is still pretty fast".  I was humbled by that and your feedback.  Thank you, good Sir!

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Jeff Moden wrote:

    That's some pretty awesome feedback, Steve.  Glad all of it worked out and thanks for posting what you did.  Glad the splitter and the modifications you made worked as well as you say it did.  As someone else once told me after they converted it to VARCHAR(MAX), "3 times slower than greased lightning is still pretty fast".  I was humbled by that and your feedback.  Thank you, good Sir!

    You're more than welcome!   There is probably some benefit to the fact that all of my data so far has been less than 8,000 characters, so that might be why it might well operate better than "3 times slower than greased lightning"...  Anyway, keep on calling out the RBAR and poor performance situations and identify them for what they are...  it's always great to learn that something YOU recommended back in 2008 is just as relevant now as it was then, and just as useful.   Heck, even STRING_SPLIT() doesn't do what your function does, and while I'm heard somewhere that SQL 2022 may remedy that "missing functionality" in STRING_SPLIT in terms of providing the ordinal position within the string, I've not verified for myself as I just don't have the time for that kind of research.

    Steve (aka sgmunson) 🙂 🙂 🙂
    Rent Servers for Income (picks and shovels strategy)

  • I can't wait for the 2022 Developer's Edition to come out so I can test.  I normally don't bother with public test versions and won't with 2022 either.  That and test the new GENERATE_SERIES() and a bunch of other goodies.

    Speaking of RBAR, have you seen the article on the FORMAT function that came out recently?  You can think of it as a form of "Hidden RBAR".

    https://www.sqlservercentral.com/articles/how-to-format-dates-in-sql-server-hint-dont-use-format

     

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Jeff Moden wrote:

    I can't wait for the 2022 Developer's Edition to come out so I can test.  I normally don't bother with public test versions and won't with 2022 either.  That and test the new GENERATE_SERIES() and a bunch of other goodies.

    Speaking of RBAR, have you seen the article on the FORMAT function that came out recently?  You can think of it as a form of "Hidden RBAR".

    https://www.sqlservercentral.com/articles/how-to-format-dates-in-sql-server-hint-dont-use-format

    I'm kind of looking forward to seeing how STRING_SPLIT performs in SQL 2022, as well as seeing if it does indeed have the ordinal functionality.   I also found an article on GENERATE_SERIES() interesting... it could have some good uses in my current role.   I've seen the article on FORMAT...    I actually have it in use a fair chunk because I know for certain that it applies only to very low row count scenarios (less than 500).   I usually use CONVERT, but when you need a quick alternative, it's rather handy, despite its resource usage waste.   Since I saw that article, I've been sure to make NO use of it for anything with any volume, and I try to avoid it elsewhere as well.

    Steve (aka sgmunson) 🙂 🙂 🙂
    Rent Servers for Income (picks and shovels strategy)

Viewing 5 posts - 16 through 19 (of 19 total)

You must be logged in to reply to this topic. Login to reply