SQL CLR code needed to replace fn_regex_split

  • RBarryYoung (10/17/2009)


    Matt Miller (#4) (10/17/2009)


    Jeff Moden (10/17/2009)


    Matt Miller (#4) (10/17/2009)


    Anything involving the "outside world" IMO just incentivizes all of those cruel ways some folks love to torture SQL Server (e.g. let me use my SQL server to automate EVERYTHING).

    Heh... nah... I wanted that stuff to make life a little easier in the ETL world for me. As a side benefit, people would abuse that stuff just like they have Cursors and CLR's... makes more work for folks like me to go in an clean stuff up at a pretty good rate. 😉

    I think "benefit" is dependent on whether you get to clean it up for "free" or for a fee. For the first time in a long many years - I find myself not consulting at all, so the only messes I get stuck cleaning are the "free" ones, so they're not nearly as fun...:)

    Ouch! That's a real shame, Matt, as that has always been my favorite thing about consulting.

    I do miss it Barry. The current financial situation has dried up my last sources for it. I am unfortunately not in a position to be playing the consulting gig full-time (pre-existing conditions would be a big problem for one of my dependents), so I will look for challenges in other ways.

    ----------------------------------------------------------------------------------
    Your lack of planning does not constitute an emergency on my part...unless you're my manager...or a director and above...or a really loud-spoken end-user..All right - what was my emergency again?

  • Paul White (10/18/2009)


    Ah yes, it was SqlBulkCopy I was thinking of: http://msdn.microsoft.com/en-us/library/system.data.sqlclient.sqlbulkcopy.aspx

    Yep, thats what I used. I could never get it within more than 66% of BCP's speed to do the same thing. And it wan't the file IO or streaming techniques or any of that. To the best of my ability to determine, it was the data-type conversion/validation routines. The ones builtin to SQLBulkCopy, as my fastet times where when I overrode them with my own (admittedly also by restricting what numeric strings I assumed as valid).

    All very strange, and I could only conclude that what was exposed in SQLBulkCopy was not actually BCP, but something similar written in a later, possibly more capable though definitely less efficient technology. I suspect that it's the Bulk Insert API, as 1) Bulk Insert usually runs slower for me than BCP and 2) I could in fact usually match the speed of Bulk Insert with my program.

    [font="Times New Roman"]-- RBarryYoung[/font], [font="Times New Roman"] (302)375-0451[/font] blog: MovingSQL.com, Twitter: @RBarryYoung[font="Arial Black"]
    Proactive Performance Solutions, Inc.
    [/font]
    [font="Verdana"] "Performance is our middle name."[/font]

  • Barry, you're probably in the right area there. I agree that BCP is the fastest of the bulk tools available, at least in my experience. The core BCP engine must be pretty tightly coded and highly tuned. I can only imagine that the process of 'porting' that to the slightly 'friendlier' and fluffier BULK methods and .NET API kinda broke some of that. Some would say that reflects general Microsoft practice 😉

  • Thanks all for the lively debate, I have followed it with interest.

    Well, here is the snippet of C# code I am going with; I'm parsing a (mostly NULL) table column of a table with only 200 records; at most the string parsed is about 300 characters long:

    [SqlFunction(IsDeterministic = true, DataAccess = DataAccessKind.None)]

    public static SqlString RegExSplit_ReturnSubstring(SqlString input, SqlString pattern, SqlInt32 substringNo)

    {

    if ( input.IsNull || pattern.IsNull )

    return SqlString.Null;

    else

    return Regex.Split(input.Value, pattern.Value)[(Int32)substringNo];

    }

    Any comments are most welcome (I have omitted error-handling code and a couple of other details for clarity).

    __________________________________________________________________________________
    SQL Server 2016 Columnstore Index Enhancements - System Views for Disk-Based Tables[/url]
    Persisting SQL Server Index-Usage Statistics with MERGE[/url]
    Turbocharge Your Database Maintenance With Service Broker: Part 2[/url]

Viewing 4 posts - 16 through 19 (of 19 total)

You must be logged in to reply to this topic. Login to reply