An alternative to split for performance benefits?

  • Guys,

    I'm wondering if you can help me think of a different way of tackling something that will produce faster results...

    Essentially, someone will be importing a CSV file of email addresses, and we then run this collection of email addresses against a procedure, using the split function to supply it with the addresses.

    With 500 or so email addresses it's sluggish but okay, but I've been told it could be up to 30,000 addresses!

    I tried running the split function into a temp table and running against that, but it seemed slower. I don't know if there's a totally different approach I can take to make things run a bit quicker.

    Any ideas?!

  • What split function are you referring to? Is it this - http://www.sqlservercentral.com/articles/Tally+Table/72993/ ?

  • What format is the file in? Is it just one continuous string of email addresses? (I assume it is, as you're asking about string splitters.)

  • I'd have to review the split function you linked to - I opened it and had a quick look but I've not had the time to read it yet.

    As for the format, these addresses will be in an xls and the user will create a CSV to upload, they don't have to create a CSV though of course, it's just how we've done things before...

  • Rob-350472 (12/6/2012)


    I'd have to review the split function you linked to - I opened it and had a quick look but I've not had the time to read it yet.

    As for the format, these addresses will be in an xls and the user will create a CSV to upload, they don't have to create a CSV though of course, it's just how we've done things before...

    The article is a very good explanation of Jeff Moden's DelimitedSplit8K. If you can post a couple of lines from the file (data obfuscated, of course), I'm sure those of us here on the forum can come up with something close to help you out.


    For faster help in answering any problems, please read How to post data/code on a forum to get the best help - Jeff Moden for the best way to ask your question.

    For performance issues, see how we like them posted here: How to Post Performance Problems - Gail Shaw

    Need to split some strings? Jeff Moden's DelimitedSplit8K
    Jeff Moden's Cross tab and Pivots Part 1
    Jeff Moden's Cross tab and Pivots Part 2

  • Rob-350472 (12/6/2012)


    I'd have to review the split function you linked to - I opened it and had a quick look but I've not had the time to read it yet.

    As for the format, these addresses will be in an xls and the user will create a CSV to upload, they don't have to create a CSV though of course, it's just how we've done things before...

    It might be that you don't really need a splitter. Depending on the format of each line in the file, the use of BULK INSERT might be a lot more appropriate and MUCH faster.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • I'm actually looking at the bulk insert technique now - in terms of data it's very simple, an Excel file with an email address per row, saved as a CSV, that's literally all.

    The bulk insert I'm trying keeps putting them all into a single row rather than one row per address, though, which is a bit annoying!

    IF object_id('tempdb..#EmailAddresses') IS NOT NULL
    BEGIN
        DROP TABLE #EmailAddresses
    END

    CREATE TABLE #EmailAddresses(
        Email Varchar(max) NOT NULL
    ) ON [PRIMARY]

    BULK INSERT #EmailAddresses
    FROM 'C:\Program Files\Microsoft SQL Server\Book1.csv'
    WITH
    (
        FIELDTERMINATOR = ',',
        ROWTERMINATOR = ''
    )

    And that just spits out email addresses with a space between e.g. x@x.com y@y.com....

    I tried with 0x0a as the rowterminator instead but it generated the same results, I need to play some more with this (I've only used it once before a v long time ago!)

  • Rob-350472 (12/7/2012)


    I'm actually looking at the bulk insert technique now - in terms of data it's very simple, an Excel file with an email address per row, saved as a CSV, that's literally all.


    How many email addresses are there on each line of the CSV file? Also, have you looked at the file to make sure it has commas in it?

    Heh... Duh! That was in the very first paragraph of that post. Sorry.

    --Jeff Moden



  • Rob-350472 (12/7/2012)


    I tried with 0x0a as the rowterminator instead but it generated the same results, I need to play some more with this (I've only used it once before a v long time ago!)

    Just take out all mention of ROWTERMINATOR in your BULK INSERT command and see what happens that way. If the answer is "same thing" or worse, then you need to look at the hex values for the row terminators in the file itself to determine what the proper terminator should be.

    --Jeff Moden
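
    A minimal sketch of that hex check, assuming the server can read the same file path used in the BULK INSERT attempt earlier in the thread and that the login is allowed to run bulk operations. OPENROWSET(BULK ..., SINGLE_BLOB) returns the file as a single varbinary(max) value, which Management Studio displays in hex, so the bytes sitting between two addresses show which terminator the file really uses.

    ```sql
    -- Assumption: the path is the one from the earlier BULK INSERT attempt.
    -- SINGLE_BLOB returns the whole file as one varbinary(max) column
    -- named BulkColumn; only the first 200 bytes are shown here.
    SELECT SUBSTRING(f.BulkColumn, 1, 200) AS FirstBytesHex
    FROM OPENROWSET(
             BULK 'C:\Program Files\Microsoft SQL Server\Book1.csv',
             SINGLE_BLOB
         ) AS f;

    -- Reference values to look for between addresses in the hex output:
    -- CR+LF is 0x0D0A, a bare line feed is 0x0A, a bare carriage return is 0x0D.
    SELECT CONVERT(VARBINARY(2), CHAR(13) + CHAR(10)) AS CrLf,
           CONVERT(VARBINARY(1), CHAR(10))            AS Lf,
           CONVERT(VARBINARY(1), CHAR(13))            AS Cr;
    ```

    Whichever byte pattern shows up after each address is the value to hand to ROWTERMINATOR.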



  • Jeff Moden (12/7/2012)


    Rob-350472 (12/7/2012)


    I tried with 0x0a as the rowterminator instead but it generated the same results, I need to play some more with this (I've only used it once before a v long time ago!)

    Just take out all mention of ROWTERMINATOR in your BULK INSERT command and see what happens that way. If the answer is "same thing" or worse, then you need to look at the hex values for the row terminators in the file itself to determine what the proper terminator should be.

    The other typical row terminators are '\r' (carriage return, CHAR(13)), '\n' (new line, CHAR(10)), and '\r\n' (CR+LF).

    SQL DBA,SQL Server MVP(07, 08, 09) A socialist is someone who will give you the shirt off *someone else's* back.
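
    Pulling the suggestions in this thread together, here is a hedged sketch of loading a one-address-per-line file with an explicit row terminator. It assumes the file turns out to use a bare line feed; the VARCHAR(320) width is an assumption as well, not something from the original posts. One detail worth knowing from the BULK INSERT documentation: a ROWTERMINATOR of '\n' is treated as CR+LF, which is why the hex form is used here for a bare line feed.

    ```sql
    IF OBJECT_ID('tempdb..#EmailAddresses') IS NOT NULL
        DROP TABLE #EmailAddresses;

    CREATE TABLE #EmailAddresses (
        Email VARCHAR(320) NOT NULL   -- assumed width; widen if needed
    );

    -- With a single column per line there is no field delimiter to declare,
    -- only a row terminator. '0x0a' matches a bare line feed; swap in '\r'
    -- or '\r\n' if that is what the file actually contains.
    BULK INSERT #EmailAddresses
    FROM 'C:\Program Files\Microsoft SQL Server\Book1.csv'
    WITH (ROWTERMINATOR = '0x0a');

    -- If the lines really end in CR+LF, splitting on LF alone leaves a
    -- trailing carriage return on each value, so strip it.
    UPDATE #EmailAddresses
    SET Email = REPLACE(Email, CHAR(13), '');

    -- Sanity check: this should equal the number of lines in the file.
    SELECT COUNT(*) AS AddressesLoaded
    FROM #EmailAddresses;
    ```

    Once the addresses sit one per row, the procedure can join to #EmailAddresses directly instead of being fed a delimited string to split.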
