Concatenate Results Into A Single Field

  • Jeff - I will post my own script later. I kind of got busy yesterday and didn't get a chance to come back and finish the thought so to speak.

    I basically gussied this particular scenario up to a 1M test, playing with a few scenarios, making the "average size of the group" a variable. The break-even point was somewhere around 20 (meaning - if each person had 20 pets in this scenario); more pets, and the function beat out the running total; more owners, and the running total setup won.

    ----------------------------------------------------------------------------------
    Your lack of planning does not constitute an emergency on my part...unless you're my manager...or a director and above...or a really loud-spoken end-user..All right - what was my emergency again?

  • Matt Miller (6/16/2008)


    Jeff - I will post my own script later. I kind of got busy yesterday and didn't get a chance to come back and finish the thought so to speak.

    I basically gussied this particular scenario up to a 1M test, playing with a few scenarios, making the "average size of the group" a variable. The break-even point was somewhere around 20 (meaning - if each person had 20 pets in this scenario); more pets, and the function beat out the running total; more owners, and the running total setup won.

    With a few exceptions (Steve and his ranch, possibly), if an owner has 20 pets, you should just run sp_ASPCA and solve the problem

    ---------------------------------------------------------
    How best to post your question
    How to post performance problems
    Tally Table: What it is and how it replaces a loop

    "stewsterl 80804 (10/16/2009)I guess when you stop and try to understand the solution provided you not only learn, but save yourself some headaches when you need to make any slight changes."

  • Jeff Moden (6/15/2008)


    Then the XML solution will work.

    Just curious... why did you post on a 2k forum instead of a 2k5 forum? Folks wouldn't have wasted their time trying to come up with a 2k solution if you had...

    Ohh... got me a little confused..... I didn't notice I was posting in the 2k forum; I just saw it now (after you drew my attention to it).

    I'm really sorry for this...

    Thanks a lot to everyone who helped!

    Adi

  • jcrawf02 (6/16/2008)


    ...you should just run sp_ASPCA and solve the problem

    Heh... I've found that it also works great on some developers 😀

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • adi.cohen (6/16/2008)


    Ohh... got me a little confused..... I didn't notice I was posting in the 2k forum; I just saw it now (after you drew my attention to it).

    I'm really sorry for this...

    Thanks a lot to everyone who helped!

    Adi

    Sorry my own self... I sometimes come across as being a little bit harsh on things like this... I don't really mean to. I guess you can blame it on the users I have to put up with at work 😛

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Jeff Moden (6/16/2008)


    jcrawf02 (6/16/2008)


    ...you should just run sp_ASPCA and solve the problem

    Heh... I've found that it also works great on some developers 😀

    You can try that, but the invisible fence collars are effective as well!

    ----------------------------------------------------------------------------------
    Your lack of planning does not constitute an emergency on my part...unless you're my manager...or a director and above...or a really loud-spoken end-user..All right - what was my emergency again?

  • Matt Miller (6/16/2008)


    Jeff Moden (6/16/2008)


    jcrawf02 (6/16/2008)


    ...you should just run sp_ASPCA and solve the problem

    Heh... I've found that it also works great on some developers 😀

    You can try that, but the invisible fence collars are effective as well!

    Yeah, but not nearly as much fun as euthanasia... or spaying/neutering...

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Jeff Moden (6/16/2008)


    Matt Miller (6/16/2008)


    Jeff Moden (6/16/2008)


    jcrawf02 (6/16/2008)


    ...you should just run sp_ASPCA and solve the problem

    Heh... I've found that it also works great on some developers 😀

    You can try that, but the invisible fence collars are effective as well!

    Yeah, but not nearly as much fun as euthanasia... or spaying/neutering...

    I don't know...those collars hooked up to a "human-sized" battery, say - a 9V or better, that could be some serious fun...:)

    ----------------------------------------------------------------------------------
    Your lack of planning does not constitute an emergency on my part...unless you're my manager...or a director and above...or a really loud-spoken end-user..All right - what was my emergency again?

  • All right - back to reality for one post....

    This was what I was using to test the performance I mentioned earlier. Essentially, I'm using @sample to control the average size of the groups. I now have it essentially at break-even (average grouping = 12.5). Drop the sample to 50K (average grouping = 20) and the function wins; increase it to 125000 (grouping = 8) and, conversely, the running total wins.

    drop table #petnames
    drop table #petnames2
    drop table persons
    drop table person_pets
    drop table #temp
    go

    drop function concatf
    go

    create function concatf(@id as int)
    RETURNS varchar(max)
    as begin
        declare @retstr as varchar(max)
        set @retstr='';
        select @retstr=@retstr+','+name
        from person_pets
        where person_id=@id;
        set @retstr=substring(@retstr,2,len(@retstr));
        return @retstr
    end
    go

    create table persons(id int identity(1,1) primary key clustered, name varchar(20))
    create table person_pets(person_ID int, name varchar(20))

    declare @sample int
    set @sample=80000;

    insert persons(name)
    select top(@sample) left(cast(newId() as varchar(100)),10)
    from sys.columns sc1, sys.columns sc2;

    insert person_pets
    select top 1000000 rand(checksum(newid()))*@sample+1,
           left(cast(newId() as varchar(100)),10)
    from sys.columns sc1, sys.columns sc2;

    --a couple of vars
    declare @g datetime
    set @g=getdate();
    declare @tmppetnames varchar(max)
    set @tmppetnames='';
    declare @dummy varchar(max)
    declare @prevperson int
    set @prevperson=0;

    --====== capture all of the persons
    Select person_Id,
           identity(int,1,1) as rid,
           [Name],
           Cast('' as varchar(max)) as pet_name
    Into #temp
    From person_pets

    create unique clustered index pktmp on #temp(person_Id,RID)

    --====== add in all of the pet names
    Update #temp
    Set @tmppetnames=pet_name = case when @prevperson=person_ID then @tmppetnames +',' else '' end + Name,
        @prevperson=person_id,
        @dummy=@tmppetnames
    From #temp with (index(pktmp), tablockx)

    select persons.*, pets
    into #petnames
    from persons
         left outer join
         (select person_id, max(pet_name) pets
          from #temp
          group by person_id) petlist
         on persons.id=petlist.person_id

    select 'running', datediff(ms,@g,getdate());
    go

    --concatenation method
    declare @g datetime
    set @g=getdate();

    create index j on person_pets(person_ID,name)

    select persons.*, dbo.concatf(ID) as pnames
    into #petnames2
    from persons

    select 'func', datediff(ms,@g,getdate());
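
    As a quick aside, a simple sanity check after running both halves (assuming the #petnames and #petnames2 tables produced above) is to compare the concatenated lengths per owner, since the two methods may glue the names together in a different order:

    -- rows where the two methods produced different-length concatenations
    -- (lengths are compared because the two approaches may order the names differently)
    SELECT p1.id, p1.pets, p2.pnames
    FROM   #petnames  p1
    JOIN   #petnames2 p2
      ON   p2.id = p1.id
    WHERE  ISNULL(LEN(p1.pets), 0) <> ISNULL(LEN(p2.pnames), 0);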

    ----------------------------------------------------------------------------------
    Your lack of planning does not constitute an emergency on my part...unless you're my manager...or a director and above...or a really loud-spoken end-user..All right - what was my emergency again?

    Ummmm.... sorry Matt... the function is not written correctly and it has not been called correctly. The function has two extra SET statements, which slow it down, and the call to the function will produce duplicates (if I'm not mistaken, and I sometimes am ;))

    The function should be written as follows for maximum performance...

    CREATE FUNCTION Concatf(@ID AS INT)
    RETURNS VARCHAR(8000)
    AS
    BEGIN
        -- keep the work variable at VARCHAR(8000) as well, so the VARCHAR(MAX) overhead never comes into play
        DECLARE @RetStr AS VARCHAR(8000)

        SELECT @RetStr = COALESCE(@RetStr+',','') + NAME
          FROM Person_Pets
         WHERE Person_ID = @ID;

        RETURN @RetStr
    END

    Notice that VARCHAR(8000) is probably more than enough to handle even Steve Jones' animals, and it makes the function nearly twice as fast as VARCHAR(MAX). Compared to VARCHAR(8000), VARCHAR(MAX) carries a relatively terrible overhead.

    Remember the duplication that occurs... remember the following article...

    http://www.sqlservercentral.com/articles/Test+Data/61572/
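
    To be clear on the "where to call it" point, a minimal sketch (assuming the persons and person_pets tables from Matt's test script above) is to call the function once per distinct owner rather than once per pet row:

    -- one call per distinct owner; persons is already the distinct list of IDs
    SELECT p.id, p.name, dbo.Concatf(p.id) AS pet_names
    FROM   persons p;

    -- calling it from person_pets instead would fire the function once per pet row
    -- and return the same concatenated string over and over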

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

    You're right on at least one count. The function can be built better. I was aware that the concatenation wouldn't lead to that large a string, so varchar(max) was a cost that could probably be avoided.

    That being said - the dupes you're talking about aren't happening, since the "persons" table is the "distinct" list of person IDs (the ones that are duplicated throughout person_pets). Meaning - I am following your advice and making sure not to run the function against the 1M-row table, only against the smaller, distinct list.

    So yes - you're correct - the better function makes it quite a bit harder to beat, but still - it does even out in the end. It looks to me like the break-even is now around 125000, even after dropping the varchar(max) down to varchar(2000) (which was more than enough) and using your concatf function.

    ----------------------------------------------------------------------------------
    Your lack of planning does not constitute an emergency on my part...unless you're my manager...or a director and above...or a really loud-spoken end-user..All right - what was my emergency again?

  • Outstanding... thanks for the feedback, Matt. And, I'm sorry... I didn't realize that you were already beating the distinct drum on the usage of the function...

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Jeff Moden (6/16/2008)


    Outstanding... thanks for the feedback, Matt. And, I'm sorry... I didn't realize that you were already beating the distinct drum on the usage of the function...

    Oh - no issue at all. If we're going to test for performance - might as well test with the best possible setup, so the function needs to be tight. So - thanks for catching me on that one.

    ----------------------------------------------------------------------------------
    Your lack of planning does not constitute an emergency on my part...unless you're my manager...or a director and above...or a really loud-spoken end-user..All right - what was my emergency again?

  • Update.....

    I kind of came across an ugly realization. The running total scenario has the potential to "blow up" your tempDB rather spectacularly if you play on the "extreme low end" of this scenario (few owners, huge groups). As the UPDATE cycles through the rows in a group, the variable is holding on to ever-larger strings and pushing a copy into EACH RECORD as it navigates through. So - with very large groups, you end up with some HUGE temp tables.

    I have something I'm ruminating on which might fix that problem, but as of now that algorithm could cause some real uglies. Amazing what a phrase like "processing duplicates" will jar out of you.

    As of now, that still leaves the concat function as the overall best option for the case when you "don't know" your data distribution, since its performance doesn't balloon out on the "side it doesn't like" in the same way.
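
    One possible direction, sketched against the same #temp layout as the earlier script (the is_last flag and the pre-pass that sets it are new here, and this still leans on the same undocumented ordered-update behavior, so treat it as a sketch rather than a finished fix), is to write the accumulated string into just the last row of each group, so the intermediate rows never carry the growing value:

    -- assumes the persons / person_pets tables from the earlier script, and that #temp does not already exist
    SELECT person_Id,
           IDENTITY(int,1,1) AS rid,
           [Name],
           CAST('' AS varchar(max)) AS pet_name,
           CAST(0 AS bit) AS is_last
    INTO   #temp
    FROM   person_pets;

    CREATE UNIQUE CLUSTERED INDEX pktmp ON #temp (person_Id, rid);

    -- flag the last row of each group up front
    UPDATE t
    SET    is_last = 1
    FROM   #temp t
    JOIN  (SELECT person_Id, MAX(rid) AS lastrid
           FROM   #temp
           GROUP BY person_Id) l
      ON   l.person_Id = t.person_Id
     AND   l.lastrid   = t.rid;

    DECLARE @tmppetnames varchar(max),
            @dummy       varchar(max),
            @prevperson  int;
    SELECT  @tmppetnames = '', @prevperson = 0;

    -- same ordered update, but only the flagged row receives the full string
    UPDATE #temp
    SET    @tmppetnames = CASE WHEN @prevperson = person_Id
                               THEN @tmppetnames + ',' ELSE '' END + Name,
           pet_name     = CASE WHEN is_last = 1 THEN @tmppetnames ELSE '' END,
           @prevperson  = person_Id,
           @dummy       = @tmppetnames
    FROM   #temp WITH (INDEX(pktmp), TABLOCKX);

    -- no MAX() needed anymore; just pick the flagged row per owner
    SELECT p.*, t.pet_name AS pets
    FROM   persons p
    LEFT JOIN #temp t
      ON   t.person_Id = p.id
     AND   t.is_last   = 1;

    With that change, only one full string per group ever lands in the temp table, and the final join no longer needs the MAX().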

    ----------------------------------------------------------------------------------
    Your lack of planning does not constitute an emergency on my part...unless you're my manager...or a director and above...or a really loud-spoken end-user..All right - what was my emergency again?

    Yup - I can fix the one issue on tempDB, but it still performs rather poorly on few, big groups. So - while it can beat the concat function under certain data distribution scenarios, I'd say the concat function is STILL the better option.

    In other words - the running total scenario runs a LOT worse than the function in its worst case, but the function doesn't perform that much worse in its worst case. So - the function is the better/safer approach, especially when you don't have a good bead on the data distribution.

    ----------------------------------------------------------------------------------
    Your lack of planning does not constitute an emergency on my part...unless you're my manager...or a director and above...or a really loud-spoken end-user..All right - what was my emergency again?
