SQL Distinct comma delimited list

  • Hi

    I am trying to create a comma delimted list of names in a table using the below query

    DECLARE @listStr VARCHAR(MAX)

    SELECT @listStr = COALESCE(@listStr+',' ,'') + Name

    FROM Production.Product

    SELECT @listStr

    This works fine, however the list does contain duplicates

    Can anyone advise how I would make this 'distinct' so the list does not contain duplicates

    thanks

    simon

  • DECLARE @listStr VARCHAR(MAX)

    SELECT @listStr = COALESCE(@listStr+',' ,'') + Name

    FROM (SELECT DISTINCT Name FROM Production.Product) d

    FOR XML PATH is also worth a look.

    β€œWrite the query the simplest way. If through testing it becomes clear that the performance is inadequate, consider alternative query forms.” - Gail Shaw

    For fast, accurate and documented assistance in answering your questions, please read this article.
    Understanding and using APPLY, (I) and (II) Paul White
    Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden

  • Easiest way

    DECLARE @listStr VARCHAR(MAX)

    SELECT @listStr = COALESCE(@listStr+',' ,'') + Name

    FROM Production.Product

    GROUP BY Name

    SELECT @listStr

    πŸ™‚

  • ksatpute123 (7/24/2013)


    Easiest way

    DECLARE @listStr VARCHAR(MAX)

    SELECT @listStr = COALESCE(@listStr+',' ,'') + Name

    FROM Production.Product

    GROUP BY Name

    SELECT @listStr

    πŸ™‚

    Yep, does look like. However...

    Correct concatenation of string values (eg. accumulating of numeric values) using "SELECT @var = @var + Col FROM Table" costruction is not guaranteed in SQL Server!

    Most of times, simple select will work. Having GROUP BY (as well as using JOINS) increases probability that accumulated value will not be correct. Selecting from sub-query with DISTINCT could be a bit safer option.

    One of the reasons why all of the above ways may produce wrong results - query parallelising.

    Could be some more reasons there too, that why OPTION (MAXDOP 1) may still not help...

    I'm aware of only two ways of doing it safely with strings:

    1. aggregate CLR function

    2. using FOR XML PATH

    _____________________________________________
    "The only true wisdom is in knowing you know nothing"
    "O skol'ko nam otkrytiy chudnyh prevnosit microsofta duh!":-D
    (So many miracle inventions provided by MS to us...)

    How to post your question to get the best and quick help[/url]

  • Here's another way... it's kind of a "Quirky-Coalesce":

    -- sample data

    DECLARE @Production_Product TABLE (name varchar(10));

    INSERT INTO @Production_Product VALUES ('aaa'),('bbb'),('ccc'),('ccc');

    DECLARE @listStr varchar(max)='';

    SELECT @listStr=@listStr+

    CASE

    WHEN @listStr='' THEN ''+name

    ELSE ','+name

    END

    FROM (SELECT DISTINCT name FROM @Production_Product) AS xxx

    SELECT @listStr;

    GO

    This will produce the same query plan as the query that Chris put together but I also included a CASE statement that prevents a leading comma.

    "I cant stress enough the importance of switching from a sequential files mindset to set-based thinking. After you make the switch, you can spend your time tuning and optimizing your queries instead of maintaining lengthy, poor-performing code."

    -- Itzik Ben-Gan 2001

  • SELECT substring(list, 1, len(list) - 1)

    FROM (SELECT list =

    (SELECT DISTINCT name + ','

    FROM sys.objects

    ORDER BY name + ','

    FOR XML PATH(''), TYPE).value('.', 'nvarchar(MAX)')) AS T

    Yes, somewhat unwieldy and not fully intuitive. On the other hand, this is guaranteed to work as intended, which is not true for "SELECT @x = @x + col".

    [font="Times New Roman"]Erland Sommarskog, SQL Server MVP, www.sommarskog.se[/font]

  • Erland Sommarskog (7/24/2013)


    SELECT substring(list, 1, len(list) - 1)

    FROM (SELECT list =

    (SELECT DISTINCT name + ','

    FROM sys.objects

    ORDER BY name + ','

    FOR XML PATH(''), TYPE).value('.', 'nvarchar(MAX)')) AS T

    Yes, somewhat unwieldy and not fully intuitive. On the other hand, this is guaranteed to work as intended, which is not true for "SELECT @x = @x + col".

    Why is the SELECT @x=@x+ method not guaranteed to work?

    "I cant stress enough the importance of switching from a sequential files mindset to set-based thinking. After you make the switch, you can spend your time tuning and optimizing your queries instead of maintaining lengthy, poor-performing code."

    -- Itzik Ben-Gan 2001

  • Alan.B (7/24/2013)


    Why is the SELECT

    @x=@x+ method not guaranteed to work?

    Why would it?

    See this KB article Pay particular attention to the first sentence under Cause.

    [font="Times New Roman"]Erland Sommarskog, SQL Server MVP, www.sommarskog.se[/font]

  • ;WITH SampleData (ID,Fruit)

    AS

    (

    SELECT 1,'Apple' UNION ALL

    SELECT 2,'Orange' UNION ALL

    SELECT 3,'Pear' UNION ALL

    SELECT 4,'Grape' UNION ALL

    SELECT 3,'Pear' UNION ALL

    SELECT 5,'Banana' UNION ALL

    SELECT 6,'Lime'

    )

    SELECT

    STUFF((SELECT DISTINCT ','+s.Fruit AS [text()]

    FROM

    SampleData AS s

    FOR XML PATH('')

    ), 1, 1, '' )

    Β 

  • Erland Sommarskog (7/24/2013)


    Alan.B (7/24/2013)


    Why is the SELECT

    @x=@x+ method not guaranteed to work?

    Why would it?

    See this KB article Pay particular attention to the first sentence under Cause.

    I say it would work based on the example I posted (which works). It produces the exact same plan and answer (except for the leading comma) as what Chris Posted which I believe is guaranteed to work. I need to read the article a little more (as well as this one[/url]) but I think it should work just fine.

    The article you posted seems to imply that there is a problem applying expressions to members of the ORDER BY clause; I am not using an ORDER BY since the requirement does not call for it. From the article:

    Under Symptoms (emph mine):

    You may encounter unexpected results when you apply any operators or expressions to the ORDER BY clause of aggregate concatenation queries.

    Under Cause:

    The correct behavior for an aggregate concatenation query is undefined...

    ... When an expression is applied to a member of an ORDER BY clause, that resulting column is not exposed in the SELECT list, resulting in undefined behavior.

    Under Work Around: (emph mine)

    In order to achieve the expected results from an aggregate concatenation query, apply any Transact-SQL function or expression to the columns in the SELECT list rather than in the ORDER BY clause.

    I could be wrong but I don't believe this article applies.

    "I cant stress enough the importance of switching from a sequential files mindset to set-based thinking. After you make the switch, you can spend your time tuning and optimizing your queries instead of maintaining lengthy, poor-performing code."

    -- Itzik Ben-Gan 2001

  • The article is certainly contradictive, but I've decided to let it stop with the first sentence:

    The correct behavior for an aggregate concatenation query is undefined.

    The article then goes out of its way to present scenarios where it may work after all. You should keep in mind that the article was originally published when SQL 2000 was the most recent alternative, and there was not any alternatives.

    It's important to understand that just because something works in one specific test, that is no guarantee that it will always work, unless there is documentation to say so. And here the documentation clearly says "undefined".

    From SQL 2005 there is FOR XML PATH('') which has a well-defined behaviour. Nevermind that the syntax is clunky and non-intuitive. But this is the way to go if you need to build concatenated lists. (As long as you are not dealing with binary data.)

    [font="Times New Roman"]Erland Sommarskog, SQL Server MVP, www.sommarskog.se[/font]

  • The discussion is going pretty well and the different approaches mentioned are good. One thing I would like to point out here that all the solutions revolve around the same principle and follow the logical query processing phase in sql server.

    1. FROM

    2. ON

    3. OUTER

    4. WHERE

    5. GROUP BY

    6. CUBE | ROLLUP

    7. HAVING

    8. SELECT

    9. DISTINCT

    10 ORDER BY

    11. TOP

    Pay attention, here GROUP BY comes before SELECT and DISTINCT. Of course this will apply independently to sub queries and virtual table expressions.

    Following this I don't see any reason why GROUP BY over aggregate concatenation will not work in any scenario. So far I have never encountered an example of the "undefined". If anyone can post an example precisely explaining this nature mentioned in the KB article then it would be helpful for all.

  • ksatpute123 (7/25/2013)


    Following this I don't see any reason why GROUP BY over aggregate concatenation will not work in any scenario. So far I have never encountered an example of the "undefined". If anyone can post an example precisely explaining this nature mentioned in the KB article then it would be helpful for all.

    If you never look when you cross the road "because there are never any cars in this area, and it has always worked for me", you will eventually be run over by a car.

    And that is the whole gist of it. Microsoft are not giving you any guarantees that it will work, and thus you should not use it.

    Examples? Yes, I have encountered cases where the result was only one of the rows in the result set. Repro? No, this was with older versions of SQL Server.

    [font="Times New Roman"]Erland Sommarskog, SQL Server MVP, www.sommarskog.se[/font]

  • Even the article focuses discussion around ORDER BY clause. I am not saying that if I have never encountered any issues with it then it is perfect. I want to see a example which precisely explains why it would not work so all of us will know what happens behind the scenes and thus have better understanding of how SQL server works.

  • Again:

    The correct behavior for an aggregate concatenation query is undefined.

    The vendor is not giving any guarantees, and this is the key issue.

    Up to SQL Server 6.x GROUP BY implied an ORDER BY. Then Hash Aggregates came along and broke that.

    Up to SQL 2000, you could use TOP 100 PERCENT ... ORDER BY in a view definition, and a SELECT from the view was always ordered.

    That is, what works today, may not work tomorrow, and Microsoft may just shrug their shoulders.

    On the other hand, if FOR XML PATH breaks, you can scream "bug" and they will have to fix it.

    [font="Times New Roman"]Erland Sommarskog, SQL Server MVP, www.sommarskog.se[/font]

Viewing 15 posts - 1 through 15 (of 16 total)

You must be logged in to reply to this topic. Login to reply