July 24, 2013 at 5:10 am
Hi
I am trying to create a comma delimted list of names in a table using the below query
DECLARE @listStr VARCHAR(MAX)
SELECT @listStr = COALESCE(@listStr+',' ,'') + Name
FROM Production.Product
SELECT @listStr
This works fine, however the list does contain duplicates
Can anyone advise how I would make this 'distinct' so the list does not contain duplicates
thanks
simon
July 24, 2013 at 5:15 am
DECLARE @listStr VARCHAR(MAX)
SELECT @listStr = COALESCE(@listStr+',' ,'') + Name
FROM (SELECT DISTINCT Name FROM Production.Product) d
FOR XML PATH is also worth a look.
For fast, accurate and documented assistance in answering your questions, please read this article.
Understanding and using APPLY, (I) and (II) Paul White
Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden
July 24, 2013 at 7:41 am
Easiest way
DECLARE @listStr VARCHAR(MAX)
SELECT @listStr = COALESCE(@listStr+',' ,'') + Name
FROM Production.Product
GROUP BY Name
SELECT @listStr
π
July 24, 2013 at 8:45 am
ksatpute123 (7/24/2013)
Easiest wayDECLARE @listStr VARCHAR(MAX)
SELECT @listStr = COALESCE(@listStr+',' ,'') + Name
FROM Production.Product
GROUP BY Name
SELECT @listStr
π
Yep, does look like. However...
Correct concatenation of string values (eg. accumulating of numeric values) using "SELECT @var = @var + Col FROM Table" costruction is not guaranteed in SQL Server!
Most of times, simple select will work. Having GROUP BY (as well as using JOINS) increases probability that accumulated value will not be correct. Selecting from sub-query with DISTINCT could be a bit safer option.
One of the reasons why all of the above ways may produce wrong results - query parallelising.
Could be some more reasons there too, that why OPTION (MAXDOP 1) may still not help...
I'm aware of only two ways of doing it safely with strings:
1. aggregate CLR function
2. using FOR XML PATH
July 24, 2013 at 2:46 pm
Here's another way... it's kind of a "Quirky-Coalesce":
-- sample data
DECLARE @Production_Product TABLE (name varchar(10));
INSERT INTO @Production_Product VALUES ('aaa'),('bbb'),('ccc'),('ccc');
DECLARE @listStr varchar(max)='';
SELECT @listStr=@listStr+
CASE
WHEN @listStr='' THEN ''+name
ELSE ','+name
END
FROM (SELECT DISTINCT name FROM @Production_Product) AS xxx
SELECT @listStr;
GO
This will produce the same query plan as the query that Chris put together but I also included a CASE statement that prevents a leading comma.
-- Itzik Ben-Gan 2001
July 24, 2013 at 3:37 pm
SELECT substring(list, 1, len(list) - 1)
FROM (SELECT list =
(SELECT DISTINCT name + ','
FROM sys.objects
ORDER BY name + ','
FOR XML PATH(''), TYPE).value('.', 'nvarchar(MAX)')) AS T
Yes, somewhat unwieldy and not fully intuitive. On the other hand, this is guaranteed to work as intended, which is not true for "SELECT @x = @x + col".
[font="Times New Roman"]Erland Sommarskog, SQL Server MVP, www.sommarskog.se[/font]
July 24, 2013 at 3:57 pm
Erland Sommarskog (7/24/2013)
SELECT substring(list, 1, len(list) - 1)
FROM (SELECT list =
(SELECT DISTINCT name + ','
FROM sys.objects
ORDER BY name + ','
FOR XML PATH(''), TYPE).value('.', 'nvarchar(MAX)')) AS T
Yes, somewhat unwieldy and not fully intuitive. On the other hand, this is guaranteed to work as intended, which is not true for "SELECT @x = @x + col".
Why is the SELECT @x=@x+ method not guaranteed to work?
-- Itzik Ben-Gan 2001
July 24, 2013 at 4:01 pm
Alan.B (7/24/2013)
Why is the SELECT@x=@x+ method not guaranteed to work?
Why would it?
See this KB article Pay particular attention to the first sentence under Cause.
[font="Times New Roman"]Erland Sommarskog, SQL Server MVP, www.sommarskog.se[/font]
July 24, 2013 at 4:26 pm
;WITH SampleData (ID,Fruit)
AS
(
SELECT 1,'Apple' UNION ALL
SELECT 2,'Orange' UNION ALL
SELECT 3,'Pear' UNION ALL
SELECT 4,'Grape' UNION ALL
SELECT 3,'Pear' UNION ALL
SELECT 5,'Banana' UNION ALL
SELECT 6,'Lime'
)
SELECT
STUFF((SELECT DISTINCT ','+s.Fruit AS [text()]
FROM
SampleData AS s
FOR XML PATH('')
), 1, 1, '' )
Β
July 24, 2013 at 4:38 pm
Erland Sommarskog (7/24/2013)
Alan.B (7/24/2013)
Why is the SELECT@x=@x+ method not guaranteed to work?
Why would it?
See this KB article Pay particular attention to the first sentence under Cause.
I say it would work based on the example I posted (which works). It produces the exact same plan and answer (except for the leading comma) as what Chris Posted which I believe is guaranteed to work. I need to read the article a little more (as well as this one[/url]) but I think it should work just fine.
The article you posted seems to imply that there is a problem applying expressions to members of the ORDER BY clause; I am not using an ORDER BY since the requirement does not call for it. From the article:
Under Symptoms (emph mine):
You may encounter unexpected results when you apply any operators or expressions to the ORDER BY clause of aggregate concatenation queries.
Under Cause:
The correct behavior for an aggregate concatenation query is undefined...
... When an expression is applied to a member of an ORDER BY clause, that resulting column is not exposed in the SELECT list, resulting in undefined behavior.
Under Work Around: (emph mine)
In order to achieve the expected results from an aggregate concatenation query, apply any Transact-SQL function or expression to the columns in the SELECT list rather than in the ORDER BY clause.
I could be wrong but I don't believe this article applies.
-- Itzik Ben-Gan 2001
July 25, 2013 at 1:48 am
The article is certainly contradictive, but I've decided to let it stop with the first sentence:
The correct behavior for an aggregate concatenation query is undefined.
The article then goes out of its way to present scenarios where it may work after all. You should keep in mind that the article was originally published when SQL 2000 was the most recent alternative, and there was not any alternatives.
It's important to understand that just because something works in one specific test, that is no guarantee that it will always work, unless there is documentation to say so. And here the documentation clearly says "undefined".
From SQL 2005 there is FOR XML PATH('') which has a well-defined behaviour. Nevermind that the syntax is clunky and non-intuitive. But this is the way to go if you need to build concatenated lists. (As long as you are not dealing with binary data.)
[font="Times New Roman"]Erland Sommarskog, SQL Server MVP, www.sommarskog.se[/font]
July 25, 2013 at 2:09 am
The discussion is going pretty well and the different approaches mentioned are good. One thing I would like to point out here that all the solutions revolve around the same principle and follow the logical query processing phase in sql server.
1. FROM
2. ON
3. OUTER
4. WHERE
5. GROUP BY
6. CUBE | ROLLUP
7. HAVING
8. SELECT
9. DISTINCT
10 ORDER BY
11. TOP
Pay attention, here GROUP BY comes before SELECT and DISTINCT. Of course this will apply independently to sub queries and virtual table expressions.
Following this I don't see any reason why GROUP BY over aggregate concatenation will not work in any scenario. So far I have never encountered an example of the "undefined". If anyone can post an example precisely explaining this nature mentioned in the KB article then it would be helpful for all.
July 25, 2013 at 2:12 am
ksatpute123 (7/25/2013)
Following this I don't see any reason why GROUP BY over aggregate concatenation will not work in any scenario. So far I have never encountered an example of the "undefined". If anyone can post an example precisely explaining this nature mentioned in the KB article then it would be helpful for all.
If you never look when you cross the road "because there are never any cars in this area, and it has always worked for me", you will eventually be run over by a car.
And that is the whole gist of it. Microsoft are not giving you any guarantees that it will work, and thus you should not use it.
Examples? Yes, I have encountered cases where the result was only one of the rows in the result set. Repro? No, this was with older versions of SQL Server.
[font="Times New Roman"]Erland Sommarskog, SQL Server MVP, www.sommarskog.se[/font]
July 25, 2013 at 2:21 am
Even the article focuses discussion around ORDER BY clause. I am not saying that if I have never encountered any issues with it then it is perfect. I want to see a example which precisely explains why it would not work so all of us will know what happens behind the scenes and thus have better understanding of how SQL server works.
July 25, 2013 at 2:39 am
Again:
The correct behavior for an aggregate concatenation query is undefined.
The vendor is not giving any guarantees, and this is the key issue.
Up to SQL Server 6.x GROUP BY implied an ORDER BY. Then Hash Aggregates came along and broke that.
Up to SQL 2000, you could use TOP 100 PERCENT ... ORDER BY in a view definition, and a SELECT from the view was always ordered.
That is, what works today, may not work tomorrow, and Microsoft may just shrug their shoulders.
On the other hand, if FOR XML PATH breaks, you can scream "bug" and they will have to fix it.
[font="Times New Roman"]Erland Sommarskog, SQL Server MVP, www.sommarskog.se[/font]
Viewing 15 posts - 1 through 15 (of 16 total)
You must be logged in to reply to this topic. Login to reply