Is there a best way to combine data into a single output field

Question

Is there a best way to combine data into a single output field

Rod Harten

SSC-Addicted

Points: 476
More actions
April 28, 2017 at 6:38 pm

#321098

I need to create output that takes a list of names and combines them into one field. Here is an example of the output:Name States
Bob Ohio, Iowa, Alaska
Sue Idaho, Iowa
Jane Alaska, Hawaii, Idaho, Ohio
The tables are well-formed (see below) and I can brute force a solution but am curious if there is a better, preferred way. I have two questions.
1) Is there a recommended way for creating this sort of output?
2) Is there a name for this sort of process? I can't think of one, which makes it difficult to search for a recommended solution.
Thank you for your time and attention,
Rod
--===============================================================
CREATE TABLE #names (
    NameID int,
    name varchar(4));
INSERT INTO #names
VALUES (1, 'Bob'), (2, 'Sue'), (3, 'Jane');
----
CREATE TABLE #states (
StateID int,
    State varchar(6));
INSERT INTO #states
VALUES (1, 'Ohio'), (2, 'Iowa'), (3, 'Hawaii'), (4, 'Alaska'), (5, 'Idaho');
----
CREATE TABLE #name_states (
NameID int,
    StateID int);
INSERT INTO #name_states
VALUES (1, 1), (1, 2), (1, 4), (2, 5), (2, 2), (3, 4), (3, 3), (3, 5), (3, 1);

Viewing 11 posts - 1 through 10 (of 10 total)

You must be logged in to reply to this topic. Login to reply

Ed Wagner SSC Guru Points: 287026 More actions · Answer 1

Rod Harten - Friday, April 28, 2017 6:38 PM
I need to create output that takes a list of names and combines them into one field. Here is an example of the output:Name States
Bob Ohio, Iowa, Alaska
Sue Idaho, Iowa
Jane Alaska, Hawaii, Idaho, Ohio
The tables are well-formed (see below) and I can brute force a solution but am curious if there is a better, preferred way. I have two questions.
1) Is there a recommended way for creating this sort of output?
2) Is there a name for this sort of process? I can't think of one, which makes it difficult to search for a recommended solution.
Thank you for your time and attention,
Rod
--===============================================================
CREATE TABLE #names (
    NameID int,
    name varchar(4));
INSERT INTO #names
VALUES (1, 'Bob'), (2, 'Sue'), (3, 'Jane');
----
CREATE TABLE #states (
StateID int,
    State varchar(6));
INSERT INTO #states
VALUES (1, 'Ohio'), (2, 'Iowa'), (3, 'Hawaii'), (4, 'Alaska'), (5, 'Idaho');
----
CREATE TABLE #name_states (
NameID int,
    StateID int);
INSERT INTO #name_states
VALUES (1, 1), (1, 2), (1, 4), (2, 5), (2, 2), (3, 4), (3, 3), (3, 5), (3, 1);

I think this will give you what you're looking for.

WITH cteData AS ( SELECT n.Name, s.State FROM #names n INNER JOIN #name_states ns ON ns.NameID = n.NameID INNER JOIN #states s ON ns.StateID = s.StateID ), cteNames AS ( SELECT DISTINCT Name FROM #names ) SELECT Name, StatesList = STUFF((SELECT ',' + d.State FROM cteData d WHERE d.Name = cteNames.Name ORDER BY d.State FOR XML PATH(''), TYPE).value('.', 'varchar(max)'), 1, 1, '') FROM cteNames ORDER BY Name;

There are many approaches to creating a delimited list, but Wayne Sheffield's article at http://www.sqlservercentral.com/articles/comma+separated+list/71700/ is the approach I prefer. This is the technique above.

Edit: Please forgive the indenting of the code, but I still haven't gotten the knack of the new SQL editor on SSC. I tried to get it close.

Tally Tables - Performance Personified
String Splitting with True Performance
Best practices on how to ask questions

Rod Harten SSC-Addicted Points: 476 More actions · Answer 2

Thank you very much!

That works great and is a much better solution than anything I was considering.

Thanks also for the link to the article and giving a name, comma separated list, to what I an trying to create,

Rod

Ed Wagner SSC Guru Points: 287026 More actions · Answer 3

Rod Harten - Sunday, April 30, 2017 5:32 PM
Thank you very much!
That works great and is a much better solution than anything I was considering.
Thanks also for the link to the article and giving a name, comma separated list, to what I an trying to create,
Rod

I'm glad it worked well for you. Thanks for your feedback.

Tally Tables - Performance Personified
String Splitting with True Performance
Best practices on how to ask questions

jcelko212 32090 SSCrazy Eights Points: 9295 More actions · Answer 4

Rod Harten - Friday, April 28, 2017 6:38 PM

I need to create output that takes a list of names and combines them into one field. Here is an example of the output:

There's so much fundamentally wrong here. You might want to look up what the term "field" means in SQL. You also might want to look at what a column is; a column has to be scaler not a list. By definition. Then you might want to look at what a table is; what you posted has no keys. In fact, what you posted can never have a key, because all the columns are NULL-able.

On top of all of this, identifiers are never numerics because you don't do any computations with them (this is usually covered in the first week of any good data modeling course); what would the square root of your credit card number mean?

The model in modern computing is a tiered architecture. Data is scrubbed in an input layer, past into the database layer, and then queried from it so it can be passed to a presentation layer. What you're trying to do is best done in a presentation layer, not in SQL. The way to do this is by getting out of SQL completely and doing a kludge with XML or passing it on to report writer of some kind.

Please post DDL and follow ANSI/ISO standards when asking for help.

Ed Wagner SSC Guru Points: 287026 More actions · Answer 5

jcelko212 32090 - Monday, May 1, 2017 8:22 AM
There's so much fundamentally wrong here. You might want to look up what the term "field" means in SQL. You also might want to look at what a column is; a column has to be scaler not a list. By definition. Then you might want to look at what a table is; what you posted has no keys. In fact, what you posted can never have a key, because all the columns are NULL-able.
On top of all of this, identifiers are never numerics because you don't do any computations with them (this is usually covered in the first week of any good data modeling course); what would the square root of your credit card number mean?
The model in modern computing is a tiered architecture. Data is scrubbed in an input layer, past into the database layer, and then queried from it so it can be passed to a presentation layer. What you're trying to do is best done in a presentation layer, not in SQL. The way to do this is by getting out of SQL completely and doing a kludge with XML or passing it on to report writer of some kind.

There's no need to go completely OCD and berate the OP on what terms they used. I looked at what the OP was trying to do. After all, this site is supposed to be about helping others.

Oh, an identifiers can be numeric. Using an integer identity column as an artificial key has many benefits that are well beyond the scope of this post. It may or not be ISO-compliant, but it doesn't matter because this is a SQL Server forum, not an ISO standards forum. There's no question that you know the ISO standards, but many people don't consider them a religion.

Tally Tables - Performance Personified
String Splitting with True Performance
Best practices on how to ask questions

jcelko212 32090 SSCrazy Eights Points: 9295 More actions · Answer 6

Ed Wagner - Monday, May 1, 2017 9:38 AM

jcelko212 32090 - Monday, May 1, 2017 8:22 AM
>> Oh, an identifier can be numeric. <<
Sorry, but that's wrong. Numerics record magnitude and quantity. They are not identifiers. Now there are tag numbers (look it up in a book on data modeling) which are strings of digits that can be validated and verified. The one you probably got in your pocket right now your credit card numbers. But your credit card number is not a quantity or magnitude of any kind! I like to ask people have this confusion what they think the square root of their credit card number means.
>> Using an integer identity column [sic] as an artificial key [sic] has many benefits that are well beyond the scope of this post.<<
Sorry wrong again. The proprietary old Sybase IDENTITY is not a column at all; it's a table property. That's why you can have more than one of them on the table. It sequentially numbers the physical record insertion attempts. This was taken from UNIX and the original tape file model used on the original hardware. If your old enough, you might remember that cursors (really tape file operations) could only read forward in a table in T-SQL. The first hardware for Mac tapes had that limit and it was built into the first releases of language. This hasn't been true for decades now, but you'll still see that mindset carrying on.
You might also want to read a little bit of Dr. Codd stuff about what an artificial key is. It has nothing to do with the physical storage locations.
>> It may or not be ISO-compliant, but it doesn't matter because this is a SQL Server forum, not an ISO standards forum. <<
Yes, I do believe that standards always matter. A competent, professional programmer will try to write to the industry standards and not deliberately do what we used to call "job secure programming" in the old days. This was code that was so dependent on a particular release of a particular product, usually made even worse by poor naming conventions and highly personalized code. Decades ago, I worked on a COBOL system for the programmer not only took advantage of every little trick in IBM's COBOL product at the time, but named his paragraphs based on a theme. We had an accounting program that use the names of countries. This let you say things like "TO GO Afghanistan" or "PERFORM Switzerland" in your code and guaranteeing that nobody would understand what the hell is going on.
>> There's no question that you know the ISO standards, but many people don't consider them a religion. <<
Would you drive a car built by a company whose attitude toward all the safety standards was like yours? After all, you don't want anybody being religious about safety, medical care or any of that other crap, do you? 🙂

Please post DDL and follow ANSI/ISO standards when asking for help.

Lynn Pettis SSC Guru Points: 442467 More actions · Answer 7

Joe, really getting tired of the your bullying on here. If you want to help, try providing something constructive and helpful instead of constantly telling everyone how wrong they are and how right you are.

I say that having seen some of the code you posted in the past that DIDN'T work when cut/paste/executed in SSMS, and when called on it you ignored it.

I will say it again, you have a vast wealth of knowledge that you could share but you don't seem willing to really share it except in your books, that I refuse to purchase because of your online persona.

Alan Burstein SSC Guru Points: 61136 More actions · Answer 8

Because the only text that you are dealing with is alphanumeric, you can change
FOR XML PATH(''), TYPE).value('.', 'varchar(max)'), 1, 1, '')
to
FOR XML PATH('')), 1, 1, '')
As Wayne talks about in the article Ed posted, the TYPE and value constructs are used to deal with any XML characters (e.g. <>&). They also, however, slow the query down which is why you only include that code when necessary. The optimized code would look like this:

WITH cteData AS ( SELECT n.Name, s.State FROM #names n INNER JOIN #name_states ns ON ns.NameID = n.NameID INNER JOIN #states s ON ns.StateID = s.StateID ), cteNames AS (SELECT DISTINCT Name FROM #names) SELECT Name, StatesList = STUFF((SELECT ',' + d.State FROM cteData d WHERE d.Name = cteNames.Name ORDER BY d.State FOR XML PATH('')), 1, 1, '') FROM cteNames;
This looks like a small change but if you look at the execution plans, the impact is pretty big. Note excluding the TYPE & value clauses removes the calls to the XML Reader TVF in the plan which slows things down.

Lastly - there is a STRING_AGG function available in 2017 & Azure SQL which is far superior to the FOR XML PATH method.

"I cant stress enough the importance of switching from a sequential files mindset to set-based thinking. After you make the switch, you can spend your time tuning and optimizing your queries instead of maintaining lengthy, poor-performing code."

-- Itzik Ben-Gan 2001

Ed Wagner SSC Guru Points: 287026 More actions · Answer 9

Lynn Pettis - Monday, May 1, 2017 3:41 PM
Joe, really getting tired of the your bullying on here. If you want to help, try providing something constructive and helpful instead of constantly telling everyone how wrong they are and how right you are.
I say that having seen some of the code you posted in the past that DIDN'T work when cut/paste/executed in SSMS, and when called on it you ignored it.
I will say it again, you have a vast wealth of knowledge that you could share but you don't seem willing to really share it except in your books, that I refuse to purchase because of your online persona.

Same here, Lynn.

Joe, I mean this constructively. The angry, pedantic monologues just turn people off. You're punishing people for nothing. When people are berated and receive no actual help, they go away. Worse yet, what's posted doesn't work and people who don't know any better will end up with bad design. If your goal is to be a troll, you've succeeded. If you're trying to be helpful, you might want to try a different approach.

Tally Tables - Performance Personified
String Splitting with True Performance
Best practices on how to ask questions

Ed Wagner SSC Guru Points: 287026 More actions · Answer 10

Alan, I found the same thing. When I tested them against one another (with de-entitization versus without) I found an average of 78.3% reduction in execution time. I included it for safety, but I should have noted it. Thanks for picking up on it.

Also, thank you for the tip about the STRING_AGG function. I will be sure to check that one out.

Tally Tables - Performance Personified
String Splitting with True Performance
Best practices on how to ask questions