May 16, 2011 at 11:12 am
I just read through Itzik Ben-Gan's SQL Server article about how to think in sets so that you stop shoehorning non Set Based logic into queries.
I work in sets, in the sense that I don't use cursors. I always use groups.
In the article I'm seeing that sets are totally separate from display logic, such as order.
So if I have a solution, that in my mind has to do with order, is this a limitation in my thinking or is this just a solution that isn't set based and shouldn't be attempted in the database if you want to stick to only Set Based usage?
Example:
You're given a table of people who have 5 hobbies listed and ranked how interested they are in each hobby.
The end result is that you want to get a single list of hobbies for each person into a comma delimited list.
You could also be asked to rank the list according to their interest.
I can think of a dozen ways to get this information, but here is an example:
IF OBJECT_ID('tempdb..#PeopleHobby') IS NOT NULL
DROP TABLE #PeopleHobby
CREATE TABLE #PeopleHobby (
Name VARCHAR(50) NOT NULL
,Hobby VARCHAR(50) NOT NULL
,Interest TINYINT NOT NULL
,HobbyList VARCHAR(MAX) NULL
)
ALTER TABLE #PeopleHobby ADD CONSTRAINT [PK_PeopleHobby] PRIMARY KEY CLUSTERED ([Name] ASC, [Interest] ASC)
-- In SQL 2005 format to run on non 2008 servers
INSERT INTO #PeopleHobby VALUES ('Shawn', 'RPG', 3, NULL)
INSERT INTO #PeopleHobby VALUES ('Shawn', 'Database', 1, NULL)
INSERT INTO #PeopleHobby VALUES ('Shawn', '.Net WinForms', 2, NULL)
INSERT INTO #PeopleHobby VALUES ('Shawn', '.Net MVC', 4, NULL)
INSERT INTO #PeopleHobby VALUES ('Shawn', 'Bad Horror Movies', 5, NULL)
INSERT INTO #PeopleHobby VALUES ('James', 'Bad Horror Movies', 3, NULL)
INSERT INTO #PeopleHobby VALUES ('James', 'Surfing', 1, NULL)
INSERT INTO #PeopleHobby VALUES ('James', '007', 2, NULL)
INSERT INTO #PeopleHobby VALUES ('James', 'Plumbing', 4, NULL)
INSERT INTO #PeopleHobby VALUES ('James', 'Video Games', 5, NULL)
-- PK theoretically sorts the table by name, then interest ranking. This seems to be true in SQL Server, but is not true of Set Based Theory. There should be no dependency expected or given based on ordering.
SELECT * FROM #PeopleHobby
DECLARE @PrevName VARCHAR(50), @HobbyList VARCHAR(MAX)
SELECT @HobbyList = ''
SELECT @PrevName = ''
UPDATE PH SET
@HobbyList = CASE WHEN @PrevName <> Name THEN Hobby ELSE @HobbyList + ', ' + Hobby END
,HobbyList = @HobbyList
,@PrevName = PH.Name
FROM #PeopleHobby PH
SELECT Name, MAX(HobbyList) as HobbyList FROM #PeopleHobby GROUP BY Name
May not be the best or cleanest way of getting things done, but it orders the list in two ways (name and interest/rank).
At one point I thought this was Set Based solution because it's happening all in one pass in a grouping or Set. It's not like a cursor where it goes through one at a time. In fact this sort of code was a replacement for a lot of ordered cursor logic we had in our application and sped things up incredibly.
Based on the definition of Set Based Theory, this is not Set Based. This is 100% dependent on order. In fact if you take out the PK, the order will still generally be by inserted record, but not always. I've had cases where 1 in 20 times it wouldn't be (on SS2005, haven't tested on SS2008).
So whats the conclusion? Should it be that this information can't be gathered fully in a Set Based method and should be passed onto a report/application to handle the ordered logic? Is there a Set Based way of doing this that isn't dependent on the actual order?
I understand that sometimes you just might have to get things done in a query and I'm not going to turn into a Set Based purist who will outright ban TSQL that doesn't fit into my standard, but I'm always trying to learn new tricks and make things more efficient. If everything I need to be done can be done in a Set Based query(s) and this is overall better for the database, then I'd like to learn about it.
Thanks for any input you might have 🙂
May 16, 2011 at 11:30 am
PK theoretically sorts the table by name, then interest ranking.
This is not true. There is 1 and ONLY 1 way to order results from sql server and that is by using an Order by clause. I did not look into the rest of the code you posted but the idea of an index ordering your results is incorrect. This has been debated time and time again on these forums.
_______________________________________________________________
Need help? Help us help you.
Read the article at http://www.sqlservercentral.com/articles/Best+Practices/61537/ for best practices on asking questions.
Need to split a string? Try Jeff Modens splitter http://www.sqlservercentral.com/articles/Tally+Table/72993/.
Cross Tabs and Pivots, Part 1 – Converting Rows to Columns - http://www.sqlservercentral.com/articles/T-SQL/63681/
Cross Tabs and Pivots, Part 2 - Dynamic Cross Tabs - http://www.sqlservercentral.com/articles/Crosstab/65048/
Understanding and Using APPLY (Part 1) - http://www.sqlservercentral.com/articles/APPLY/69953/
Understanding and Using APPLY (Part 2) - http://www.sqlservercentral.com/articles/APPLY/69954/
May 16, 2011 at 12:00 pm
Actually, the thing to look at is: If it's order-dependent, is it a good idea to try to make it "set based"?
Sets, in relational theory, are naturally unordered data. Any "order" in them is being applied by an external factor, such as the Order By clause in a query, or by a human mind that's used to sorting things (hierarchies, alphabets, numbers, et al).
So, if the order matters, don't think "set based". Think "most efficient means possible".
In this particular case, try this as a way to make this type of query more efficient:
CREATE TABLE #People (
Name VARCHAR(50) PRIMARY KEY);
INSERT INTO #People
(Name)
VALUES ('James'),('Shawn'); -- Will require two inserts in 2005, this is a 2008 trick
SELECT Name,
STUFF((SELECT ', ' + Hobby
FROM #PeopleHobby
WHERE #PeopleHobby.Name = #People.Name
ORDER BY Interest
FOR XML PATH('')), 1, 2, '') AS HobbyList
FROM #People;
What I've done is convert the query to a FOR XML query. The zero-length string in the PATH argument and the unnamed column, get rid of the usual XML formatting. I use STUFF to remove the first comma+space, and you end up with a nicely formatted, ordered, delimited string. I had to add a People table, but you'd normally have that in a normalized database anyway.
This one doesn't take advantage of any "hidden" characteristics, such as index ordering. It is a documented feature (MSDN has examples using this, and the features were built into XML PATH to do this exact kind of thing).
Is it set-based? Not really. It's based on coding techniques in T-SQL.
Use set-based code in your databases where what you're doing is working on sets. When you're not, don't. And order-dependent operations are inherently not "set based".
The only valid reason to use set-based code in SQL Server is that's it's usually the most efficient, effective, and fastest way to get the job done. Don't try to force some "looks like set-based" operation in there if it violates that rule.
- Gus "GSquared", RSVP, OODA, MAP, NMVP, FAQ, SAT, SQL, DNA, RNA, UOI, IOU, AM, PM, AD, BC, BCE, USA, UN, CF, ROFL, LOL, ETC
Property of The Thread
"Nobody knows the age of the human race, but everyone agrees it's old enough to know better." - Anon
May 16, 2011 at 3:13 pm
Thank you for the responses.
The idea of Set Based first, Efficient second seems to be along my way of thinking, I just want to make sure I recognize where what I interpret as non set based (requiring ordered logic for example) is just my of thinking limiting me from thinking Set Based or when another solution really is just the best answer.
I've used the ForXML before and actually like it for the simplicity with great performance. I did this other method, which I've used just to show what I would have determined to be a Set Based solution before reading the article which does a good job at showing procedural logic applied to a set.
I've done more complex things passing variables between records that also depend on order that I'm not sure the ForXML would handle (it's not just concatenation). It does seem more like a hacky way to use TSQL, but I can't ignore the performance gains over a Cursor where there is no other alternative known to me at the time. I'll post one of the more complex issues sometime I come across it and see if anyone recognizes a better pattern to solve the issue.
May 16, 2011 at 3:42 pm
It is important to keep set based solutions in mind, but I believe the emphasis on set based solutions is for people like myself who came from a background in older DBMS like FoxPro. I had to make the transition, but not everyone starts out thinking of solutions as row by (agonizing) row.
Great answer GSquared. I also learned something about SQL 2008 in your answer - a better insert method. Thanks! (Too bad I don't use 2008 yet, but maybe someday soon...) 🙂
The greatest enemy of knowledge is not ignorance, it is the illusion of knowledge. - Stephen Hawking
May 18, 2011 at 9:07 am
The way to move your thinking from rows to sets is "stop thinking about what you want to do to a row, start thinking about what you want to do to a column" (I think I stole that from Jeff Moden). That'll make a big difference in how readily you move to set-based logic.
On the 2008 feature, it can be used almost anywhere you could use a SELECT statement, not just inserts. Comes in handy in a number of settings. They're called "Table Value Constructors", and details are here: http://msdn.microsoft.com/en-us/library/dd776382.aspx
- Gus "GSquared", RSVP, OODA, MAP, NMVP, FAQ, SAT, SQL, DNA, RNA, UOI, IOU, AM, PM, AD, BC, BCE, USA, UN, CF, ROFL, LOL, ETC
Property of The Thread
"Nobody knows the age of the human race, but everyone agrees it's old enough to know better." - Anon
Viewing 6 posts - 1 through 6 (of 6 total)
You must be logged in to reply to this topic. Login to reply