TSQL help needed

Question

TSQL help needed

Nirmal-654077

SSC Veteran

Points: 213
More actions
April 11, 2009 at 5:00 pm

#203372

Hi I need some help in programing the below scenario, can Some one help me.
Source Table Structure
Resource_Id Skill_Id
101 sqlserver
101 vb.net
101 oracle
102 sqlserver
102 java
102 excel
103 vb.net
103 java
103 oracle
I need the out put in the below format, I tried some methods , but they are taking more than expected time , Performance is the key factor .
Final Output
101 sqlserver vb.net oracle
102 sqlserver ---- ---- java excel
103 ---- vb.net oracle java

Viewing 15 posts - 1 through 15 (of 20 total)

You must be logged in to reply to this topic. Login to reply

The Dixie Flatline SSC Guru Points: 53354 More actions · Answer 1

See if these are quick enough for you. The use of FOR XML is the fastest concatenation technique I've come across, but this sort of string manipulation is often better accomplished by the calling application outside SQL.

The second, CrossTab, technique can be modified to produce the output you wanted; I just thought it was easier to read this way.

declare @skills table (Resource_Id int, Skill_Id varchar(20))

insert into @skills

select 101, 'sqlserver' union all

select 101, 'vb.net' union all

select 101, 'oracle' union all

select 102, 'sqlserver' union all

select 102, 'java' union all

select 102, 'excel' union all

select 103, 'vb.net' union all

select 103, 'java' union all

select 103, 'oracle'

--

select * from @skills s1

-- Concatenated Format

set statistics time on;

SELECT Resource_Id,stuff((SELECT ',' + Skill_Id

FROM @skills s2

WHERE s2.Resource_Id= s1.resource_ID -- must match GROUP BY below

ORDER BY Skill_Id

FOR XML PATH('')

),1,1,'') as [Skills]

FROM @skills s1

GROUP BY s1.Resource_Id -- without GROUP BY multiple rows are returned

ORDER BY s1.Resource_Id

set statistics time off;

-- CrossTab Format

SELECT Resource_Id

,MAX(case when skill_id = 'Excel' then 'Yes' else '' end) as Excel

,MAX(case when skill_id = 'Java' then 'Yes' else '' end) as Java

,MAX(case when skill_id = 'Oracle' then 'Yes' else '' end) as Oracle

,MAX(case when skill_id = 'SQLServer' then 'Yes' else '' end) as SQLServer

,MAX(case when skill_id = 'VB.Net' then 'Yes' else '' end) as [VB.Net]

FROM @skills

Group by Resource_Id

__________________________________________________

Against stupidity the gods themselves contend in vain. -- Friedrich Schiller
Stop, children, what's that sound? Everybody look what's going down. -- Stephen Stills

Jeff Moden SSC Guru Points: 1003853 More actions · Answer 2

Bob Hovious (4/11/2009)
See if these are quick enough for you. The use of FOR XML is the fastest concatenation technique I've come across, but this sort of string manipulation is often better accomplished by the calling application outside SQL.

Like everything else, it depends.

Test setup... (code window doubled up the word "ADD" and will need repair when you copy it)

USE TempDB

GO

--===== Create and populate a 1,000,000 row test table.

-- Column "RowNum" has a range of 1 to 100,000 unique numbers

-- Column "SomeInt" has a range of 1 to 50,000 non-unique numbers

-- Column "SomeLetters2" has a range of "AA" to "ZZ" non-unique 2 character strings

-- Column "SomeMoney has a range of 0.0000 to 99.9999 non-unique numbers

-- Column "SomeDate" has a range of >=01/01/2000 and <01/01/2010 non-unique date/times

-- Column "SomeCSV" contains 'Part01,Part02,Part03,Part04,Part05,Part06,Part07,Part08,Part09,Part10'

-- for all rows.

-- Column "SomeHex12" contains 12 random hex characters (ie, 0-9,A-F)

SELECT TOP 1000000

SomeID = IDENTITY(INT,1,1),

SomeInt = ABS(CHECKSUM(NEWID()))%50000+1,

SomeLetters2 = CHAR(ABS(CHECKSUM(NEWID()))%26+65)

+ CHAR(ABS(CHECKSUM(NEWID()))%26+65),

SomeCSV = CAST('Part01,Part02,Part03,Part04,Part05,Part06,Part07,Part08,Part09,Part10' AS VARCHAR(80)),

SomeMoney = CAST(ABS(CHECKSUM(NEWID()))%10000 /100.0 AS MONEY),

SomeDate = CAST(RAND(CHECKSUM(NEWID()))*3653.0+36524.0 AS DATETIME),

SomeHex12 = RIGHT(NEWID(),12)

INTO dbo.JBMTest

FROM Master.dbo.SysColumns t1

CROSS JOIN

Master.dbo.SysColumns t2

GO

--===== Add a Clustered PK

ALTER TABLE dbo.JBMTest

ADD PRIMARY KEY CLUSTERED (SomeID)

GO

--===== Add an index to support the concatenation

CREATE INDEX IX_JBMTest_SomeInt ON dbo.JBMTest (SomeInt,SomeLetters2)

GO

--===== Create a function that uses VARCHAR(8000)

CREATE FUNCTION dbo.Concat8KTest

(@SomeInt INT)

RETURNS VARCHAR(8000)

AS

BEGIN

DECLARE @ReturnString VARCHAR(8000)

SELECT @ReturnString = ISNULL(@ReturnString+',' ,'') + SomeLetters2

FROM dbo.JBMTest

WHERE SomeInt = @SomeInt

ORDER BY SomeLetters2

RETURN @ReturnString

END

GO

--===== Create a function that uses VARCHAR(MAX)

CREATE FUNCTION dbo.ConcatMAXTest

(@SomeInt INT)

RETURNS VARCHAR(MAX)

AS

BEGIN

DECLARE @ReturnString VARCHAR(MAX)

SELECT @ReturnString = ISNULL(@ReturnString+',' ,'') + SomeLetters2

FROM dbo.JBMTest

WHERE SomeInt = @SomeInt

ORDER BY SomeLetters2

RETURN @ReturnString

END

GO

Test 8k, MAX, and XML methods...

--===== Test the 8k solution

SELECT SomeInt,dbo.ConcatTest(SomeInt)

FROM dbo.JBMTest

GROUP BY SomeInt

ORDER BY SomeInt

GO

--===== Test the MAX solution

SELECT SomeInt,dbo.ConcatTest(SomeInt)

FROM dbo.JBMTest

GROUP BY SomeInt

ORDER BY SomeInt

GO

--===== Test the XML solution

SELECT t1.SomeInt,STUFF((SELECT ',' + t2.SomeLetters2

FROM dbo.JBMTest t2

WHERE t2.SomeInt = t1.SomeInt --- must match GROUP BY below

ORDER BY t2.SomeLetters2

FOR XML PATH(''))

,1,1,'') AS SomeLetters2

FROM dbo.JBMTest t1

GROUP BY t1.SomeInt --- without GROUP BY multiple rows are returned

ORDER BY t1.SomeInt

GO

Peformance and resource results...

Passing only 50,000 rows to the app is a heck of a lot cheaper than clogging the pipe with a million rows for the app to do it. And, it looks to me like the XML method is not only slower for total duration, but it's almost twice as hard on the CPU as a good ol' fashioned concatenation function.

Of course, for any of these methods, you need the correct index if you don't want to wait a week for it to finish.

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

Change is inevitable... Change for the better is not.

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)

The Dixie Flatline SSC Guru Points: 53354 More actions · Answer 3

Jeff you shoulda been a lawyer :-D. Wasn't it you asking why people needed to do this on another thread? I'm in total agreement that passing summary data takes a huge load off the system. But I thought I was on the other side of this dispute the last time you schooled me.

Yes it depends and, given your index, the simple concatenation function works faster. But, what if the index doesn't exist in the OPs environment? The XML concatenation still runs in under a minute for your million row table, and I gave up on the concatenation function after it ran for over 5. Also, I couldn't duplicate your cpu numbers. On my machine, the function with Max took over 12,000 ms while the XML ran in the 3300-3900 range, and the 8k function ran around 3100-3300. (The XML was the clear loser on the elapsed time though due to I/O, taking almost twice as long as the concatenation functions, The functions could get the job done with a single scan of the nonclustered index and a fraction of the logical reads.)

I have a question about the nonclustered index. You are depending on it for the sequencing of the someletters combinations, so you avoid having to do an ORDER BY anywhere but on the someint column. But I thought you or Gail said that you couldn't count on anything but a clustered index or an ORDER BY clause to maintain sequence. Did I misunderstand?

Finally, since it depends and given that the OP's question concerned a very specific set of values, couldn't we have also used this approach? It results in CPU and elapsed times under 1850 ms, with the nonclustered index in place.

set statistics time on;

;with CrossTab as (

SELECT someInt

,MAX(case when someLetters2 = 'AA' then ',AA' else '' end) as AA

,MAX(case when someletters2 = 'BB' then ',BB' else '' end) as BB

,MAX(case when someletters2 = 'CC' then ',CC' else '' end) as CC

,MAX(case when someletters2 = 'DD' then ',DD' else '' end) as DD

,MAX(case when someletters2 = 'EE' then ',EE' else '' end) as EE

FROM JBMTest

Group by someInt

)

select isnull(STUFF(AA+BB+CC+DD+EE,1,1,''),'')

from CrossTab

set statistics time off;

Thanks again for furthering my education! 🙂

Bob

__________________________________________________

Against stupidity the gods themselves contend in vain. -- Friedrich Schiller
Stop, children, what's that sound? Everybody look what's going down. -- Stephen Stills

Jeff Moden SSC Guru Points: 1003853 More actions · Answer 4

Bob Hovious (4/12/2009)
Jeff you shoulda been a lawyer :-D. Wasn't it you asking why people needed to do this on another thread? I'm in total agreement that passing summary data takes a huge load off the system. But I thought I was on the other side of this dispute the last time you schooled me.

Heh... I guess you're right about that. The problem with concatenation is that it's not really summary data no matter how you handle it. Bloody concatenation can be justified either way depending on whether you're IO bound or CPU bound. My normal outrage about the subject is that it's very costly either way and I really want to know why any given OP wants to do it to begin with. Either you're gonna send all of the data with a bunch of CrLf's or all of the data with a bunch of commas. Both the server and the client are going to have to work at processing it and the pipe will suffer one way or the other. I pretty much hate the idea of concatenation no matter where it's done.

Yes it depends and, given your index, the simple concatenation function works faster. But, what if the index doesn't exist in the OPs environment? The XML concatenation still runs in under a minute for your million row table, and I gave up on the concatenation function after it ran for over 5. Also, I couldn't duplicate your cpu numbers. On my machine, the function with Max took over 12,000 ms while the XML ran in the 3300-3900 range, and the 8k function ran around 3100-3300. (The XML was the clear loser on the elapsed time though due to I/O, taking almost twice as long as the concatenation functions, The functions could get the job done with a single scan of the nonclustered index and a fraction of the logical reads.)

That's the other reason I don't care for concatenation. I don't understand all of the reasons for it, but the process varies greatly from machine to machine.

So far as the possibility of the missing index goes, if I had a million row table and considering that the fastest method returns in a whopping minute without it, you can be darned sure I'd bust a hump or a DBA's head to ensure the index was placed. 🙂 It's all part of performance tuning, etc.

I have a question about the nonclustered index. You are depending on it for the sequencing of the someletters combinations, so you avoid having to do an ORDER BY anywhere but on the someint column. But I thought you or Gail said that you couldn't count on anything but a clustered index or an ORDER BY clause to maintain sequence. Did I misunderstand?

Nope... I don't see where I left out any ORDER BY's for SomeLetters2... both functions contain it and so does the XML. And, yes, contrary to popular belief, it's not safe to depend on a clustered index as an ORDER BY in Selects... just the "quirky update". That's part of the reason why I'm rewriting the "Running Total" article.

Finally, since it depends and given that the OP's question concerned a very specific set of values, couldn't we have also used this approach? It results in CPU and elapsed times under 1850 ms, with the nonclustered index in place.

Absolutely. But what happens if someone adds another "category" to the table? The code has no warning that it's not picking everything up. And people who say another category will never be added always live long enough to find out they're eventually wrong. 😉

Because of my rants on performance and scalability, many think that speed is my only goal. They don't realize that accuracy and data integrity is always my first goal. I'd gladly have something take 3-5 seconds on a million row table and never have to worry about it producing the wrong answer if a new category is added than to run in half that time with the eventual potential for a wrong answer.

To wit, my mantra is "Make it work, make it fast, make it pretty... and it ain't done 'til it's pretty."

Thanks again for furthering my education! 🙂

The feeling is more than mutual. Whenever someone like yourself asks such good questions, I always have to go back and re-examine the reasons for any convictions I may have. Normally, if a conviction cannot be backed up with code, it's not worth being a conviction.

I sure do hope I answered your questions and concerns on this one. Concatenation is always difficult to defend no matter how you do it.

Thanks for the feedback, Bob.

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

Change is inevitable... Change for the better is not.

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)

The Dixie Flatline SSC Guru Points: 53354 More actions · Answer 5

Sorry about raising the ORDER BY issue. I threw that in while looking at the tests and just had a brain-short not thinking about the functions themselves.

You're absolutely right about the problem with the crossTab being what to do when the set of possible values increases. The only play there is to do it in dynamic SQL. Hopefully, there would be a table somewhere that limits the valid codes that could be entered, or at least an index for a quick SELECT DISTINCT. Possibly a trigger could be written to rebuild a stored procedure or table-valued function whenever the content of that table changes. (I am sometimes amazed that the dba's and security people let me write code that writes code. Haven't they ever heard of Skynet?)

The only justification I can give you for concatenation is that when people are looking at data, it allows a lot more to fit on a page where they can visually scan it. Every time I've had a request for it, it's been in the context of a report of some kind. I am still keeping the FOR XML technique in the toolbox, but it is immediately followed by your code from above, because with the proper index, it's hard to beat.

Jeff, thanks for taking the time to listen to my questions and follow up. I invariably learn something worthwhile from the exchange. It's a pleasure to be able to ask questions and discuss competing considerations without it being all about egos. Now I have to go look at Flo's CLR parsing.

(P.S. Florian is a guy's name.) 😛

__________________________________________________

Against stupidity the gods themselves contend in vain. -- Friedrich Schiller
Stop, children, what's that sound? Everybody look what's going down. -- Stephen Stills

Gail Shaw SSC Guru Points: 1004504 More actions · Answer 6

Bob Hovious (4/12/2009)
But I thought you or Gail said that you couldn't count on anything but a clustered index or an ORDER BY clause to maintain sequence.

Wouldn't have been me. I tell people that you can't depend on anything other than an ORDER BY to maintain sequence. The update is a side-effect of how the engine currently works, but there's no guarantee that someone won't find another edge case next week where it breaks.

The only time the optimiser will care about the order rows are returned in is when there's an ORDER BY on the query.

Gail Shaw
Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

We walk in the dark places no others will enter
We stand on the bridge and no one may pass

The Dixie Flatline SSC Guru Points: 53354 More actions · Answer 7

Thanks, Gail. I was sure about the ORDER BY, but not the clustered index. I couldn't find any of the threads where either was discussed.

__________________________________________________

Against stupidity the gods themselves contend in vain. -- Friedrich Schiller
Stop, children, what's that sound? Everybody look what's going down. -- Stephen Stills

Vijaya Kadiyala SSCrazy Eights Points: 9119 More actions · Answer 8

Jeff,

wow..its simply great to see the way you are thinking on the questions and answers...

Thanks -- Vijaya Kadiyala
www.dotnetvj.com
SQL Server Articles For Beginers

JacekO SSCertifiable Points: 6610 More actions · Answer 9

I think Nirmal might need something like this

declare @skills table (Resource_Id int, Skill_Id varchar(20))

insert into @skills

select 101, 'sqlserver' union all

select 101, 'vb.net' union all

select 101, 'oracle' union all

select 102, 'sqlserver' union all

select 102, 'java' union all

select 102, 'excel' union all

select 103, 'vb.net' union all

select 103, 'java' union all

select 103, 'oracle'

SELECT Resource_Id, ISNULL(sqlserver, '----'), ISNULL([vb.net], '----'), ISNULL(oracle,'----'), ISNULL(java, '----'), ISNULL(excel, '----')

FROM @skills

PIVOT(MAX(Skill_Id) FOR Skill_Id

IN(sqlserver,[vb.net],oracle,java,excel)) AS P;

If you look carefully at his sample output the data is not just concatenated but sorted as well and when a skill is missing you have four dashes. Of course this solution is going to work with a finite number of skills only. The output generated by the code above is not exactly matching the sample output because there are trailing dashes for skills further down the list.

PS. Bob , thanks for the insert code....

---------------------------------------------
[font="Verdana"]Nothing is impossible.
It is just a matter of time and money.[/font]

The Dixie Flatline SSC Guru Points: 53354 More actions · Answer 10

VK... my observation is that there is a world of valuable information to be learned from the people who hang around SSC. I shamelessly take every opportunity to pick their brains.

Jacek0... you're welcome, I think.

__________________________________________________

Against stupidity the gods themselves contend in vain. -- Friedrich Schiller
Stop, children, what's that sound? Everybody look what's going down. -- Stephen Stills

GSquared SSC Guru Points: 260824 More actions · Answer 11

Jeff, in your posted test harness, you need to rename the functions in the final tests. It has "SELECT SomeInt, dbo.ConcatTest...", but it needs to be Concat8kTest and ConcatMaxTest, per the create scripts. No big deal, but you might want to edit the post accordingly.

Also, I got very different CPU and total time readings than you on those. Average for the 8k UDF was 5300 milliseconds CPU, 9200 total; Max UDF was 8500 CPU, 13000 total; XML inline was 2000 CPU, 2000 total.

The XML version's IO stats are out the roof, with 154k logical reads, compared to 1985 for each of the other two, but that didn't slow it down apparently. On my system, anyway.

Of course, since the test data is pretty much random, the actual number of reads and scans will vary, as it depends on the number of unique SomeInt values and how many rows each of those has in the table.

- Gus "GSquared", RSVP, OODA, MAP, NMVP, FAQ, SAT, SQL, DNA, RNA, UOI, IOU, AM, PM, AD, BC, BCE, USA, UN, CF, ROFL, LOL, ETC
Property of The Thread

"Nobody knows the age of the human race, but everyone agrees it's old enough to know better." - Anon

GSquared SSC Guru Points: 260824 More actions · Answer 12

Just to test it a bit further:

CREATE FUNCTION dbo.ConcatXMLTest (@SomeInt_in INT)

RETURNS VARCHAR(8000)

AS BEGIN

RETURN (SELECT

STUFF((SELECT

',' + t2.SomeLetters2

FROM

dbo.JBMTest t2

WHERE

t2.SomeInt = @SomeInt_in

ORDER BY

t2.SomeLetters2

FOR

XML PATH(''),type).value('.[1]', 'varchar(MAX)'), 1, 1, ''))

END ;

SELECT

SomeInt,

dbo.ConcatXMLTest(SomeInt)

FROM

dbo.JBMTest

GROUP BY

SomeInt

ORDER BY

SomeInt ;

That got the scan count down to 1 and the reads to 1985, but took the CPU up to over 10 seconds and total to nearly 14 seconds (up from 2 for each for the inline version).

Adding "type" and the value function to the inline XML query increased its run-time to 3800 milliseconds, but allows the string values to contain characters that are XML-specific, like "&", which will otherwise cause it to error out.

- Gus "GSquared", RSVP, OODA, MAP, NMVP, FAQ, SAT, SQL, DNA, RNA, UOI, IOU, AM, PM, AD, BC, BCE, USA, UN, CF, ROFL, LOL, ETC
Property of The Thread

"Nobody knows the age of the human race, but everyone agrees it's old enough to know better." - Anon

GSquared SSC Guru Points: 260824 More actions · Answer 13

Just for one more test, I took the type and value functions out of the XML version of the UDF, and that took it down to 6.5 seconds CPU and 9.5 seconds total. Same scan and read stats either way, of course.

That's slightly worse than the 8k string function, and better than the Max string function, so I modified the XML UDF one more time to make it return varchar(max), and that took it up to 7 seconds CPU, 10 seconds total. No effect on IO. That's better than the Max string function still.

With both the type and value functions, and returning varchar(max), the XML UDF went up to nearly 11 seconds CPU and nearly 14 seconds total, which makes it the worst version of them all!

- Gus "GSquared", RSVP, OODA, MAP, NMVP, FAQ, SAT, SQL, DNA, RNA, UOI, IOU, AM, PM, AD, BC, BCE, USA, UN, CF, ROFL, LOL, ETC
Property of The Thread

"Nobody knows the age of the human race, but everyone agrees it's old enough to know better." - Anon

The Dixie Flatline SSC Guru Points: 53354 More actions · Answer 14

Gus, I rewrote your function as an inline table valued function and it gets the job done in half the time of the ConcatMaxTest function, but still three times longer than the Concat8KTest function.

There is just no beating the IO advantage of scanning the nonclustered index a single time.

ALTER FUNCTION [dbo].[itvfConcatXML]

(

@someInt int

)

RETURNS TABLE

AS

RETURN

(

SELECT STUFF((SELECT ',' + t2.SomeLetters2

FROM dbo.JBMTest t2

WHERE t2.SomeInt = @SomeInt

ORDER BY t2.SomeLetters2

FOR XML PATH(''),type).value('.[1]', 'varchar(MAX)'), 1, 1, '') as conCatString

)

I used a CTE to feed it a distinct list.

;with someints as (select distinct someint from jbmtest)

SELECT t1.SomeInt,f.ConCatString

FROM someints t1

cross apply dbo.itvfConcatXML(t1.someint) f

ORDER BY t1.SomeInt

Table 'JBMTest'. Scan count 50001, logical reads 154059, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

SQL Server Execution Times:

CPU time = 6864 ms, elapsed time = 6901 ms.

__________________________________________________

Against stupidity the gods themselves contend in vain. -- Friedrich Schiller
Stop, children, what's that sound? Everybody look what's going down. -- Stephen Stills