May 27, 2011 at 8:42 pm
As a bit of a sidebar, there IS a problem with CTEs, which are really just "inline views". Sometimes there's a little "positive feedback" formed by chained CTEs and you end up with what I refer to as an "accidental Cross Join". That happens in Wayne's code and, if I convert mine to a single chain of CTEs, it also happens in my code.
An easy solution to such a problem is to "Divide'n'Conquer" by using a Temp Table to hold interim results, like I did in my code example.
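Just as a quick illustration of what I mean (a minimal sketch using the same SomeTable and DelimitedSplit8K objects that show up later in this thread; adjust the names to your own tables):

-- Step 1: Materialize the split-out name parts in a Temp Table so later
--         steps read stored rows instead of re-expanding a chained CTE.
SELECT t.SomeID,
       t.NAME_1,
       ds.ItemNumber,
       ds.Item
  INTO #SplitNames
  FROM tempdb.dbo.SomeTable t
 CROSS APPLY dbo.DelimitedSplit8K(t.NAME_1, ' ') ds
;
-- Step 2: Do the "dedupe" work against the Temp Table. Because the split
--         results are already materialized, the optimizer can't feed the
--         same derivation back into itself and form an accidental Cross Join.
SELECT SomeID,
       Item,
       ItemNumber = MIN(ItemNumber)
  FROM #SplitNames
 GROUP BY SomeID, Item
;
DROP TABLE #SplitNames
;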
--Jeff Moden
Change is inevitable... Change for the better is not.
May 27, 2011 at 8:46 pm
WayneS (5/27/2011)
Sean Lange (5/27/2011)
ohhh this is starting to sound like a show down. This is going to be a good one to watch!!! Ruban sit back and watch the masters go at it. This has the markings of some pretty cool learning for the rest of us. 😉
:blush: Master? Not me... Jeff maybe.. but thanks for the vote of confidence!
Nope... thank you both for the compliment but not me. I'm not a Master at anything. I just get good and lucky now and again.
Well, as is usually the case, the performance of Jeff's code will usually blow mine out of the park. But, I thoroughly agree with the learning part!
I think that if you tweak the accidental cross join out of it, yours will be just as fast.
--Jeff Moden
Change is inevitable... Change for the better is not.
May 28, 2011 at 10:55 am
Jeff Moden (5/27/2011)
WayneS (5/27/2011)
Sean Lange (5/27/2011)
ohhh this is starting to sound like a show down. This is going to be a good one to watch!!! Ruban sit back and watch the masters go at it. This has the markings of some pretty cool learning for the rest of us. 😉
:blush: Master? Not me... Jeff maybe.. but thanks for the vote of confidence!
Nope... thank you both for the compliment but not me. I'm not a Master at anything. I just get good and lucky now and again.
Then I think that there is something pretty masterful about someone that gets good and lucky that frequently.
Wayne
Microsoft Certified Master: SQL Server 2008
Author - SQL Server T-SQL Recipes
May 28, 2011 at 12:07 pm
Okay, I converted my earlier code to run with the sample table that Jeff created (removed the CTE that created the RN column; renamed RN to SomeID throughout):
;
WITH cte AS
(
    -- Break apart all of the names on the space.
    -- The latest version of the Delimited Split Function is at http://www.sqlservercentral.com/articles/Tally+Table/72993/.
    SELECT *
      FROM tempdb.dbo.SomeTable t
     CROSS APPLY dbo.DelimitedSplit8K(t.NAME_1, ' ') ds
), cte3 AS
(
    -- Group the results by name,
    -- getting the lowest ItemNumber for each name part.
    SELECT NAME_1,
           SomeID,
           Item,
           ItemNumber = MIN(ItemNumber)
      FROM cte
     GROUP BY NAME_1, SomeID, Item
)
-- Put the remaining results back together.
SELECT t.NAME_1,
       ColList = STUFF((SELECT ' ' + t2.Item
                          FROM cte3 t2
                         WHERE t.SomeID = t2.SomeID
                         ORDER BY t2.SomeID, t2.ItemNumber
                           FOR XML PATH(''), TYPE).value('.','varchar(max)'), 1, 1, '')
  FROM tempdb.dbo.SomeTable t
 ORDER BY t.SomeID;
It produces the following statistics (captured with SET STATISTICS IO,TIME ON;):
Table 'SomeTable'. Scan count 1, logical reads 40090, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
SQL Server Execution Times:
CPU time = 2106 ms, elapsed time = 2530 ms.
Wayne's test, elapsed time = 2530 ms
Then, I compared it to Jeff's code. Statistics from his run:
SQL Server Execution Times:
CPU time = 0 ms, elapsed time = 0 ms.
SQL Server Execution Times:
CPU time = 0 ms, elapsed time = 2 ms.
SQL Server Execution Times:
CPU time = 0 ms, elapsed time = 0 ms.
SQL Server Execution Times:
CPU time = 0 ms, elapsed time = 5 ms.
Table 'SomeTable'. Scan count 3, logical reads 433, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
SQL Server Execution Times:
CPU time = 779 ms, elapsed time = 754 ms.
(46131 row(s) affected)
SQL Server parse and compile time:
CPU time = 45 ms, elapsed time = 45 ms.
Table '#MyHead'. Scan count 3, logical reads 173, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
SQL Server Execution Times:
CPU time = 32 ms, elapsed time = 98 ms.
SQL Server Execution Times:
CPU time = 79 ms, elapsed time = 146 ms.
Table '#MyHead'. Scan count 19972, logical reads 40191, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'SomeTable'. Scan count 1, logical reads 146, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
SQL Server Execution Times:
CPU time = 967 ms, elapsed time = 1013 ms.
(12311 row(s) affected)
(12311 row(s) affected)
Table '#ForReview'. Scan count 1, logical reads 124, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
SQL Server Execution Times:
CPU time = 31 ms, elapsed time = 359 ms.
Jeff's Test, elapsed time: 2290 ms
Jeff's was 240ms faster for this test data - but had slightly higher reads. Essentially a draw.
My conclusion: on this test data, either solution will suffice; test both to see which works better for you. The main things to watch out for: with its slightly higher IO, Jeff's might start slowing down once it's scaled up to millions of rows, and mine isn't utilizing "divide and conquer" techniques, which might cause increased activity at that scale.
One other thing to note: the execution plan for Jeff's code shows parallelism going on - if you're on a server with lots of cores, this could make a drastic difference also. Something else to test. (Jeff's 9-year-old P4 wouldn't show this; my code doesn't have any parallelism going on.)
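If you want to level the playing field while testing, one way (just a sketch; the SELECT below is a stand-in, so apply the hint to the actual test query) is to pin the degree of parallelism with a query hint and time both versions serially:

-- Force a serial plan so both solutions can be compared without parallelism.
SELECT t.SomeID,
       t.NAME_1
  FROM tempdb.dbo.SomeTable t
OPTION (MAXDOP 1); -- stand-in query; add the hint to the real test code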
Please, respond back to let us know how it goes... there's some curious folks around here that want to know! (Don't forget to answer Jeff's earlier questions also - this may make a difference.)
Wayne
Microsoft Certified Master: SQL Server 2008
Author - SQL Server T-SQL Recipes
May 28, 2011 at 12:37 pm
Your modification worked a real treat, Wayne... it used to take almost 7 minutes for your previous code to run against the test data on my box. Now, it only takes 10 seconds. Well done.
--Jeff Moden
Change is inevitable... Change for the better is not.
May 28, 2011 at 12:57 pm
Here's the profiler run between my code and Wayne's on my old box. I guess I'm going to have to try it on a more modern machine and see what's up. I'm running 2k5 on my old box and 2k8 on the newer one. I'm not sure that's apples to apples.
--Jeff Moden
Change is inevitable... Change for the better is not.
May 28, 2011 at 3:17 pm
Here's the profiler run on my more modern computer between my code and Jeff's code:
The things that I'm curious about:
1. Why aren't the # of reads between runs on your code consistent?
2. Why aren't the # of writes between runs on your code consistent?
About my mod speeding it up... all I did was take out the row_number() assignment in the first cte (and changed RN to SomeID throughout to be compatible with your test table). Well, hang on... I also deleted the cross apply and changed it to do the STUFF(FOR XML) concatenation in the column assignment - I bet that's where the difference is. I guess that's what you're calling the "accidental cross join"?
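For anyone following along who hasn't seen that pattern, here's a tiny self-contained example of the STUFF(... FOR XML PATH('')) concatenation I'm talking about (the #Parts table and its data are made up just to show the shape):

-- A duplicate name part ('John') split across three rows...
CREATE TABLE #Parts (SomeID INT, ItemNumber INT, Item VARCHAR(50));
INSERT INTO #Parts VALUES (1, 1, 'John');
INSERT INTO #Parts VALUES (1, 2, 'John');
INSERT INTO #Parts VALUES (1, 3, 'Smith');

-- ...rebuilt into a single string with the duplicate removed.
SELECT p.SomeID,
       Rebuilt = STUFF((SELECT ' ' + p2.Item
                          FROM #Parts p2
                         WHERE p2.SomeID = p.SomeID
                         GROUP BY p2.Item
                         ORDER BY MIN(p2.ItemNumber)
                           FOR XML PATH(''), TYPE).value('.','varchar(8000)'), 1, 1, '')
  FROM (SELECT DISTINCT SomeID FROM #Parts) p;
-- Returns: SomeID = 1, Rebuilt = 'John Smith'

DROP TABLE #Parts;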
Wayne
Microsoft Certified Master: SQL Server 2008
Author - SQL Server T-SQL Recipes
May 28, 2011 at 9:54 pm
I don't know for sure, but I'd almost bet the number of reads and writes differs from run to run because I'm writing to disk and SQL Server decides what the best way to do that is. It won't always choose the same pages/extents.
For the accidental cross-join, I didn't actually troubleshoot what was causing it. I only know I saw it in the Execution Plan.
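For anyone who wants to spot the same thing on their own queries, one rough way (a sketch, not the only way; the SELECT is just a stand-in) is to capture the actual plan as text and look for operators returning far more rows than either input should produce:

SET STATISTICS PROFILE ON;
SELECT TOP (10) *
  FROM tempdb.dbo.SomeTable; -- stand-in for the query under test
SET STATISTICS PROFILE OFF;
-- In the Rows/Executes columns, a join step whose row count approaches
-- (rows from one input) * (rows from the other) is a good sign of an
-- accidental Cross Join.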
--Jeff Moden
Change is inevitable... Change for the better is not.