September 27, 2012 at 1:19 pm
This one has had me stumped for the last few hours and at this stage I think I need some help...
I need to compare multiple groups from a single table and to identify where items listed in col B match. For example: -
Col A...............Col B
John................Apple
John................Orange
John................Banana
Mary................Orange
Mary................Strawberry
David...............Apple
David...............Orange
David...............Banana
I want 'John' and 'David' returned because their items in col B match. Hope this makes sense! Thanks in advance! G
September 27, 2012 at 1:28 pm
Couple of questions:
Is this oversimplified or is it a homework question? If homework, what's the chapter's subject so we can help you find the right place with this?
If it's real data, please see the first link in my signature for proper setup for code assistance. Schema, sample data, expected results, current work to be modified. We should be able to fix you right up from there with minimal back and forth.
Never stop learning, even if it hurts. Ego bruises are practically mandatory as you learn unless you've never risked enough to make a mistake.
For better assistance in answering your questions[/url] | Forum Netiquette
For index/tuning help, follow these directions.[/url] |Tally Tables[/url]
Twitter: @AnyWayDBA
September 27, 2012 at 1:59 pm
I found a way to do this. Might not be optimal but it's a way.
However, I want to know what is this for, as Craig asked.
September 27, 2012 at 2:07 pm
Luis Cazares (9/27/2012)
I found a way to do this. Might not be optimal but it's a way.However, I want to know what is this for, as Craig asked.
There's a few approaches, from row-collapse to after compare rowcount checks. Depends on volume of data, # of items per partition, etc. I just want to know what the OP is actually dealing with.
Never stop learning, even if it hurts. Ego bruises are practically mandatory as you learn unless you've never risked enough to make a mistake.
For better assistance in answering your questions[/url] | Forum Netiquette
For index/tuning help, follow these directions.[/url] |Tally Tables[/url]
Twitter: @AnyWayDBA
September 27, 2012 at 2:54 pm
I would like to know more about those methods. Could you tell me where to find information? I couldn't find much.
This is what I came with. It's doing many reads.
DECLARE @TestTABLE(
colAvarchar(15),
colBvarchar(15))
INSERT @Test VALUES
('John', 'Apple'),
('John','Orange'),
('John','Banana'),
('Mary','Orange'),
('Mary','Strawberry'),
('Luis','Orange'),
('Luis','Strawberry'),
('Luis','Apple'),
('Mike','Orange'),
('Mike','Strawberry'),
('Mike','Apple'),
('David','Apple'),
('David','Orange'),
('David','Banana');
WITH CTE AS(
SELECT colA, SUM( value) total_value
FROM @Test a
JOIN (
SELECT colB,
POWER( 2, ROW_NUMBER() OVER( order by colB) - 1 ) AS value
FROM (SELECT DISTINCT ColB FROM @Test)x) b ON a.colB = B.colB
GROUP BY colA
)
SELECT *
FROM @Test
WHERE colA IN(
SELECT a.colA
FROM CTE a
JOIN CTE b ON a.total_value = b.total_value AND a.colA <> b.colA)
September 27, 2012 at 3:18 pm
Luis Cazares (9/27/2012)
I would like to know more about those methods. Could you tell me where to find information? I couldn't find much.This is what I came with. It's doing many reads.
PM sent so it doesn't end up with the OP going off on a few methods that may not work well depending on his schema and indexing.
Never stop learning, even if it hurts. Ego bruises are practically mandatory as you learn unless you've never risked enough to make a mistake.
For better assistance in answering your questions[/url] | Forum Netiquette
For index/tuning help, follow these directions.[/url] |Tally Tables[/url]
Twitter: @AnyWayDBA
September 27, 2012 at 3:27 pm
My apologies - my original post was using over simplified data. My real data has a list of computer names in column A and names of installed applications in column B. I want to find computers that have the same software stack installed. The actual data is very long so here is a cut down version
IF OBJECT_ID('TempDB..#mytable','U') IS NOT NULL
DROP TABLE #mytable
CREATE TABLE #MyTable(
[ComputerName] [nvarchar](255) NULL,
[ApplicationName] [nvarchar](255) NULL
)
INSERT INTO #MyTable
(ComputerName, ApplicationName)
SELECT 'DCZC445645S','1E NomadBranch 3' UNION ALL
SELECT 'DCZC445645S','1E WakeUp Agent 6' UNION ALL
SELECT 'DCZC445645S','Windows Media Player 9' UNION ALL
SELECT 'DCZC445645S','Microsoft Office 2003 Excel' UNION ALL
SELECT 'DCZC843531X','1E NomadBranch 3' UNION ALL
SELECT 'DCZC843531X','1E WakeUp Agent 6' UNION ALL
SELECT 'DCZC843531X','Windows Media Player 9' UNION ALL
SELECT 'DCZC843531X','Microsoft Office 2003 Excel' UNION ALL
SELECT 'DCZC2231124','1E NomadBranch 3' UNION ALL
SELECT 'DCZC2231124','1E WakeUp Agent 6' UNION ALL
SELECT 'DCZC2231124','Windows Media Player 9' UNION ALL
SELECT 'DCZC2231124','Microsoft Office 2003 Excel' UNION ALL
SELECT 'DCZC84352Z9','!NovaDigm Deployment Repository' UNION ALL
SELECT 'DCZC84352Z9','1E NomadBranch 3' UNION ALL
SELECT 'DCZC84352Z9','1E WakeUp Agent 6' UNION ALL
SELECT 'DCZC84352Z9','Adobe Reader 9' UNION ALL
SELECT 'DCZC84352Z9','Citrix Program Neighborhood Agent 10' UNION ALL
SELECT 'DCZC60548AS','!NovaDigm Deployment Repository' UNION ALL
SELECT 'DCZC60548AS','1E NomadBranch 3' UNION ALL
SELECT 'DCZC60548AS','1E WakeUp Agent 6' UNION ALL
SELECT 'DCZC60548AS','Adobe Reader 9' UNION ALL
SELECT 'DCZC60548AS','Citrix Program Neighborhood Agent 10' UNION ALL
SELECT 'DCZC789784C','Microsoft Internet Explorer 8' UNION ALL
SELECT 'DCZC789784C','Windows Media Player 9' UNION ALL
SELECT 'DCZC789784C','Microsoft Office 2003 Excel'
The query I have so far is...
SELECT A.ComputerName Comp1, B.ComputerName Comp2
FROM #MyTable A
JOIN (SELECT ComputerName, count(ApplicationName) CountBs
FROM #MyTable
GROUP BY ComputerName) G ON G.ComputerName = A.ComputerName
JOIN #MyTable B on A.ApplicationName = B.ApplicationName and A.ComputerName < B.ComputerName
GROUP BY A.ComputerName, B.ComputerName, G.CountBs HAVING COUNT(DISTINCT A.ApplicationName) = G.CountBs
This gives me the following results...
Comp1Comp2
DCZC2231124DCZC445645S
DCZC2231124DCZC843531X
DCZC445645SDCZC843531X
DCZC60548ASDCZC84352Z9
But because there are hundreds of computer names this is going to make interpreting the above results a nightmare so I'm looking for something along the lines of....
DCZC2231124DCZC445645S DCZC843531X
DCZC60548ASDCZC84352Z9
Craig - thanks for the tips on forum etiquette, I hope this is more in line with what you were looking for!
Thanks in advance again!
September 27, 2012 at 4:02 pm
gary.harrington (9/27/2012)
My apologies - my original post was using over simplified data. My real data has a list of computer names in column A and names of installed applications in column B. I want to find computers that have the same software stack installed. The actual data is very long so here is a cut down version
Excellent, this is perfect. A few additional items. I realize we're not looking at the origination schema here, however, does the source have an identity column that can be used against the compname/appname, and what indexes apply to these two (three) columns in the base table? It'll affect your approach, especially with very large data sets.
Craig - thanks for the tips on forum etiquette, I hope this is more in line with what you were looking for!
Thanks in advance again!
Yep, nearly on target but because you're going to need something optimal I need just a little more, otherwise the code may run like a three legged dog. You're new though, and are willing to do the work once it's pointed out, so you're good to go there. ๐
Now that I have a general idea of what you're dealing with however, here's the general answer to Luis' question earlier (don't use these out of the box, though...):
DECLARE @TestTABLE(
colAvarchar(15),
colBvarchar(15))
INSERT @Test VALUES
('John', 'Apple'),
('John','Orange'),
('John','Banana'),
('Mary','Orange'),
('Mary','Strawberry'),
('Luis','Orange'),
('Luis','Strawberry'),
('Luis','Apple'),
('Mike','Orange'),
('Mike','Strawberry'),
('Mike','Apple'),
('David','Apple'),
('David','Orange'),
('David','Banana');
-- Row Collapse using FOR XML
;WITH cte AS
(SELECT DISTINCT
colA,
(SELECT ColB + ',' AS 'data()'
FROM @Test AS t2
WHERE t2.ColA = t.ColA FOR XML PATH( '')) AS OneRowVal
FROm
@Test AS t
)
SELECT
t.ColA AS MatchedName1,
t2.ColA AS MatchedName2
FROM
cte AS t
JOIN
cte AS t2
ONt2.OneRowVal = t.OneRowVal
AND t2.ColA <> t.ColA
-- PreCount and Join Check
IF OBJECT_ID('tempdb..#Master') IS NOT NULL
DROP TABLE #Master
CREATE TABLE #Master
( ColA VARCHAR(15),
cnt INT)
CREATE CLUSTERED INDEX idx_master ON #Master
(cnt)
INSERT INTO #Master
SELECT
ColA,
COUNT(*) AS cnt
FROM
@Test
GROUP BY
ColA
--SELECT * FROM #Master
-- This makes sure we don't deal with items with 'extras'.
;WITH CoreCompareList AS
(SELECT
m.ColA AS Name1,
m2.ColA AS Name2,
m.cnt AS cnt
FROM
#Master AS m
JOIN
#Master AS m2
ONm.cnt = m2.cnt
AND m.ColA <> m2.ColA
)
, CountList AS
(SELECT
ccl.Name1,
ccl.Name2,
COUNT(*) AS cnt
FROM
CoreCompareList AS ccl
JOIN
@Test AS t
ONccl.Name1 = t.ColA
JOIN
@Test AS t2
ONccl.Name2 = t2.ColA
AND t.ColB = t2.ColB
GROUP BY
ccl.Name1,
ccl.Name2
)
SELECT
ccl.Name1,
ccl.Name2
FROM
CoreCompareList AS ccl
JOIN
CountList AS cl
ONccl.Name1 = cl.Name1
AND ccl.Name2 = cl.Name2
AND ccl.cnt = cl.cnt
Never stop learning, even if it hurts. Ego bruises are practically mandatory as you learn unless you've never risked enough to make a mistake.
For better assistance in answering your questions[/url] | Forum Netiquette
For index/tuning help, follow these directions.[/url] |Tally Tables[/url]
Twitter: @AnyWayDBA
September 27, 2012 at 4:15 pm
I take it back. Because of the length and datatype of your application information, there's a completely different approach I'd personally take here. Don't worry about the extra information, but can you confirm there's an index in the base table (non-clustered by preference) that uses ComputerName on the first column and includes (somewhere) ApplicationName?
I'm building out the rest of it now. The chained combination pivot is where it gets messy.
Never stop learning, even if it hurts. Ego bruises are practically mandatory as you learn unless you've never risked enough to make a mistake.
For better assistance in answering your questions[/url] | Forum Netiquette
For index/tuning help, follow these directions.[/url] |Tally Tables[/url]
Twitter: @AnyWayDBA
September 27, 2012 at 4:24 pm
Some notes in the code, others below.
IF OBJECT_ID('TempDB..#mytable','U') IS NOT NULL
DROP TABLE #mytable
CREATE TABLE #MyTable(
[ComputerName] [nvarchar](255) NULL,
[ApplicationName] [nvarchar](255) NULL
)
INSERT INTO #MyTable
(ComputerName, ApplicationName)
SELECT 'DCZC445645S','1E NomadBranch 3' UNION ALL
SELECT 'DCZC445645S','1E WakeUp Agent 6' UNION ALL
SELECT 'DCZC445645S','Windows Media Player 9' UNION ALL
SELECT 'DCZC445645S','Microsoft Office 2003 Excel' UNION ALL
SELECT 'DCZC843531X','1E NomadBranch 3' UNION ALL
SELECT 'DCZC843531X','1E WakeUp Agent 6' UNION ALL
SELECT 'DCZC843531X','Windows Media Player 9' UNION ALL
SELECT 'DCZC843531X','Microsoft Office 2003 Excel' UNION ALL
SELECT 'DCZC2231124','1E NomadBranch 3' UNION ALL
SELECT 'DCZC2231124','1E WakeUp Agent 6' UNION ALL
SELECT 'DCZC2231124','Windows Media Player 9' UNION ALL
SELECT 'DCZC2231124','Microsoft Office 2003 Excel' UNION ALL
SELECT 'DCZC84352Z9','!NovaDigm Deployment Repository' UNION ALL
SELECT 'DCZC84352Z9','1E NomadBranch 3' UNION ALL
SELECT 'DCZC84352Z9','1E WakeUp Agent 6' UNION ALL
SELECT 'DCZC84352Z9','Adobe Reader 9' UNION ALL
SELECT 'DCZC84352Z9','Citrix Program Neighborhood Agent 10' UNION ALL
SELECT 'DCZC60548AS','!NovaDigm Deployment Repository' UNION ALL
SELECT 'DCZC60548AS','1E NomadBranch 3' UNION ALL
SELECT 'DCZC60548AS','1E WakeUp Agent 6' UNION ALL
SELECT 'DCZC60548AS','Adobe Reader 9' UNION ALL
SELECT 'DCZC60548AS','Citrix Program Neighborhood Agent 10' UNION ALL
SELECT 'DCZC789784C','Microsoft Internet Explorer 8' UNION ALL
SELECT 'DCZC789784C','Windows Media Player 9' UNION ALL
SELECT 'DCZC789784C','Microsoft Office 2003 Excel'
IF OBJECT_ID('tempdb..#Checksums') IS NOT NULL
DROP TABLE #Checksums
CREATE TABLE #Checksums
( csID INT IDENTITY( 1,1) NOT NULL,
ComputerName nVARCHAR(255) NOT NULL,
ChecksumValue INT)
CREATE CLUSTERED INDEX idx_Checksums ON #Checksums
( ChecksumValue)
;WITH cte AS
(SELECT DISTINCT
ComputerName,
(SELECT ApplicationName + ',' AS 'data()'
FROM #MyTable AS t2
WHERE t2.ComputerName = t.ComputerName FOR XML PATH( '')) AS OneRowVal
FROm
#MyTable AS t
)
INSERT INTO #Checksums
SELECT
ComputerName,
CHECKSUM( OneRowVal)
FROM
cte
SELECT
c.ChecksumValue,
c.ComputerName AS Name1,
c2.ComputerName AS Name2
FROM
#Checksums AS c
JOIN
#Checksums AS c2
ONc.ChecksumValue = c2.ChecksumValue
AND c.csID < c2.csID -- Controls 'back and forth' duplication
Gary, this should get you the existing results you have as quickly as possible as long as you have the above mentioned index available.
The problem is de-duping the existing values in the result set. The only way I have to do it offhand is a bit messy but it DOES get you there.
;WITH CompareComplete AS
(SELECT
c.ChecksumValue,
c.ComputerName AS Name1,
c2.ComputerName AS Name2
FROM
#Checksums AS c
JOIN
#Checksums AS c2
ONc.ChecksumValue = c2.ChecksumValue
AND c.csID < c2.csID -- Controls 'back and forth' duplication
)
, UnionedResults AS
(SELECT
ChecksumValue,
Name1 AS CompName
FROM
CompareComplete
UNION -- This will force a list distinction, do NOT use ALL
SELECT
ChecksumValue,
Name2
FROM
CompareComplete
)
SELECT DISTINCT
ChecksumValue,
(SELECT CompName + ',' AS 'data()'
FROM UnionedResults AS t2
WHERE t2.ChecksumValue = t.ChecksumValue FOR XML PATH( '')) AS OneRowVal
FROm
UnionedResults AS t
I'm hoping someone can clean up the formatting of the results a bit. This process is one I've ran against larger comparison sets (VARCHAR(4000)s) which absolutely killed me in the compare performance, thus the use of CHECKSUM to remove the possible 1000+ character comparisons. The use of the Temp table also gets you off the primary data sources.
Never stop learning, even if it hurts. Ego bruises are practically mandatory as you learn unless you've never risked enough to make a mistake.
For better assistance in answering your questions[/url] | Forum Netiquette
For index/tuning help, follow these directions.[/url] |Tally Tables[/url]
Twitter: @AnyWayDBA
September 27, 2012 at 7:39 pm
Here's another, relatively straightforward approach:
;WITH ApplicationsGrouped AS (
SELECT ComputerName
,Applications=STUFF((
SELECT ';' + ApplicationName
FROM #mytable b
WHERE a.ComputerName = b.ComputerName
ORDER BY ApplicationName
FOR XML PATH(''), TYPE).value('.', 'VARCHAR(MAX)')
,1, 1, '')
FROM #mytable a
GROUP BY ComputerName)
SELECT Applications
,Computers=STUFF((
SELECT ';' + ComputerName
FROM ApplicationsGrouped b
WHERE a.Applications = b.Applications
ORDER BY ComputerName
FOR XML PATH(''), TYPE).value('.', 'VARCHAR(MAX)')
,1, 1, '')
FROM ApplicationsGrouped a
GROUP BY Applications
HAVING COUNT(ComputerName) > 1
My thought question: Have you ever been told that your query runs too fast?
My advice:
INDEXing a poor-performing query is like putting sugar on cat food. Yeah, it probably tastes better but are you sure you want to eat it?
The path of least resistance can be a slippery slope. Take care that fixing your fixes of fixes doesn't snowball and end up costing you more than fixing the root cause would have in the first place.
Need to UNPIVOT? Why not CROSS APPLY VALUES instead?[/url]
Since random numbers are too important to be left to chance, let's generate some![/url]
Learn to understand recursive CTEs by example.[/url]
[url url=http://www.sqlservercentral.com/articles/St
September 28, 2012 at 1:20 am
Craig - This is perfect! Much appreciated!.... and yes, I have a non-clustered index on the table using ComputerName and ApplicationName
Thanks a million!
G
September 28, 2012 at 1:23 am
Nice Dwain! Thanks for taking the time to look at this for me!
September 28, 2012 at 2:31 am
Here's a fourth method:
-- solution
;WITH ParentsOnly AS (
SELECT ComputerName, ChildRows = COUNT(*)
FROM #MyTable
GROUP BY ComputerName
)
SELECT a.*, b.*, x.*
FROM ParentsOnly a
INNER JOIN ParentsOnly b ON b.ChildRows = a.ChildRows AND b.ComputerName > a.ComputerName
CROSS APPLY ( -- match up the child rows
SELECT MatchingRows = COUNT(*)
FROM #MyTable ai
INNER JOIN #MyTable bi ON bi.ComputerName = b.ComputerName
AND bi.ApplicationName = ai.ApplicationName
WHERE ai.ComputerName = a.ComputerName
) x
WHERE a.ChildRows = x.MatchingRows
ORDER BY a.ComputerName, b.ComputerName
For fast, accurate and documented assistance in answering your questions, please read this article.
Understanding and using APPLY, (I) and (II) Paul White
Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden
September 28, 2012 at 2:58 am
Another one
WITH GrpCounts(ComputerName, ApplicationName,GrpCnt) AS (
SELECT ComputerName, ApplicationName,
COUNT(*) OVER(PARTITION BY ComputerName)
FROM #MyTable),
Results(ComputerNameA,ComputerNameB) AS (
SELECT x.ComputerName , y.ComputerName
FROM GrpCounts x
INNER JOIN GrpCounts y ON y.ApplicationName = x.ApplicationName AND y.ComputerName < x.ComputerName AND x.GrpCnt=y.GrpCnt
GROUP BY x.ComputerName, y.ComputerName, x.GrpCnt
HAVING COUNT(*) = x.GrpCnt)
SELECT r1.ComputerNameA +
(SELECT ' ' + r3.ComputerNameB AS "text()"
FROM Results r3
WHERE r3.ComputerNameA = r1.ComputerNameA
ORDER BY r3.ComputerNameB
FOR XML PATH('')) Computers
FROM Results r1
WHERE NOT EXISTS(SELECT * FROM Results r2 WHERE r1.ComputerNameA = r2.ComputerNameB)
GROUP BY r1.ComputerNameA
ORDER BY r1.ComputerNameA;
____________________________________________________
Deja View - The strange feeling that somewhere, sometime you've optimised this query before
How to get the best help on a forum
http://www.sqlservercentral.com/articles/Best+Practices/61537Viewing 15 posts - 1 through 15 (of 15 total)
You must be logged in to reply to this topic. Login to reply