﻿<?xml version='1.0' encoding='UTF-8'?><rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/"><channel><title>SQLServerCentral / SQL Server 2008 / SQL Server Newbies  / Help with Counts / Latest Posts</title><generator>InstantForum.NET v2.9.0</generator><description>SQLServerCentral</description><link>http://www.sqlservercentral.com/Forums/</link><webMaster>notifications@sqlservercentral.com</webMaster><lastBuildDate>Wed, 22 May 2013 16:07:44 GMT</lastBuildDate><ttl>20</ttl><item><title>RE: Help with Counts</title><link>http://www.sqlservercentral.com/Forums/Topic1234021-1292-1.aspx</link><description>Hello. :-)Okay, given the use-case data, and that definition for datakey, I think the query I wrote with the Union All statements will accomplish what the business needs.  Check with your salespeople, see if it does work for them business-wise.  It will give a count of unique households and unique landlines to those households.  Unless there are legal or other regulatory limitations on how it has to be counted, this should do what's needed.Which raises the question, is it actually any faster than the current query on the real data?  If not, it's not actually improving anything for you.</description><pubDate>Fri, 20 Jan 2012 06:14:31 GMT</pubDate><dc:creator>GSquared</dc:creator></item><item><title>RE: Help with Counts</title><link>http://www.sqlservercentral.com/Forums/Topic1234021-1292-1.aspx</link><description>[quote][b]GSquared (1/19/2012)[/b][hr]I see what the original method is doing.  The way it goes about achieving it isn't particularly sensible, but it does have a functional pattern to it.Is the query run repeatedly, or just once?On the numbers that the salespeople are citing, does a datakey correspond to a household, or to a location?  I.e.: We have X households in Y city, does datakey correspond to X or Y?[/quote]Hi Gus ;) Datakey is a unique household identifier, so it would correspond to XThanks</description><pubDate>Thu, 19 Jan 2012 06:34:25 GMT</pubDate><dc:creator>bicky1980</dc:creator></item><item><title>RE: Help with Counts</title><link>http://www.sqlservercentral.com/Forums/Topic1234021-1292-1.aspx</link><description>I see what the original method is doing.  The way it goes about achieving it isn't particularly sensible, but it does have a functional pattern to it.Is the query run repeatedly, or just once?On the numbers that the salespeople are citing, does a datakey correspond to a household, or to a location?  I.e.: We have X households in Y city, does datakey correspond to X or Y?</description><pubDate>Thu, 19 Jan 2012 06:31:14 GMT</pubDate><dc:creator>GSquared</dc:creator></item><item><title>RE: Help with Counts</title><link>http://www.sqlservercentral.com/Forums/Topic1234021-1292-1.aspx</link><description>[quote][b]bicky1980 (1/19/2012)[/b][hr]Who's Gus?[/quote]Me</description><pubDate>Thu, 19 Jan 2012 06:15:03 GMT</pubDate><dc:creator>GSquared</dc:creator></item><item><title>RE: Help with Counts</title><link>http://www.sqlservercentral.com/Forums/Topic1234021-1292-1.aspx</link><description>Who's Gus?</description><pubDate>Thu, 19 Jan 2012 03:44:03 GMT</pubDate><dc:creator>bicky1980</dc:creator></item><item><title>RE: Help with Counts</title><link>http://www.sqlservercentral.com/Forums/Topic1234021-1292-1.aspx</link><description>All yours Gus.</description><pubDate>Wed, 18 Jan 2012 08:46:36 GMT</pubDate><dc:creator>ChrisM@home</dc:creator></item><item><title>RE: Help with Counts</title><link>http://www.sqlservercentral.com/Forums/Topic1234021-1292-1.aspx</link><description>[quote][b]ChrisM@home (1/18/2012)[/b][hr][quote][b]bicky1980 (1/11/2012)[/b][hr]...These are just the basic counts, I will need to then run these incoporating the where clause for various variables.  If someone could advise of the best way to run these, then at least thats a start.  The way I am currently processing these is very long winded and I am sure there are much easier &amp; better ways to run them...[/quote][quote][b]bicky1980 (1/17/2012)[/b][hr]Just to try and simplify my request, this is how I have been told to preform these counts in the past: (Indkey is a unique key)[code="plain"]select * into [tablename1_landlines]from [tablename1]where landline is not null and landline!=''Select max(indkey) as indkey,  max(datakey) as datakeyinto [tablename1_landlines_temp]from [tablename1_landlines]group by datakeydelete from [tablename1_landlines]where indkey not in (select indkey from [tablename1_landlines_temp])Select max(indkey) as indkey,  max(landline) as landlineinto [tablename1_landlines_temp2]from [tablename1_landlines]group by datakeydelete from [tablename1_landlines]where indkey not in (select indkey from [tablename1_landlines_temp2])[/code]I know this is certainly not the best way to do these and is very slow hence why I am trying to get a faster, better solution.[/quote]The "original query" above can be deconstructed as follows:[code="sql"]-- query 1select * into [tablename1_landlines]from [tablename1]where landline is not null and landline!=''-- query 2Select max(indkey) as indkeyinto [tablename1_landlines_temp]from [tablename1_landlines]group by datakey-- yields unique values of datakey with highest value of indkey per datakey-- query 3delete from [tablename1_landlines]where indkey not in (select indkey from [tablename1_landlines_temp])-- leaves only indkey/datakey values from query 2-- query 4Select max(indkey) as indkeyinto [tablename1_landlines_temp2]from [tablename1_landlines]group by datakey-- since only indkey/datakey values from query 2 remain in the table, and datakey is unique, -- this returns exactly the same as query 2-- query 5delete from [tablename1_landlines]where indkey not in (select indkey from [tablename1_landlines_temp2])-- does nothing, because it's already been done by query 3[/code] In answer to your opening post on this thread - yes, this can be done much more quickly, as follows (there are numerous ways of doing this):[code="sql"]SELECT somevalue = COUNT(*)FROM [tablename1] tINNER JOIN (	Select max(indkey) as indkey	from [tablename1]	where landline is not null and landline!=''	group by datakey) d ON d.indkey = t.indkey[/code]Looks to me like this is just the landline count. Can you confirm?[/quote]Yes this would cover just the landline count, so using the same rules as landlines would need figures for emails &amp; mobiles as well.</description><pubDate>Wed, 18 Jan 2012 08:44:51 GMT</pubDate><dc:creator>bicky1980</dc:creator></item><item><title>RE: Help with Counts</title><link>http://www.sqlservercentral.com/Forums/Topic1234021-1292-1.aspx</link><description>[quote][b]SQLKnowItAll (1/18/2012)[/b][hr][quote][b]bicky1980 (1/18/2012)[/b][hr][quote][b]SQLKnowItAll (1/17/2012)[/b][hr]I think that we can all agree that this was not a mathematical issue but a logic issue. The two not being mutually exclusive, but being related.  Anyway... I still have yet to know how this data is being interpreted and what the meaning of it is.  So far, I cannot draw any conclusions or predictions based on that number with the way that it is being obtained.  I of course operate under the assumption that we gather data statistics to make predictions or to come up with some conclusion.  When the requests come from poorly educated decision makers (too many in the c-class), they make poor decisions because they don't understand what the numbers they are asking for really represent.I consider it my duty when in a BA role to question the meaning of the data I am retrieving so that I can suggest an alternative if one is needed. Data like this should be answering a specific question.  I really want to know what that question is before I pull data trying to answer it. Also don't wat my CEO to look like an idiot when he takes his data to the Board of Directors and tries to explain it incorrectly.[/quote]Do you require more info from me?[/quote]I think the others are well on their way to helping you with a single query to do this, but I have my own curiosities lying in what this data is supposed to tell you. So, if you know that, I would love to know what someone is interpreting this number as. Because you cannot say it is the total unique landlines, you cannot say it is the total number of unique datakey-landline pairs.  So what does it describe? I understand if you do not know and you just want to get it done for your boss or whomever is asking for it.  I am just inquisitive on this because of my nature and my love of interpreting statistics.[/quote]The statistics are being used to provide the sale team with marketing counts to use to assist them in sales.e.g. We have 10,125 people in Aberdeen with unique Landlines and @ one per Household</description><pubDate>Wed, 18 Jan 2012 08:40:51 GMT</pubDate><dc:creator>bicky1980</dc:creator></item><item><title>RE: Help with Counts</title><link>http://www.sqlservercentral.com/Forums/Topic1234021-1292-1.aspx</link><description>[quote][b]bicky1980 (1/11/2012)[/b][hr]...These are just the basic counts, I will need to then run these incoporating the where clause for various variables.  If someone could advise of the best way to run these, then at least thats a start.  The way I am currently processing these is very long winded and I am sure there are much easier &amp; better ways to run them...[/quote][quote][b]bicky1980 (1/17/2012)[/b][hr]Just to try and simplify my request, this is how I have been told to preform these counts in the past: (Indkey is a unique key)[code="plain"]select * into [tablename1_landlines]from [tablename1]where landline is not null and landline!=''Select max(indkey) as indkey,  max(datakey) as datakeyinto [tablename1_landlines_temp]from [tablename1_landlines]group by datakeydelete from [tablename1_landlines]where indkey not in (select indkey from [tablename1_landlines_temp])Select max(indkey) as indkey,  max(landline) as landlineinto [tablename1_landlines_temp2]from [tablename1_landlines]group by datakeydelete from [tablename1_landlines]where indkey not in (select indkey from [tablename1_landlines_temp2])[/code]I know this is certainly not the best way to do these and is very slow hence why I am trying to get a faster, better solution.[/quote]The "original query" above can be deconstructed as follows:[code="sql"]-- query 1select * into [tablename1_landlines]from [tablename1]where landline is not null and landline!=''-- query 2Select max(indkey) as indkeyinto [tablename1_landlines_temp]from [tablename1_landlines]group by datakey-- yields unique values of datakey with highest value of indkey per datakey-- query 3delete from [tablename1_landlines]where indkey not in (select indkey from [tablename1_landlines_temp])-- leaves only indkey/datakey values from query 2-- query 4Select max(indkey) as indkeyinto [tablename1_landlines_temp2]from [tablename1_landlines]group by datakey-- since only indkey/datakey values from query 2 remain in the table, and datakey is unique, -- this returns exactly the same as query 2-- query 5delete from [tablename1_landlines]where indkey not in (select indkey from [tablename1_landlines_temp2])-- does nothing, because it's already been done by query 3[/code] In answer to your opening post on this thread - yes, this can be done much more quickly, as follows (there are numerous ways of doing this):[code="sql"]SELECT somevalue = COUNT(*)FROM [tablename1] tINNER JOIN (	Select max(indkey) as indkey	from [tablename1]	where landline is not null and landline!=''	group by datakey) d ON d.indkey = t.indkey[/code]Looks to me like this is just the landline count. Can you confirm?</description><pubDate>Wed, 18 Jan 2012 08:33:50 GMT</pubDate><dc:creator>ChrisM@home</dc:creator></item><item><title>RE: Help with Counts</title><link>http://www.sqlservercentral.com/Forums/Topic1234021-1292-1.aspx</link><description>[quote][b]bicky1980 (1/18/2012)[/b][hr][quote][b]SQLKnowItAll (1/17/2012)[/b][hr]I think that we can all agree that this was not a mathematical issue but a logic issue. The two not being mutually exclusive, but being related.  Anyway... I still have yet to know how this data is being interpreted and what the meaning of it is.  So far, I cannot draw any conclusions or predictions based on that number with the way that it is being obtained.  I of course operate under the assumption that we gather data statistics to make predictions or to come up with some conclusion.  When the requests come from poorly educated decision makers (too many in the c-class), they make poor decisions because they don't understand what the numbers they are asking for really represent.I consider it my duty when in a BA role to question the meaning of the data I am retrieving so that I can suggest an alternative if one is needed. Data like this should be answering a specific question.  I really want to know what that question is before I pull data trying to answer it. Also don't wat my CEO to look like an idiot when he takes his data to the Board of Directors and tries to explain it incorrectly.[/quote]Do you require more info from me?[/quote]I think the others are well on their way to helping you with a single query to do this, but I have my own curiosities lying in what this data is supposed to tell you. So, if you know that, I would love to know what someone is interpreting this number as. Because you cannot say it is the total unique landlines, you cannot say it is the total number of unique datakey-landline pairs.  So what does it describe? I understand if you do not know and you just want to get it done for your boss or whomever is asking for it.  I am just inquisitive on this because of my nature and my love of interpreting statistics.</description><pubDate>Wed, 18 Jan 2012 08:09:59 GMT</pubDate><dc:creator>SQLKnowItAll</dc:creator></item><item><title>RE: Help with Counts</title><link>http://www.sqlservercentral.com/Forums/Topic1234021-1292-1.aspx</link><description>[quote][b]SQLKnowItAll (1/17/2012)[/b][hr]I think that we can all agree that this was not a mathematical issue but a logic issue. The two not being mutually exclusive, but being related.  Anyway... I still have yet to know how this data is being interpreted and what the meaning of it is.  So far, I cannot draw any conclusions or predictions based on that number with the way that it is being obtained.  I of course operate under the assumption that we gather data statistics to make predictions or to come up with some conclusion.  When the requests come from poorly educated decision makers (too many in the c-class), they make poor decisions because they don't understand what the numbers they are asking for really represent.I consider it my duty when in a BA role to question the meaning of the data I am retrieving so that I can suggest an alternative if one is needed. Data like this should be answering a specific question.  I really want to know what that question is before I pull data trying to answer it. Also don't wat my CEO to look like an idiot when he takes his data to the Board of Directors and tries to explain it incorrectly.[/quote]Do you require more info from me?</description><pubDate>Wed, 18 Jan 2012 08:01:34 GMT</pubDate><dc:creator>bicky1980</dc:creator></item><item><title>RE: Help with Counts</title><link>http://www.sqlservercentral.com/Forums/Topic1234021-1292-1.aspx</link><description>I think that we can all agree that this was not a mathematical issue but a logic issue. The two not being mutually exclusive, but being related.  Anyway... I still have yet to know how this data is being interpreted and what the meaning of it is.  So far, I cannot draw any conclusions or predictions based on that number with the way that it is being obtained.  I of course operate under the assumption that we gather data statistics to make predictions or to come up with some conclusion.  When the requests come from poorly educated decision makers (too many in the c-class), they make poor decisions because they don't understand what the numbers they are asking for really represent.I consider it my duty when in a BA role to question the meaning of the data I am retrieving so that I can suggest an alternative if one is needed. Data like this should be answering a specific question.  I really want to know what that question is before I pull data trying to answer it. Also don't wat my CEO to look like an idiot when he takes his data to the Board of Directors and tries to explain it incorrectly.</description><pubDate>Tue, 17 Jan 2012 17:03:27 GMT</pubDate><dc:creator>SQLKnowItAll</dc:creator></item><item><title>RE: Help with Counts</title><link>http://www.sqlservercentral.com/Forums/Topic1234021-1292-1.aspx</link><description>[quote][b]ChrisM@home (1/17/2012)[/b][hr][quote][b]L' Eomot Inversé (1/17/2012)[/b][hr]Here's another attempt at making some progress towards a precise statement or the requirement....[/quote]Tom, how does the query I posted [url=http://www.sqlservercentral.com/Forums/FindPost1237065.aspx][i]here[/i][/url] fit in with all this? Be nice, I majored in biochemistry/microbiology, not maths.BTW Bicky's not necessarily a "he" - could be Rebecca from Australia ;-)[/quote]Can I use the classic excuse that when in doubt one uses the masculine, which is supposed to embrace the feminine?  :blush:  Sometime I write "[s]he" or something like that, but mostly I can't be bothered (for good and sensible reasons that some feminists agree with while others find them offensive).Well, on the query I think it's a bit optimistic to think you will find enough rows where both rndk and rnll are 1 to get a complete result (same error as Jared's - too optimistic about reductions continuing to be available at every step); but counting every such row as a first step, and repeating that with what would be left after eliminating what the selection of those rows requires you to eliminate is certainly a reasonable way to begin, you just need to add code to cope with the case where this fails to choose more rows but uneliminated rows still exist. So if you put it in an iteration where for example if there are no rows with both rndk and rnll are 1 you instead select rows where rndk = 1 and rnll = min(rnll where rndk=1) (or swap the roles or rnll and rndk in that description, or do something else that is certain to pick at least one row) you end up with something that works (produces a result which conforms to what in one of my posts I called option P2 provided you iterate it - or recurse it - until everything is eliminated) so you definitely have the basis for a good solution.Of course whether it is the right solution depends of what the requirement is, and the only way we have to work that out is to understand the code that Bicky posted as being the working but poorly performing version that is to be replaced.  I haven't yet found time to work out just what that is, although it looks simple enough.On the biochem and microbiology vs maths front, I think that as a basis for computing either is as good as the other; I have a particularly strong disrespect for CS and IT degrees from most universities, but believe that a degree in [i]almost[/i] any other subject is an excellent preparation for a career in computing (or IT or DBAing or whatever anyone wants to call it).  I tend to use math-based logic with concepts like sets and maxima and minima quite often to describe requirements and potential solutions, because I've found that salesmen, accountants, and general managers actually understand stuff presented like that a lot better better than they understand stuff presented in convoluted sentences designed to avoid "difficult" concepts like sets and maxima and minima.</description><pubDate>Tue, 17 Jan 2012 16:25:36 GMT</pubDate><dc:creator>L' Eomot Inversé</dc:creator></item><item><title>RE: Help with Counts</title><link>http://www.sqlservercentral.com/Forums/Topic1234021-1292-1.aspx</link><description>With a tiny mod, this produces the same results as Bicky's original query:[code="sql"];WITH Level1 AS (	SELECT indkey, datakey, landline	FROM (		SELECT indkey, datakey, landline, 			rnll = ROW_NUMBER() OVER(PARTITION BY landline ORDER BY indkey DESC), 			rndk = ROW_NUMBER() OVER(PARTITION BY datakey ORDER BY indkey DESC) 		FROM #test2		WHERE landline &amp;lt;&amp;gt; ''	) d	WHERE rnll = 1 AND rndk = 1),Level2 AS (	SELECT indkey, datakey, landline	FROM (		SELECT			indkey, datakey, landline, 			rnll = ROW_NUMBER() OVER(PARTITION BY landline ORDER BY indkey DESC), 			rndk = ROW_NUMBER() OVER(PARTITION BY datakey ORDER BY indkey DESC) 		FROM #test2 t		WHERE landline &amp;lt;&amp;gt; ''		AND NOT EXISTS (SELECT 1 FROM Level1 l WHERE l.landline = t.landline OR l.datakey = t.datakey)	) d 	WHERE rnll = 1 AND rndk = 1)SELECT indkey, datakey, landline FROM Level1 UNION ALLSELECT indkey, datakey, landline FROM Level2ORDER BY datakey, landline[/code]... so the original query is not so daft as it looks.Edit: sheesh, this is doing my head in. Bicky's original query is in fact functionally equivalent to this:Select max(indkey) as indkeyfrom #landlines (#temp filtered for landlines)group by datakey-and works on the small datasets purely by accident. It [i]will [/i]have dupes on landline in larger data sets. Here's a dataset from which it will return dupes on landline:[code="sql"]DROP TABLE #test2CREATE TABLE #test2    (indkey NVARCHAR(2),     datakey NVARCHAR(4),     landline NVARCHAR(11),     mobile NVARCHAR(11),     email NVARCHAR(20))INSERT INTO #test2 VALUES  ('01','0001','01234567890','0712345679','1@test.co.uk') -- YES,('02','0001','01234567890','','1@test.co.uk') -- x NO, dupe on indkey = 1 (datakey &amp; landline),('03','0002','01234567890','','2@test.co.uk') -- NO, dupe on indkey = 1 (landline),('04','0002','01234567891','','2@test.co.uk') -- x YES,('05','0002','','07123456789','')				-- NO, dupe on indkey = 4 (datakey),('06','0003','01234567892','07123456791','') -- x YES,('07','0004','01234567893','07123456792','') -- x YES,('08','0005','01234567894','07123456793','2@test.co.uk') -- YES,('09','0005','01234567898','07123456793','2@test.co.uk') -- x NO, dupe on indkey = 8 (datakey),('10','0008','01234567894','07123456793','9@test.co.uk') -- YES,('11','0008','01234567895','','5@test.co.uk') -- NO, dupe on indkey = 10 (datakey),('12','0008','01234567898','','6@test.co.uk') -- x NO, dupe on indkey = 10 (datakey),('13','0009','01234567896','','6@test.co.uk') -- NO, dupe on indkey = 10 (landline),('14','0009','01234567889','','6@test.co.uk') -- x YES,('15','0010','01189567889','','6@test.co.uk') -- YES,('16','0010','01189567890','','6@test.co.uk') -- x NO, dupe on indkey = 15 (datakey),('17','0001','03189567889','','6@test.co.uk') -- NO, dupe on indkey = 1 (datakey) ,('18','0002','03189567890','','6@test.co.uk') -- NO, dupe on indkey = 4 (datakey) ,('19','0003','03189567891','','6@test.co.uk') -- NO, dupe on indkey = 6 (datakey) [/code] Returns indkeys 9 and 12.</description><pubDate>Tue, 17 Jan 2012 15:22:35 GMT</pubDate><dc:creator>ChrisM@home</dc:creator></item><item><title>RE: Help with Counts</title><link>http://www.sqlservercentral.com/Forums/Topic1234021-1292-1.aspx</link><description>[quote][b]bicky1980 (1/17/2012)[/b][hr][quote][b]SQLKnowItAll (1/17/2012)[/b][hr][quote][b]SQLKnowItAll (1/17/2012)[/b][hr][quote]If the data could be processed in the following manner that would be great1. Extract all the data with a landline2. Dedupe the data on datakey - keep only one record (e.g. 3 duplicate datakeys would become just one datakey)3. Dedupe the data on landline - same rule as above[/quote]This misses which record to keep in step 2. That is the big issue here... Which record (if any) get retained for comparison on landline from datakey? Clearly, that matters.  I believe it is based on maximizing the count; i.e. if a data key only has 1 unique landline, do not count that landline in the determination of other datakey counts. Right?EDIT: If the above is true, then a count of distinct landlines where landline meets the acceptable criteria for length is the same thing.[/quote]Ok, my edit was wrong because in the case where a datakey has 2 different landlines, but those 2 landlines do not appear on any other datakey... They still only count as 1. I think that is correct?[/quote]Your question, "in the case where a datakey has 2 different landlines, but those 2 landlines do not appear on any other datakey... They still only count as 1. I think that is correct?" - The answer is yes.Just to try and simplify my request, this is how I have been told to preform these counts in the past: (Indkey is a unique key)[code="plain"]select * into [tablename1_landlines]from [tablename1]where landline is not null and landline!=''Select max(indkey) as indkey,  max(datakey) as datakeyinto [tablename1_landlines_temp]from [tablename1_landlines]group by datakeydelete from [tablename1_landlines]where indkey not in (select indkey from [tablename1_landlines_temp])Select max(indkey) as indkey,  max(landline) as landlineinto [tablename1_landlines_temp2]from [tablename1_landlines]group by datakeydelete from [tablename1_landlines]where indkey not in (select indkey from [tablename1_landlines_temp2])[/code]I know this is certainly not the best way to do these and is very slow hence why I am trying to get a faster, better solution.[/quote]Unlikely though it may seem at first glance, this does generate correct results from the extended sample data set. Here's a functionally equivalent version which uses CTE's instead of "temp" tables which will probably work twice as fast (because the "temp tables" weren't indexed):[code="sql"];WITH -- datakeys distincted by max indkey FirstInclusion AS (	Select max(indkey) as indkey --, 		--datakey -- MAX not required	from #test2	group by datakey),-- pick landline with highest indkey from rows with same datakeySecondInclusion AS (	Select max(indkey) as indkey --, 		--max(landline) as landline	from #test2	where indkey in (select indkey from FirstInclusion)	group by datakey)SELECT *FROM #test2WHERE indkey IN (select indkey from FirstInclusion)	OR indkey IN (select indkey from SecondInclusion)ORDER BY indkey[/code]The results are quite similar to this:[code="sql"];WITH Level1 AS (	SELECT ind, indkey, datakey, landline	FROM (		SELECT ind = 1, indkey, datakey, landline, 			rnll = ROW_NUMBER() OVER(PARTITION BY landline ORDER BY datakey DESC), 			rndk = ROW_NUMBER() OVER(PARTITION BY datakey ORDER BY landline DESC) 		FROM #test2		WHERE landline &amp;lt;&amp;gt; ''	) d	WHERE rnll = 1 AND rndk = 1),Level2 AS (	SELECT ind, indkey, datakey, landline	FROM (		SELECT			ind = 2, indkey, datakey, landline, 			rnll = ROW_NUMBER() OVER(PARTITION BY landline ORDER BY datakey DESC), 			rndk = ROW_NUMBER() OVER(PARTITION BY datakey ORDER BY landline DESC) 		FROM #test2 t		WHERE landline &amp;lt;&amp;gt; ''		AND NOT EXISTS (SELECT 1 FROM Level1 l WHERE l.landline = t.landline OR l.datakey = t.datakey)	) d 	WHERE rnll = 1 AND rndk = 1)SELECT ind, indkey, datakey, landlineFROM (SELECT ind, indkey, datakey, landline FROM Level1 UNION ALLSELECT ind, indkey, datakey, landline FROM Level2) dORDER BY indkey --datakey, landline[/code]- the number of rows returned is the same, but there are some differences between sacrificed rows.</description><pubDate>Tue, 17 Jan 2012 15:07:52 GMT</pubDate><dc:creator>ChrisM@home</dc:creator></item><item><title>RE: Help with Counts</title><link>http://www.sqlservercentral.com/Forums/Topic1234021-1292-1.aspx</link><description>[quote][b]L' Eomot Inversé (1/17/2012)[/b][hr][quote][b]GSquared (1/17/2012)[/b][hr]I think this whole thing may be getting massively overcomplicated by the math.Is the actual business requirement simply a count of how many distinct datakeys there are, and a count of how many distinct landline-datakey pairs there are, or did some manager/executive somewhere actually request something this mathematically complicated?What is the actual statement of the exact business need?  As written/stated by the person who will actually be using this data, not summarized or anything like that.[/quote]I'm sure you are right, that the way it's been described to us is vastly overcomplicated.  Now that Bicky has posted the code he's trying to replace we can maybe work out what it actually does and take that as a definite requirement - because the motivation for replacing it appears to be to get it to go faster, not to change the functionality.On "by the math" I disagree with you: math is a way of providing clear and unequivocal specifications of requirements; one for of math is to present an algorithm (which Bicky has now done, and which Jared made an interesting attempt at, no doubt provoking that response from Bicky).  What has complicated this has been the utter lack of clarity in the requirement (which Bicky's most recent post has probably fixed).[/quote]"Math" as a language can be very clear, and extremely precise.  I think that's not the case here, because the math is being used to translate from unclear and incomplete English.I'm not saying math is the problem, I'm saying all the math here is part of the problem, because it's premature, without a clear business-rule for the math to turn into something an engineer can implement.It's like getting into stress-tollerance and hardness numbers for various alloys of steel which can be used in construction, when all we know is that a bridge is needed.  If it's over a 1m wide creek that gets occassional foot traffic, mostly by kids, steel alloy analysis is overkill.  If it's going to allow heavy freight to cross the Straights of Gibraltar, steel isn't going to come even close to being strong enough, no matter what alloy.  We're still in the "someone needs a bridge" stage, without the details of what it's crossing and what it's intended to carry.So we talk math, and shotgun various approaches, and hope it looks like what's needed on a small sample of data similar to the real data, but not the real data.  We should be talking "over what and carrying what", not "how strong an alloy of steel".</description><pubDate>Tue, 17 Jan 2012 13:52:14 GMT</pubDate><dc:creator>GSquared</dc:creator></item><item><title>RE: Help with Counts</title><link>http://www.sqlservercentral.com/Forums/Topic1234021-1292-1.aspx</link><description>[quote][b]L' Eomot Inversé (1/17/2012)[/b][hr][quote][b]GSquared (1/17/2012)[/b][hr]I think this whole thing may be getting massively overcomplicated by the math.Is the actual business requirement simply a count of how many distinct datakeys there are, and a count of how many distinct landline-datakey pairs there are, or did some manager/executive somewhere actually request something this mathematically complicated?What is the actual statement of the exact business need?  As written/stated by the person who will actually be using this data, not summarized or anything like that.[/quote]I'm sure you are right, that the way it's been described to us is vastly overcomplicated.  Now that Bicky has posted the code he's trying to replace we can maybe work out what it actually does and take that as a definite requirement - because the motivation for replacing it appears to be to get it to go faster, not to change the functionality.On "by the math" I disagree with you: math is a way of providing clear and unequivocal specifications of requirements; one for of math is to present an algorithm (which Bicky has now done, and which Jared made an interesting attempt at, no doubt provoking that response from Bicky).  What has complicated this has been the utter lack of clarity in the requirement (which Bicky's most recent post has probably fixed).[/quote]I did try explaining, to me, my initial request still makes sense ;) btw I am male</description><pubDate>Tue, 17 Jan 2012 13:13:39 GMT</pubDate><dc:creator>bicky1980</dc:creator></item><item><title>RE: Help with Counts</title><link>http://www.sqlservercentral.com/Forums/Topic1234021-1292-1.aspx</link><description>[quote][b]GSquared (1/17/2012)[/b][hr]I think this whole thing may be getting massively overcomplicated by the math.Is the actual business requirement simply a count of how many distinct datakeys there are, and a count of how many distinct landline-datakey pairs there are, or did some manager/executive somewhere actually request something this mathematically complicated?What is the actual statement of the exact business need?  As written/stated by the person who will actually be using this data, not summarized or anything like that.[/quote]I'm sure you are right, that the way it's been described to us is vastly overcomplicated.  Now that Bicky has posted the code he's trying to replace we can maybe work out what it actually does and take that as a definite requirement - because the motivation for replacing it appears to be to get it to go faster, not to change the functionality.On "by the math" I disagree with you: math is a way of providing clear and unequivocal specifications of requirements; one for of math is to present an algorithm (which Bicky has now done, and which Jared made an interesting attempt at, no doubt provoking that response from Bicky).  What has complicated this has been the utter lack of clarity in the requirement (which Bicky's most recent post has probably fixed).</description><pubDate>Tue, 17 Jan 2012 13:09:57 GMT</pubDate><dc:creator>L' Eomot Inversé</dc:creator></item><item><title>RE: Help with Counts</title><link>http://www.sqlservercentral.com/Forums/Topic1234021-1292-1.aspx</link><description>[quote][b]L' Eomot Inversé (1/17/2012)[/b][hr]Here's another attempt at making some progress towards a precise statement or the requirement....[/quote]Tom, how does the query I posted [url=http://www.sqlservercentral.com/Forums/FindPost1237065.aspx][i]here[/i][/url] fit in with all this? Be nice, I majored in biochemistry/microbiology, not maths.BTW Bicky's not necessarily a "he" - could be Rebecca from Australia ;-)</description><pubDate>Tue, 17 Jan 2012 13:06:09 GMT</pubDate><dc:creator>ChrisM@home</dc:creator></item><item><title>RE: Help with Counts</title><link>http://www.sqlservercentral.com/Forums/Topic1234021-1292-1.aspx</link><description>[quote][b]SQLKnowItAll (1/17/2012)[/b][hr]I think we really need the rules. They have not been clearly defined. Let me see if I can give what I understand (for landlines) and if it is correct, great.  If not, please try to adjust using my framework, this is not necessarily the best approach in SQL, but the easiest way to understand it:1. Insert ALL data into a staging table2. Find all datakeys that only have 1 unique landline in the staging table and insert the datakey and landline into a temp table3. delete any duplicate landlines from the temp table4. join the staging table on the temp table and delete any landlines from it that exist in the temp table including the data that moved to the temp table5. Insert the data from the temp table BACK into the staging table.6. Repeat the process until no duplicate landlinesAlthough this may not be the best procedural solution in SQL... Does it fundamentally describe the business rules? I couldn't come up with a better stepwise way to illustrate my understanding of the rules :)EDIT: So we are systematically eliminating duplicate landlines ensuring that datakeys that only have 1 unique landline get precedence for the count over datakeys that have more than 1 unique landline.[/quote]If that would work it would be a great solution.  Unfortunately it's very easy to find data on which it won't work - for example if your initial data has only datakeys that have two or more landlines each, you never eliminate any rows; and even if it doesn't break before it starts each time round the loop risks delivering, at the end of step 4 before the temp table rows have been added back, a staging table which at that stage has no datakeys with one landline, so after than you never elimate another row.  But although it doesn't work, it's an excellent step on the way to a solution.You can improve the algorithm's chances by having two different kinds of step: the one that looks for datakeys with only one landline, and another that looks for landlines with only one datakey and deals with them, and alternating those steps.  This actually a good way to start fixing the non-termination problem, because although it doesn't fix it it does ensure that in general you hit the "there are no singletons so I must do something else" situation less often than if you don't do this, and the code for handling that situation is going to be messy.If you adapt it so that it operates (when there are neither datakeys with only one landline nor landlines with only one datakey) not specifically on datakeys with a single landline but on datakeys with whatever number of landlines the datakey with least remaining landlines has it will work. You then have to choose a good algrithm to deal with the case where that number isn't 1 (perhaps start by looking for landlines that occur only once in the temp table - but once again you have to be aware that there may not be any).  Perhaps you might have a similar thing for landlines with the minimum number of datakeys, and pick one of these two things according to which was the type of main step which detected the first of a consecutive pair of "no singleton" conditions.  I don't know which is better.Also, you have to decide you algorithm for step 3 (choose to keep the record with the lowest datakey for this landline, perhaps?  I imagine it doesn't much matter what it is, as long as it's easy to specify and doesn't cost an arm and a leg to compute).</description><pubDate>Tue, 17 Jan 2012 12:57:32 GMT</pubDate><dc:creator>L' Eomot Inversé</dc:creator></item><item><title>RE: Help with Counts</title><link>http://www.sqlservercentral.com/Forums/Topic1234021-1292-1.aspx</link><description>[quote][b]bicky1980 (1/17/2012)[/b][hr][quote][b]SQLKnowItAll (1/17/2012)[/b][hr][quote][b]SQLKnowItAll (1/17/2012)[/b][hr][quote]If the data could be processed in the following manner that would be great1. Extract all the data with a landline2. Dedupe the data on datakey - keep only one record (e.g. 3 duplicate datakeys would become just one datakey)3. Dedupe the data on landline - same rule as above[/quote]This misses which record to keep in step 2. That is the big issue here... Which record (if any) get retained for comparison on landline from datakey? Clearly, that matters.  I believe it is based on maximizing the count; i.e. if a data key only has 1 unique landline, do not count that landline in the determination of other datakey counts. Right?EDIT: If the above is true, then a count of distinct landlines where landline meets the acceptable criteria for length is the same thing.[/quote]Ok, my edit was wrong because in the case where a datakey has 2 different landlines, but those 2 landlines do not appear on any other datakey... They still only count as 1. I think that is correct?[/quote]Your question, "in the case where a datakey has 2 different landlines, but those 2 landlines do not appear on any other datakey... They still only count as 1. I think that is correct?" - The answer is yes.Just to try and simplify my request, this is how I have been told to preform these counts in the past: (Indkey is a unique key)[code="plain"]select * into [tablename1_landlines]from [tablename1]where landline is not null and landline!=''Select max(indkey) as indkey,  max(datakey) as datakeyinto [tablename1_landlines_temp]from [tablename1_landlines]group by datakeydelete from [tablename1_landlines]where indkey not in (select indkey from [tablename1_landlines_temp])Select max(indkey) as indkey,  max(landline) as landlineinto [tablename1_landlines_temp2]from [tablename1_landlines]group by datakeydelete from [tablename1_landlines]where indkey not in (select indkey from [tablename1_landlines_temp2])[/code]I know this is certainly not the best way to do these and is very slow hence why I am trying to get a faster, better solution.[/quote]This will also give you inconsistent results that do not follow any concrete rules. This is arbitrary. Kind of scary actually.</description><pubDate>Tue, 17 Jan 2012 12:36:33 GMT</pubDate><dc:creator>SQLKnowItAll</dc:creator></item><item><title>RE: Help with Counts</title><link>http://www.sqlservercentral.com/Forums/Topic1234021-1292-1.aspx</link><description>[quote][b]SQLKnowItAll (1/17/2012)[/b][hr][quote][b]SQLKnowItAll (1/17/2012)[/b][hr][quote]If the data could be processed in the following manner that would be great1. Extract all the data with a landline2. Dedupe the data on datakey - keep only one record (e.g. 3 duplicate datakeys would become just one datakey)3. Dedupe the data on landline - same rule as above[/quote]This misses which record to keep in step 2. That is the big issue here... Which record (if any) get retained for comparison on landline from datakey? Clearly, that matters.  I believe it is based on maximizing the count; i.e. if a data key only has 1 unique landline, do not count that landline in the determination of other datakey counts. Right?EDIT: If the above is true, then a count of distinct landlines where landline meets the acceptable criteria for length is the same thing.[/quote]Ok, my edit was wrong because in the case where a datakey has 2 different landlines, but those 2 landlines do not appear on any other datakey... They still only count as 1. I think that is correct?[/quote]Your question, "in the case where a datakey has 2 different landlines, but those 2 landlines do not appear on any other datakey... They still only count as 1. I think that is correct?" - The answer is yes.Just to try and simplify my request, this is how I have been told to preform these counts in the past: (Indkey is a unique key)[code="plain"]select * into [tablename1_landlines]from [tablename1]where landline is not null and landline!=''Select max(indkey) as indkey,  max(datakey) as datakeyinto [tablename1_landlines_temp]from [tablename1_landlines]group by datakeydelete from [tablename1_landlines]where indkey not in (select indkey from [tablename1_landlines_temp])Select max(indkey) as indkey,  max(landline) as landlineinto [tablename1_landlines_temp2]from [tablename1_landlines]group by datakeydelete from [tablename1_landlines]where indkey not in (select indkey from [tablename1_landlines_temp2])[/code]I know this is certainly not the best way to do these and is very slow hence why I am trying to get a faster, better solution.</description><pubDate>Tue, 17 Jan 2012 12:24:53 GMT</pubDate><dc:creator>bicky1980</dc:creator></item><item><title>RE: Help with Counts</title><link>http://www.sqlservercentral.com/Forums/Topic1234021-1292-1.aspx</link><description>[quote][b]SQLKnowItAll (1/17/2012)[/b][hr][quote]If the data could be processed in the following manner that would be great1. Extract all the data with a landline2. Dedupe the data on datakey - keep only one record (e.g. 3 duplicate datakeys would become just one datakey)3. Dedupe the data on landline - same rule as above[/quote]This misses which record to keep in step 2. That is the big issue here... Which record (if any) get retained for comparison on landline from datakey? Clearly, that matters.  I believe it is based on maximizing the count; i.e. if a data key only has 1 unique landline, do not count that landline in the determination of other datakey counts. Right?EDIT: If the above is true, then a count of distinct landlines where landline meets the acceptable criteria for length is the same thing.[/quote]Ok, my edit was wrong because in the case where a datakey has 2 different landlines, but those 2 landlines do not appear on any other datakey... They still only count as 1. I think that is correct?</description><pubDate>Tue, 17 Jan 2012 12:11:02 GMT</pubDate><dc:creator>SQLKnowItAll</dc:creator></item><item><title>RE: Help with Counts</title><link>http://www.sqlservercentral.com/Forums/Topic1234021-1292-1.aspx</link><description>[quote]If the data could be processed in the following manner that would be great1. Extract all the data with a landline2. Dedupe the data on datakey - keep only one record (e.g. 3 duplicate datakeys would become just one datakey)3. Dedupe the data on landline - same rule as above[/quote]This misses which record to keep in step 2. That is the big issue here... Which record (if any) get retained for comparison on landline from datakey? Clearly, that matters.  I believe it is based on maximizing the count; i.e. if a data key only has 1 unique landline, do not count that landline in the determination of other datakey counts. Right?EDIT: If the above is true, then a count of distinct landlines where landline meets the acceptable criteria for length is the same thing.</description><pubDate>Tue, 17 Jan 2012 12:02:26 GMT</pubDate><dc:creator>SQLKnowItAll</dc:creator></item><item><title>RE: Help with Counts</title><link>http://www.sqlservercentral.com/Forums/Topic1234021-1292-1.aspx</link><description>I think this whole thing may be getting massively overcomplicated by the math.Is the actual business requirement simply a count of how many distinct datakeys there are, and a count of how many distinct landline-datakey pairs there are, or did some manager/executive somewhere actually request something this mathematically complicated?What is the actual statement of the exact business need?  As written/stated by the person who will actually be using this data, not summarized or anything like that.</description><pubDate>Tue, 17 Jan 2012 12:00:09 GMT</pubDate><dc:creator>GSquared</dc:creator></item><item><title>RE: Help with Counts</title><link>http://www.sqlservercentral.com/Forums/Topic1234021-1292-1.aspx</link><description>Here's another attempt at making some progress towards a precise statement or the requirement.I'm going to talk about the landlinecount only, because the datakey count is a trivial one (just count distinct) and the mobile and email counts follow the same principles as the landline count.The number delivered as the landline count is the cardinality of some set A. We have the following constraints on A:C1) A is a subset of the projection of the original table onto its datakey and landline attributes.  C2) None of the pairs in A has the zero length string as it's landline element.  C3) No two pairs in A have the same value in their datakey element.  C4) No two pairs in A have the same value in their landline element.  Obviously A has to be further constrained, because in some sense it needs to "cover" the original data (otherwise we could just forget about the data and set the landline count to zero). There are only two possible constraints on A that I can imagine being useful and/or acceptable (because other possible constraints imply constraints on the original data and we don't know what such constraints, if any, are permissable).P1) A has the highest cardinality of any set satisfying C1,C2,C3 and C4.P2) A is maximal in the sense that it has no proper superset which satisfies C1,C2,C3,C4.P3) A is minimal in the sense that it has the smallest cardinality of any set satisfying C1,C2,C3,C4 and P2.P1 gives us a costly maximisation problem; it entails getting the cardinality of every set that satisfies C1,C2,C3 and C4 (or at least of every one of them that satisfies P2).P3 gives us a costly minimisation problem; it entails finding the cardinalities of all the sets satisfying the C1-C4 and P2 and taking the minimum.P2 seems less difficult, it only needs to find one maximal set; but depending on the algorithms used a change to metadata might result in a different maximal set being found for the same data, and Bicky has stated that that would be unacceptable; so P2 has to come with a constraint that the algorithms employed must not change results if metadata changes.Is there some other constraint than P1, P2, or P3 that would lead to the "right" value for the landline count?  Are the conditions C1,C2,C3 and C4 the correct ones to start with?  They are what I've understood Bicky to say, but I could be wrong.Does anyone have an idea for a decent algorithm to work with P2?  Obviously there is a recursive (or iterative instead, of course) construction for such a set, using subcounts plus either the native ordering of strings or of the attribute value to make choices where needed, and it's fairly easy to make it reasonably close to P1 by using subcounts to drive it wherever possible (Bicky's reply to an earlier post suggests that he'd prefer to be closer to P1, or even actually there, if it can be done reasonably) without of course any guarantee that P1 is actually reached, but I'm worried about performance using recursive CTEs or an iterative loop.  If we knew the maximum number of landlines that could be associated with a single datakey we would know a bound on the depth of recursion or number of times round the iterative loop and could potentially unroll it if the number is low enough.  Does anyone know a more direct al;gorithm?Does anyone know how to go for P1 with certainty of getting there? If there's a good efficient algorithm that can be expressed tidily in a relational manner that would probably be the best way to go?</description><pubDate>Tue, 17 Jan 2012 11:53:53 GMT</pubDate><dc:creator>L' Eomot Inversé</dc:creator></item><item><title>RE: Help with Counts</title><link>http://www.sqlservercentral.com/Forums/Topic1234021-1292-1.aspx</link><description>[quote][b]SQLKnowItAll (1/17/2012)[/b][hr][quote][b]bicky1980 (1/17/2012)[/b][hr][quote][b]ChrisM@home (1/17/2012)[/b][hr][quote][b]bicky1980 (1/17/2012)[/b][hr][quote][b]SQLKnowItAll (1/17/2012)[/b][hr][quote][b]L' Eomot Inversé (1/17/2012)[/b]a set of datakey-landline pairs such that each landline and each datakey occurs only once[/quote]As far as I can tell, this is an impossible statement without saying WHICH comes first.. Eliminate duplicate landlines when one is found and then duplicate datakeys, or the other way around. It seems that the request is not set based, but order based. However, I still don't know what the order is...[/quote]The required order is dedupe datakey first, followed by landline.[/quote]Bicky, can you demonstrate with some sample data so everyone clearly understands? Cheers.[/quote]What do you need me to demonstrate? Getting confused now with all the different answers. :) do you just need a dataset to work with?[/quote]I think we really need the rules. They have not been clearly defined. Let me see if I can give what I understand (for landlines) and if it is correct, great.  If not, please try to adjust using my framework, this is not necessarily the best approach in SQL, but the easiest way to understand it:1. Insert ALL data into a staging table2. Find all datakeys that only have 1 unique landline in the staging table and insert the datakey and landline into a temp table3. delete any duplicate landlines from the temp table4. join the staging table on the temp table and delete any landlines from it that exist in the temp table including the data that moved to the temp table5. Insert the data from the temp table BACK into the staging table.6. Repeat the process until no duplicate landlinesAlthough this may not be the best procedural solution in SQL... Does it fundamentally describe the business rules? I couldn't come up with a better stepwise way to illustrate my understanding of the rules :)[/quote]If the data could be processed in the following manner that would be great1. Extract all the data with a landline2. Dedupe the data on datakey - keep only one record (e.g. 3 duplicate datakeys would become just one datakey)3. Dedupe the data on landline - same rule as above</description><pubDate>Tue, 17 Jan 2012 11:52:34 GMT</pubDate><dc:creator>bicky1980</dc:creator></item><item><title>RE: Help with Counts</title><link>http://www.sqlservercentral.com/Forums/Topic1234021-1292-1.aspx</link><description>[quote][b]bicky1980 (1/17/2012)[/b][hr][quote][b]ChrisM@home (1/17/2012)[/b][hr][quote][b]bicky1980 (1/17/2012)[/b][hr][quote][b]SQLKnowItAll (1/17/2012)[/b][hr][quote][b]L' Eomot Inversé (1/17/2012)[/b]a set of datakey-landline pairs such that each landline and each datakey occurs only once[/quote]As far as I can tell, this is an impossible statement without saying WHICH comes first.. Eliminate duplicate landlines when one is found and then duplicate datakeys, or the other way around. It seems that the request is not set based, but order based. However, I still don't know what the order is...[/quote]The required order is dedupe datakey first, followed by landline.[/quote]Bicky, can you demonstrate with some sample data so everyone clearly understands? Cheers.[/quote]What do you need me to demonstrate? Getting confused now with all the different answers. :) do you just need a dataset to work with?[/quote]I think we really need the rules. They have not been clearly defined. Let me see if I can give what I understand (for landlines) and if it is correct, great.  If not, please try to adjust using my framework, this is not necessarily the best approach in SQL, but the easiest way to understand it:1. Insert ALL data into a staging table2. Find all datakeys that only have 1 unique landline in the staging table and insert the datakey and landline into a temp table3. delete any duplicate landlines from the temp table4. join the staging table on the temp table and delete any landlines from it that exist in the temp table including the data that moved to the temp table5. Insert the data from the temp table BACK into the staging table.6. Repeat the process until no duplicate landlinesAlthough this may not be the best procedural solution in SQL... Does it fundamentally describe the business rules? I couldn't come up with a better stepwise way to illustrate my understanding of the rules :)EDIT: So we are systematically eliminating duplicate landlines ensuring that datakeys that only have 1 unique landline get precedence for the count over datakeys that have more than 1 unique landline.</description><pubDate>Tue, 17 Jan 2012 11:44:45 GMT</pubDate><dc:creator>SQLKnowItAll</dc:creator></item><item><title>RE: Help with Counts</title><link>http://www.sqlservercentral.com/Forums/Topic1234021-1292-1.aspx</link><description>[quote][b]ChrisM@home (1/17/2012)[/b][hr][quote][b]bicky1980 (1/17/2012)[/b][hr][quote][b]SQLKnowItAll (1/17/2012)[/b][hr][quote][b]L' Eomot Inversé (1/17/2012)[/b]a set of datakey-landline pairs such that each landline and each datakey occurs only once[/quote]As far as I can tell, this is an impossible statement without saying WHICH comes first.. Eliminate duplicate landlines when one is found and then duplicate datakeys, or the other way around. It seems that the request is not set based, but order based. However, I still don't know what the order is...[/quote]The required order is dedupe datakey first, followed by landline.[/quote]Bicky, can you demonstrate with some sample data so everyone clearly understands? Cheers.[/quote]What do you need me to demonstrate? Getting confused now with all the different answers. :) do you just need a dataset to work with?</description><pubDate>Tue, 17 Jan 2012 11:24:03 GMT</pubDate><dc:creator>bicky1980</dc:creator></item><item><title>RE: Help with Counts</title><link>http://www.sqlservercentral.com/Forums/Topic1234021-1292-1.aspx</link><description>[quote][b]bicky1980 (1/17/2012)[/b][hr][quote][b]SQLKnowItAll (1/17/2012)[/b][hr][quote][b]L' Eomot Inversé (1/17/2012)[/b]a set of datakey-landline pairs such that each landline and each datakey occurs only once[/quote]As far as I can tell, this is an impossible statement without saying WHICH comes first.. Eliminate duplicate landlines when one is found and then duplicate datakeys, or the other way around. It seems that the request is not set based, but order based. However, I still don't know what the order is...[/quote]The required order is dedupe datakey first, followed by landline.[/quote]Bicky, can you demonstrate with some sample data so everyone clearly understands? Cheers.</description><pubDate>Tue, 17 Jan 2012 11:19:37 GMT</pubDate><dc:creator>ChrisM@home</dc:creator></item><item><title>RE: Help with Counts</title><link>http://www.sqlservercentral.com/Forums/Topic1234021-1292-1.aspx</link><description>[quote][b]SQLKnowItAll (1/17/2012)[/b][hr][quote][b]L' Eomot Inversé (1/17/2012)[/b]a set of datakey-landline pairs such that each landline and each datakey occurs only once[/quote]As far as I can tell, this is an impossible statement without saying WHICH comes first.. Eliminate duplicate landlines when one is found and then duplicate datakeys, or the other way around. It seems that the request is not set based, but order based. However, I still don't know what the order is...[/quote]Tom nailed it [url=http://www.sqlservercentral.com/Forums/FindPost1235912.aspx][u]some posts ago [/u][/url]when he mentioned 'constrained maximisations'. Thanks Tom, I didn't know the problem had a name. The query I posted last, dealing with landlines only, does work [i]so far [/i] (i.e. with a large dataset and the "current spec") but may require a third iteration (or even fourth) to eliminate residuals on really large data sets.Since data will be eliminated by design, and there's a choice of data to eliminate, then of course a part of the spec is missing. Priority must be given to either datakey or landline, and also to sort order of each.My query doesn't appear to give a priority to either datakey or landline, instead simply giving priority to lowest over highest for both columns together. It may not be a valid solution. If it isn't, then I hope it helps others to understand the problem and arrive at a valid one.</description><pubDate>Tue, 17 Jan 2012 11:17:44 GMT</pubDate><dc:creator>ChrisM@home</dc:creator></item><item><title>RE: Help with Counts</title><link>http://www.sqlservercentral.com/Forums/Topic1234021-1292-1.aspx</link><description>[quote][b]SQLKnowItAll (1/17/2012)[/b][hr][quote][b]L' Eomot Inversé (1/17/2012)[/b]a set of datakey-landline pairs such that each landline and each datakey occurs only once[/quote]As far as I can tell, this is an impossible statement without saying WHICH comes first.. Eliminate duplicate landlines when one is found and then duplicate datakeys, or the other way around. It seems that the request is not set based, but order based. However, I still don't know what the order is...[/quote]The required order is dedupe datakey first, followed by landline.</description><pubDate>Tue, 17 Jan 2012 11:17:25 GMT</pubDate><dc:creator>bicky1980</dc:creator></item><item><title>RE: Help with Counts</title><link>http://www.sqlservercentral.com/Forums/Topic1234021-1292-1.aspx</link><description>[quote][b]L' Eomot Inversé (1/17/2012)[/b]a set of datakey-landline pairs such that each landline and each datakey occurs only once[/quote]As far as I can tell, this is an impossible statement without saying WHICH comes first.. Eliminate duplicate landlines when one is found and then duplicate datakeys, or the other way around. It seems that the request is not set based, but order based. However, I still don't know what the order is...</description><pubDate>Tue, 17 Jan 2012 10:40:14 GMT</pubDate><dc:creator>SQLKnowItAll</dc:creator></item><item><title>RE: Help with Counts</title><link>http://www.sqlservercentral.com/Forums/Topic1234021-1292-1.aspx</link><description>[quote][b]GSquared (1/17/2012)[/b][hr]The results from my first query are exactly what you list as the "should be" answer.Re-ran the first version, copy-and-paste of results is:[code="sql"]CountType	QtyTotal	9Datakeys	6Landlines	6Mobiles	5Emails	3[/code][/quote]Yes, but there's a problem Gus; that test case doesn't cover all the requirements. The requirement is to count not landlines, but a set of datakey-landline pairs such that each landline and each datakey occurs only once. The other counts have a similar requirement. So with this test dataset[code]TRUNCATE TABLE #test2     INSERT  INTO #test2 VALUES    ('1', '0001', '01234567890', '', '1@test.co.uk')  , ('2', '0001', '01234567891', '', '1@test.co.uk')  , ('3', '0001', '01234567892', '', '2@test.co.uk') ;[/code]the required result is 3,1,1,0,1 but your code returns 3,1,3,0,2.Actually there is still a lot of detail missing from the requirement, which is a pity because I suspect that if we had the exact requirements we could either come up with a solution or explain that it would be sensible to derive the answers using some tool other than T-SQL.</description><pubDate>Tue, 17 Jan 2012 10:33:26 GMT</pubDate><dc:creator>L' Eomot Inversé</dc:creator></item><item><title>RE: Help with Counts</title><link>http://www.sqlservercentral.com/Forums/Topic1234021-1292-1.aspx</link><description>[quote][b]bicky1980 (1/17/2012)[/b]I have run your code on my sample of approx 100,000 records but its difficult to understand why this would work, I dont see where you code ensures that if I had 2 different landlines at the same datakey, these wouldn't be counted seperately (I ran the 1st code)[/quote]To confirm... You are saying that if the ONLY time 2 landlines appear is with the same datakey, that should only count as 1. Correct? To shorten the data for ease of understanding:DataKey, LandLine1, 1001, 2002, 1002, 2002, 300How many landlines is this? 2? And if the line with the 300 was not there... it would be 1?  I have to be honest that, having a background in statistics, I am really puzzled at the question that this is trying to answer. Any chance you could appease me by phrasing the business question as something like "How many landlines are in use by different datakeys?" OR "Collapsing across landlines and then datakeys, what is the count?"  Something just doesn't make sense here as an interpretation of results.  I am not saying that you are wrong or anything, just trying to understand because right now I can't answer a question that I don't understand.  I have done several similar analyses and have scripts in producation that are giving a similar output.  If I can understand the question and thereby the business rules a bit clearer, I can simply adapt my current script and post it for you.</description><pubDate>Tue, 17 Jan 2012 08:11:23 GMT</pubDate><dc:creator>SQLKnowItAll</dc:creator></item><item><title>RE: Help with Counts</title><link>http://www.sqlservercentral.com/Forums/Topic1234021-1292-1.aspx</link><description>[quote][b]GSquared (1/17/2012)[/b][hr][quote][b]bicky1980 (1/13/2012)[/b][hr][quote][b]GSquared (1/13/2012)[/b][hr]The order of the rows in a dataset doesn't really matter to set-based SQL operations on it.I just need a clarification on this statement, "Total number of populated Landline fields (Needs to have a unique datakey as well as be unique itself - If the same landline is anywhere else in the results its needs to be excluded from the figure)"If a landline exists multiple times in the data, and has multiple datakey values, don't count it all, or count it only one time?E.g.:datakey landline pairs:1 1002 100Would that count as 1 or 0?  Clarification of "If the same landline is anywhere else in the results its needs to be excluded from the figure".  Does that mean exclude it entirely, or exclude the duplicates?If that should count as 1, then this seems to work:[code="sql"]IF OBJECT_ID(N'tempdb..#test2') IS NOT NULL	DROP TABLE #test2;	CREATE TABLE #test2    (indkey NVARCHAR(2),     datakey NVARCHAR(4),     landline NVARCHAR(11),     mobile NVARCHAR(11),     email NVARCHAR(20))     INSERT  INTO #test2VALUES  ('1', '0001', '01234567890', '0712345679', '1@test.co.uk'),        ('2', '0001', '01234567890', '', '1@test.co.uk'),        ('3', '0002', '01234567890', '', '2@test.co.uk'),        ('4', '0002', '01234567891', '', '2@test.co.uk'),        ('5', '0002', '', '07123456789', ''),        ('6', '0003', '01234567892', '07123456791', ''),        ('7', '0004', '01234567893', '07123456792', ''),        ('8', '0005', '01234567894', '07123456793', '2@test.co.uk'),        ('9', '0008', '01234567895', '07123456793', '9@test.co.uk') ;SELECT 'Total' AS CountType, COUNT(*) AS QtyFROM #test2UNION ALLSELECT 'Datakeys', COUNT(DISTINCT datakey)FROM #test2UNION ALLSELECT 'Landlines', COUNT(DISTINCT landline)FROM #test2WHERE landline &amp;gt; ''AND datakey &amp;gt; ''UNION ALLSELECT 'Mobiles', COUNT(DISTINCT mobile)FROM #test2WHERE mobile &amp;gt; ''AND datakey &amp;gt; ''UNION ALLSELECT 'Emails', COUNT(DISTINCT email)FROM #test2WHERE email &amp;gt; ''AND datakey &amp;gt; '';[/code]If 0, then change the Select to this:[code="sql"]SELECT 'Total' AS CountType, COUNT(*) AS QtyFROM #test2UNION ALLSELECT 'Datakeys', COUNT(DISTINCT datakey)FROM #test2UNION ALLSELECT  'Landlines',        COUNT(DISTINCT landline)FROM    (SELECT landline         FROM   #test2         WHERE  landline &amp;gt; ''                AND datakey &amp;gt; ''         GROUP BY landline         HAVING COUNT(*) = 1) AS SubUNION ALLSELECT  'Mobiles',        COUNT(DISTINCT mobile)FROM    (SELECT mobile         FROM   #test2         WHERE  mobile &amp;gt; ''                AND datakey &amp;gt; ''         GROUP BY mobile         HAVING COUNT(*) = 1) AS SubUNION ALLSELECT  'Emails',        COUNT(DISTINCT email)FROM    (SELECT email         FROM   #test2         WHERE  email &amp;gt; ''                AND datakey &amp;gt; ''         GROUP BY email         HAVING COUNT(*) = 1) AS Sub;[/code](All scripts tested in SQL 2008 R2 Dev Edition.)[/quote]I would only require this to be counted as 1 record only...From your solution for this the results would be:Total	9Datakeys	6Landlines	5Mobiles	4Emails	1The results should be:Total	9Datakeys	6Landlines	6Mobiles	5Emails	3[/quote]The results from my first query are exactly what you list as the "should be" answer.Re-ran the first version, copy-and-paste of results is:[code="sql"]CountType	QtyTotal	9Datakeys	6Landlines	6Mobiles	5Emails	3[/code][/quote]Hello GSquaredI have run your code on my sample of approx 100,000 records but its difficult to understand why this would work, I dont see where you code ensures that if I had 2 different landlines at the same datakey, these wouldn't be counted seperately (I ran the 1st code)</description><pubDate>Tue, 17 Jan 2012 07:51:12 GMT</pubDate><dc:creator>bicky1980</dc:creator></item><item><title>RE: Help with Counts</title><link>http://www.sqlservercentral.com/Forums/Topic1234021-1292-1.aspx</link><description>[quote]How do I copy code from the forum to paste into Management studio, without having to refomat it myself everytime?[/quote]Paste it into a word document, then copy and paste again. I've got a word doc on my desktop specifically for this.[quote][b]bicky1980 (1/17/2012)[/b][hr]Hello ChrisI cant post the results due to data protection :( , which is going to be difficult to try and get you to see my issue, this was why I was using a small table of 'made up' records.  [/quote]Change the phone numbers (probably the area codes alone will be enough), but be careful that you don't give different random numbers to dupes of the same real number!When you get the opportunity, try GSquared's code.</description><pubDate>Tue, 17 Jan 2012 07:31:18 GMT</pubDate><dc:creator>ChrisM@home</dc:creator></item><item><title>RE: Help with Counts</title><link>http://www.sqlservercentral.com/Forums/Topic1234021-1292-1.aspx</link><description>How do I copy code from the forum to paste into Management studio, without having to refomat it myself everytime?</description><pubDate>Tue, 17 Jan 2012 06:36:21 GMT</pubDate><dc:creator>bicky1980</dc:creator></item><item><title>RE: Help with Counts</title><link>http://www.sqlservercentral.com/Forums/Topic1234021-1292-1.aspx</link><description>Hello ChrisI cant post the results due to data protection :( , which is going to be difficult to try and get you to see my issue, this was why I was using a small table of 'made up' records.  </description><pubDate>Tue, 17 Jan 2012 06:32:42 GMT</pubDate><dc:creator>bicky1980</dc:creator></item><item><title>RE: Help with Counts</title><link>http://www.sqlservercentral.com/Forums/Topic1234021-1292-1.aspx</link><description>[quote][b]bicky1980 (1/13/2012)[/b][hr][quote][b]GSquared (1/13/2012)[/b][hr]The order of the rows in a dataset doesn't really matter to set-based SQL operations on it.I just need a clarification on this statement, "Total number of populated Landline fields (Needs to have a unique datakey as well as be unique itself - If the same landline is anywhere else in the results its needs to be excluded from the figure)"If a landline exists multiple times in the data, and has multiple datakey values, don't count it all, or count it only one time?E.g.:datakey landline pairs:1 1002 100Would that count as 1 or 0?  Clarification of "If the same landline is anywhere else in the results its needs to be excluded from the figure".  Does that mean exclude it entirely, or exclude the duplicates?If that should count as 1, then this seems to work:[code="sql"]IF OBJECT_ID(N'tempdb..#test2') IS NOT NULL	DROP TABLE #test2;	CREATE TABLE #test2    (indkey NVARCHAR(2),     datakey NVARCHAR(4),     landline NVARCHAR(11),     mobile NVARCHAR(11),     email NVARCHAR(20))     INSERT  INTO #test2VALUES  ('1', '0001', '01234567890', '0712345679', '1@test.co.uk'),        ('2', '0001', '01234567890', '', '1@test.co.uk'),        ('3', '0002', '01234567890', '', '2@test.co.uk'),        ('4', '0002', '01234567891', '', '2@test.co.uk'),        ('5', '0002', '', '07123456789', ''),        ('6', '0003', '01234567892', '07123456791', ''),        ('7', '0004', '01234567893', '07123456792', ''),        ('8', '0005', '01234567894', '07123456793', '2@test.co.uk'),        ('9', '0008', '01234567895', '07123456793', '9@test.co.uk') ;SELECT 'Total' AS CountType, COUNT(*) AS QtyFROM #test2UNION ALLSELECT 'Datakeys', COUNT(DISTINCT datakey)FROM #test2UNION ALLSELECT 'Landlines', COUNT(DISTINCT landline)FROM #test2WHERE landline &amp;gt; ''AND datakey &amp;gt; ''UNION ALLSELECT 'Mobiles', COUNT(DISTINCT mobile)FROM #test2WHERE mobile &amp;gt; ''AND datakey &amp;gt; ''UNION ALLSELECT 'Emails', COUNT(DISTINCT email)FROM #test2WHERE email &amp;gt; ''AND datakey &amp;gt; '';[/code]If 0, then change the Select to this:[code="sql"]SELECT 'Total' AS CountType, COUNT(*) AS QtyFROM #test2UNION ALLSELECT 'Datakeys', COUNT(DISTINCT datakey)FROM #test2UNION ALLSELECT  'Landlines',        COUNT(DISTINCT landline)FROM    (SELECT landline         FROM   #test2         WHERE  landline &amp;gt; ''                AND datakey &amp;gt; ''         GROUP BY landline         HAVING COUNT(*) = 1) AS SubUNION ALLSELECT  'Mobiles',        COUNT(DISTINCT mobile)FROM    (SELECT mobile         FROM   #test2         WHERE  mobile &amp;gt; ''                AND datakey &amp;gt; ''         GROUP BY mobile         HAVING COUNT(*) = 1) AS SubUNION ALLSELECT  'Emails',        COUNT(DISTINCT email)FROM    (SELECT email         FROM   #test2         WHERE  email &amp;gt; ''                AND datakey &amp;gt; ''         GROUP BY email         HAVING COUNT(*) = 1) AS Sub;[/code](All scripts tested in SQL 2008 R2 Dev Edition.)[/quote]I would only require this to be counted as 1 record only...From your solution for this the results would be:Total	9Datakeys	6Landlines	5Mobiles	4Emails	1The results should be:Total	9Datakeys	6Landlines	6Mobiles	5Emails	3[/quote]The results from my first query are exactly what you list as the "should be" answer.Re-ran the first version, copy-and-paste of results is:[code="sql"]CountType	QtyTotal	9Datakeys	6Landlines	6Mobiles	5Emails	3[/code]</description><pubDate>Tue, 17 Jan 2012 06:29:44 GMT</pubDate><dc:creator>GSquared</dc:creator></item></channel></rss>