SQL Clone
SQLServerCentral is supported by Redgate
 
Log in  ::  Register  ::  Not logged in
 
 
 


SQL statement for new vs duplicate data


SQL statement for new vs duplicate data

Author
Message
avitron2020
avitron2020
Forum Newbie
Forum Newbie (2 reputation)Forum Newbie (2 reputation)Forum Newbie (2 reputation)Forum Newbie (2 reputation)Forum Newbie (2 reputation)Forum Newbie (2 reputation)Forum Newbie (2 reputation)Forum Newbie (2 reputation)

Group: General Forum Members
Points: 2 Visits: 18
I'm looking for a SQL statement that will output 'new vs duplicate' data on a monthly bases, but I can't seem to come up SQL statement that can accomplish what I need. See example below to get a better idea of what I'm trying to do since this is hard to explain.

Example data:

id, num, create_date

// January
1, 34250, 2012-01-01
2, 25383, 2012-01-03
3, 93482, 2012-01-13
4, 52934, 2012-01-20
5, 34823, 2012-01-23

// Febuary
6, 09943, 2012-02-03
7, 32845, 2012-02-05
8, 34250, 2012-02-16 -- same num as id 1
9, 57325, 2012-02-28
10, 34823, 2012-02-28 -- same num as id 5

// March
11, 95234, 2012-03-05
12, 25383, 2012-03-03 -- same num as id 2
13, 32845, 2012-03-16 -- same num as id 7
14, 34250, 2012-03-28 -- same num as id 1, 8
15, 12412, 2012-03-28

// April
16, 53535, 2012-04-05
17, 23232, 2012-04-03
18, 57325, 2012-04-16 -- same num as id 9
19, 34250, 2012-04-28 -- same num as id 1, 8, 14
20, 12343, 2012-04-28

Example results from data above would be something like below (just looking for new vs duplicate numbers):

January: 5 new and 0 duplicates (no duplicates will ever be found the first month)
Febuary: 3 new and 2 duplicates (checks for duplicates in the month of Jan only)
March: 2 new and 3 duplicates (checks for duplicates in the month of Jan & Feb only)
April: 3 new and 2 duplicates (checks for duplicates in the month of Jan, Feb & March only)
...and so on for each month after that

Hopefully you can see what I'm trying to do. Also, the total amount (new + duplicate) will always be 5 based on the data above. Any real smart person out there that can help?

Edit: Put in wrong forum (I'm using SQL Server 2005)
J Livingston SQL
J Livingston SQL
SSChampion
SSChampion (12K reputation)SSChampion (12K reputation)SSChampion (12K reputation)SSChampion (12K reputation)SSChampion (12K reputation)SSChampion (12K reputation)SSChampion (12K reputation)SSChampion (12K reputation)

Group: General Forum Members
Points: 12151 Visits: 37533
<deleted>....not working as required

________________________________________________________________
you can lead a user to data....but you cannot make them think
and remember....every day is a school day

wolfkillj
wolfkillj
SSCrazy
SSCrazy (2.7K reputation)SSCrazy (2.7K reputation)SSCrazy (2.7K reputation)SSCrazy (2.7K reputation)SSCrazy (2.7K reputation)SSCrazy (2.7K reputation)SSCrazy (2.7K reputation)SSCrazy (2.7K reputation)

Group: General Forum Members
Points: 2694 Visits: 2582
avitron2020 (5/7/2013)
I'm looking for a SQL statement that will output 'new vs duplicate' data on a monthly bases, but I can't seem to come up SQL statement that can accomplish what I need. See example below to get a better idea of what I'm trying to do since this is hard to explain.

Example data:

id, num, create_date

// January
1, 34250, 2012-01-01
2, 25383, 2012-01-03
3, 93482, 2012-01-13
4, 52934, 2012-01-20
5, 34823, 2012-01-23

// Febuary
6, 09943, 2012-02-03
7, 32845, 2012-02-05
8, 34250, 2012-02-16 -- same num as id 1
9, 57325, 2012-02-28
10, 34823, 2012-02-28 -- same num as id 5

// March
11, 95234, 2012-03-05
12, 25383, 2012-03-03 -- same num as id 2
13, 32845, 2012-03-16 -- same num as id 7
14, 34250, 2012-03-28 -- same num as id 1, 8
15, 12412, 2012-03-28

// April
16, 53535, 2012-04-05
17, 23232, 2012-04-03
18, 57325, 2012-04-16 -- same num as id 9
19, 34250, 2012-04-28 -- same num as id 1, 8, 14
20, 12343, 2012-04-28

Example results from data above would be something like below (just looking for new vs duplicate numbers):

January: 5 new and 0 duplicates (no duplicates will ever be found the first month)
Febuary: 3 new and 2 duplicates (checks for duplicates in the month of Jan only)
March: 2 new and 3 duplicates (checks for duplicates in the month of Jan & Feb only)
April: 3 new and 2 duplicates (checks for duplicates in the month of Jan, Feb & March only)
...and so on for each month after that

Hopefully you can see what I'm trying to do. Also, the total amount (new + duplicate) will always be 5 based on the data above. Any real smart person out there that can help?

Edit: Put in wrong forum (I'm using SQL Server 2005)


This works, assuming that your ID column is ever-increasing (i.e., IDs are assigned in the same order as create_date) - otherwise, you'll need to compare create_dates in the CROSS APPLY subquery:



-- Please remove the extraneous letter "x" from the CxREATE and DxROP keywords - my company filters internet traffic that includes certain T-SQL DDL keywords.

CxREATE TABLE dbo.testData (ID int PRIMARY KEY, number int, create_date datetime)

GO

INSERT INTO dbo.testData (ID, number, create_date)

SELECT 1, 34250, '2012-01-01' UNION ALL
SELECT 2, 25383, '2012-01-03' UNION ALL
SELECT 3, 93482, '2012-01-13' UNION ALL
SELECT 4, 52934, '2012-01-20' UNION ALL
SELECT 5, 34823, '2012-01-23' UNION ALL
SELECT 6, 09943, '2012-02-03' UNION ALL
SELECT 7, 32845, '2012-02-05' UNION ALL
SELECT 8, 34250, '2012-02-16' UNION ALL
SELECT 9, 57325, '2012-02-28' UNION ALL
SELECT 10, 34823, '2012-02-28' UNION ALL
SELECT 11, 95234, '2012-03-05' UNION ALL
SELECT 12, 25383, '2012-03-03' UNION ALL
SELECT 13, 32845, '2012-03-16' UNION ALL
SELECT 14, 34250, '2012-03-28' UNION ALL
SELECT 15, 12412, '2012-03-28' UNION ALL
SELECT 16, 53535, '2012-04-05' UNION ALL
SELECT 17, 23232, '2012-04-03' UNION ALL
SELECT 18, 57325, '2012-04-16' UNION ALL
SELECT 19, 34250, '2012-04-28' UNION ALL
SELECT 20, 12343, '2012-04-28'

GO

SELECT DATEPART(MONTH, td.create_date) AS monthNumber
,DATENAME(MONTH, td.create_date) AS [monthName]
,SUM(CASE WHEN xd.duplicate IS NULL THEN 1 ELSE 0 END) AS nNew
,COUNT(td.ID) - SUM(CASE WHEN xd.duplicate IS NULL THEN 1 ELSE 0 END) as nDuplicates

FROM dbo.testData td

OUTER APPLY (SELECT TOP(1) t.ID AS duplicate FROM dbo.testData t WHERE t.ID < td.ID AND t.number = td.number) xd

GROUP BY DATEPART(MONTH, td.create_date), DATENAME(MONTH, td.create_date)

ORDER BY DATEPART(MONTH, td.create_date)

GO

DxROP TABLE dbo.testData



Two notes about this code:
- It works no matter how many rows you have per month.
- For a larger data set, proper indexing will yield better performance.

Also, notice how I provided DDL for the dbo.testData table and converted your sample data into an INSERT . . . SELECT statement to make it "consumable" for others who may want to work on your question. It's considered good form on this forum to provide DDL, *consumable* test data, and expected results when you ask a question about manipulating specific data so that we can easily set up test environments and provide tested code.

Jason Wolfkill
Blog: SQLSouth
Twitter: @SQLSouth
avitron2020
avitron2020
Forum Newbie
Forum Newbie (2 reputation)Forum Newbie (2 reputation)Forum Newbie (2 reputation)Forum Newbie (2 reputation)Forum Newbie (2 reputation)Forum Newbie (2 reputation)Forum Newbie (2 reputation)Forum Newbie (2 reputation)

Group: General Forum Members
Points: 2 Visits: 18
That worked great. Thanks a lot wolfkillj.
Go


Permissions

You can't post new topics.
You can't post topic replies.
You can't post new polls.
You can't post replies to polls.
You can't edit your own topics.
You can't delete your own topics.
You can't edit other topics.
You can't delete other topics.
You can't edit your own posts.
You can't edit other posts.
You can't delete your own posts.
You can't delete other posts.
You can't post events.
You can't edit your own events.
You can't edit other events.
You can't delete your own events.
You can't delete other events.
You can't send private messages.
You can't send emails.
You can read topics.
You can't vote in polls.
You can't upload attachments.
You can download attachments.
You can't post HTML code.
You can't edit HTML code.
You can't post IFCode.
You can't post JavaScript.
You can post emoticons.
You can't post or upload images.

Select a forum

































































































































































SQLServerCentral


Search