Click here to monitor SSC
SQLServerCentral is supported by Red Gate Software Ltd.
 
Log in  ::  Register  ::  Not logged in
 
 
 
        
Home       Members    Calendar    Who's On


Add to briefcase

SQL statement for new vs duplicate data Expand / Collapse
Author
Message
Posted Tuesday, May 7, 2013 11:43 PM
Forum Newbie

Forum NewbieForum NewbieForum NewbieForum NewbieForum NewbieForum NewbieForum NewbieForum Newbie

Group: General Forum Members
Last Login: Sunday, May 12, 2013 4:27 PM
Points: 2, Visits: 18
I'm looking for a SQL statement that will output 'new vs duplicate' data on a monthly bases, but I can't seem to come up SQL statement that can accomplish what I need. See example below to get a better idea of what I'm trying to do since this is hard to explain.

Example data:

id, num, create_date

// January
1, 34250, 2012-01-01
2, 25383, 2012-01-03
3, 93482, 2012-01-13
4, 52934, 2012-01-20
5, 34823, 2012-01-23

// Febuary
6, 09943, 2012-02-03
7, 32845, 2012-02-05
8, 34250, 2012-02-16 -- same num as id 1
9, 57325, 2012-02-28
10, 34823, 2012-02-28 -- same num as id 5

// March
11, 95234, 2012-03-05
12, 25383, 2012-03-03 -- same num as id 2
13, 32845, 2012-03-16 -- same num as id 7
14, 34250, 2012-03-28 -- same num as id 1, 8
15, 12412, 2012-03-28

// April
16, 53535, 2012-04-05
17, 23232, 2012-04-03
18, 57325, 2012-04-16 -- same num as id 9
19, 34250, 2012-04-28 -- same num as id 1, 8, 14
20, 12343, 2012-04-28

Example results from data above would be something like below (just looking for new vs duplicate numbers):

January: 5 new and 0 duplicates (no duplicates will ever be found the first month)
Febuary: 3 new and 2 duplicates (checks for duplicates in the month of Jan only)
March: 2 new and 3 duplicates (checks for duplicates in the month of Jan & Feb only)
April: 3 new and 2 duplicates (checks for duplicates in the month of Jan, Feb & March only)
...and so on for each month after that

Hopefully you can see what I'm trying to do. Also, the total amount (new + duplicate) will always be 5 based on the data above. Any real smart person out there that can help?

Edit: Put in wrong forum (I'm using SQL Server 2005)
Post #1450404
Posted Wednesday, May 8, 2013 1:52 PM


SSCommitted

SSCommittedSSCommittedSSCommittedSSCommittedSSCommittedSSCommittedSSCommittedSSCommitted

Group: General Forum Members
Last Login: Today @ 2:48 PM
Points: 1,932, Visits: 19,917
<deleted>....not working as required

______________________________________________________________
you can lead a user to data....but you cannot make them think
and remember....every day is a school day
Post #1450775
Posted Thursday, May 9, 2013 9:33 AM


Ten Centuries

Ten CenturiesTen CenturiesTen CenturiesTen CenturiesTen CenturiesTen CenturiesTen CenturiesTen Centuries

Group: General Forum Members
Last Login: Tuesday, October 28, 2014 12:50 PM
Points: 1,061, Visits: 2,580
avitron2020 (5/7/2013)
I'm looking for a SQL statement that will output 'new vs duplicate' data on a monthly bases, but I can't seem to come up SQL statement that can accomplish what I need. See example below to get a better idea of what I'm trying to do since this is hard to explain.

Example data:

id, num, create_date

// January
1, 34250, 2012-01-01
2, 25383, 2012-01-03
3, 93482, 2012-01-13
4, 52934, 2012-01-20
5, 34823, 2012-01-23

// Febuary
6, 09943, 2012-02-03
7, 32845, 2012-02-05
8, 34250, 2012-02-16 -- same num as id 1
9, 57325, 2012-02-28
10, 34823, 2012-02-28 -- same num as id 5

// March
11, 95234, 2012-03-05
12, 25383, 2012-03-03 -- same num as id 2
13, 32845, 2012-03-16 -- same num as id 7
14, 34250, 2012-03-28 -- same num as id 1, 8
15, 12412, 2012-03-28

// April
16, 53535, 2012-04-05
17, 23232, 2012-04-03
18, 57325, 2012-04-16 -- same num as id 9
19, 34250, 2012-04-28 -- same num as id 1, 8, 14
20, 12343, 2012-04-28

Example results from data above would be something like below (just looking for new vs duplicate numbers):

January: 5 new and 0 duplicates (no duplicates will ever be found the first month)
Febuary: 3 new and 2 duplicates (checks for duplicates in the month of Jan only)
March: 2 new and 3 duplicates (checks for duplicates in the month of Jan & Feb only)
April: 3 new and 2 duplicates (checks for duplicates in the month of Jan, Feb & March only)
...and so on for each month after that

Hopefully you can see what I'm trying to do. Also, the total amount (new + duplicate) will always be 5 based on the data above. Any real smart person out there that can help?

Edit: Put in wrong forum (I'm using SQL Server 2005)


This works, assuming that your ID column is ever-increasing (i.e., IDs are assigned in the same order as create_date) - otherwise, you'll need to compare create_dates in the CROSS APPLY subquery:


-- Please remove the extraneous letter "x" from the CxREATE and DxROP keywords - my company filters internet traffic that includes certain T-SQL DDL keywords.

CxREATE TABLE dbo.testData (ID int PRIMARY KEY, number int, create_date datetime)

GO

INSERT INTO dbo.testData (ID, number, create_date)

SELECT 1, 34250, '2012-01-01' UNION ALL
SELECT 2, 25383, '2012-01-03' UNION ALL
SELECT 3, 93482, '2012-01-13' UNION ALL
SELECT 4, 52934, '2012-01-20' UNION ALL
SELECT 5, 34823, '2012-01-23' UNION ALL
SELECT 6, 09943, '2012-02-03' UNION ALL
SELECT 7, 32845, '2012-02-05' UNION ALL
SELECT 8, 34250, '2012-02-16' UNION ALL
SELECT 9, 57325, '2012-02-28' UNION ALL
SELECT 10, 34823, '2012-02-28' UNION ALL
SELECT 11, 95234, '2012-03-05' UNION ALL
SELECT 12, 25383, '2012-03-03' UNION ALL
SELECT 13, 32845, '2012-03-16' UNION ALL
SELECT 14, 34250, '2012-03-28' UNION ALL
SELECT 15, 12412, '2012-03-28' UNION ALL
SELECT 16, 53535, '2012-04-05' UNION ALL
SELECT 17, 23232, '2012-04-03' UNION ALL
SELECT 18, 57325, '2012-04-16' UNION ALL
SELECT 19, 34250, '2012-04-28' UNION ALL
SELECT 20, 12343, '2012-04-28'

GO

SELECT DATEPART(MONTH, td.create_date) AS monthNumber
,DATENAME(MONTH, td.create_date) AS [monthName]
,SUM(CASE WHEN xd.duplicate IS NULL THEN 1 ELSE 0 END) AS nNew
,COUNT(td.ID) - SUM(CASE WHEN xd.duplicate IS NULL THEN 1 ELSE 0 END) as nDuplicates

FROM dbo.testData td

OUTER APPLY (SELECT TOP(1) t.ID AS duplicate FROM dbo.testData t WHERE t.ID < td.ID AND t.number = td.number) xd

GROUP BY DATEPART(MONTH, td.create_date), DATENAME(MONTH, td.create_date)

ORDER BY DATEPART(MONTH, td.create_date)

GO

DxROP TABLE dbo.testData

Two notes about this code:
- It works no matter how many rows you have per month.
- For a larger data set, proper indexing will yield better performance.

Also, notice how I provided DDL for the dbo.testData table and converted your sample data into an INSERT . . . SELECT statement to make it "consumable" for others who may want to work on your question. It's considered good form on this forum to provide DDL, *consumable* test data, and expected results when you ask a question about manipulating specific data so that we can easily set up test environments and provide tested code.


Jason Wolfkill
Blog: SQLSouth
Twitter: @SQLSouth
Post #1451198
Posted Friday, May 10, 2013 10:54 AM
Forum Newbie

Forum NewbieForum NewbieForum NewbieForum NewbieForum NewbieForum NewbieForum NewbieForum Newbie

Group: General Forum Members
Last Login: Sunday, May 12, 2013 4:27 PM
Points: 2, Visits: 18
That worked great. Thanks a lot wolfkillj.
Post #1451691
« Prev Topic | Next Topic »

Add to briefcase

Permissions Expand / Collapse