Click here to monitor SSC
SQLServerCentral is supported by Red Gate Software Ltd.
 
Log in  ::  Register  ::  Not logged in
 
 
 
        
Home       Members    Calendar    Who's On


Add to briefcase

Group on predominant fuzzy time Expand / Collapse
Author
Message
Posted Friday, October 4, 2013 10:22 AM
Mr or Mrs. 500

Mr or Mrs. 500Mr or Mrs. 500Mr or Mrs. 500Mr or Mrs. 500Mr or Mrs. 500Mr or Mrs. 500Mr or Mrs. 500Mr or Mrs. 500

Group: General Forum Members
Last Login: Friday, November 21, 2014 12:57 PM
Points: 546, Visits: 1,062
Ok, this will get you started with sample data:

create table #groupings (
ID int,
EventName varchar(255),
Venue varchar(255),
EventDate datetime
)

insert into #groupings
values (1,'Football','Arena','2013-10-04 09:58:42.670'),
(2,'Football','Arena','2013-10-04 09:58:42.670'),
(3,'Football','Arena','2013-10-04 10:45:42.670'),
(4,'Football','Arena','2013-10-04 09:58:42.670'),
(1,'Soccer','Field','2013-10-04 09:58:42.670'),
(2,'Soccer','Field','2013-10-04 09:38:42.670'),
(3,'Soccer','Field','2013-10-04 09:58:42.670'),
(4,'Soccer','Field','2013-10-04 09:28:42.670')


select min(EventName) EventName, Venue, EventDate,
stuff((select ',' + cast(ID as varchar) from #groupings where Venue = g.venue and Eventdate = g.eventdate for xml path ('')),1,1,'') IDs
from #groupings g
group by Venue, EventDate
order by venue, eventdate

drop table #groupings

This gives me:

EventName Venue EventDate IDs
Football Arena 2013-10-04 09:58:42.670 1,2,4
Football Arena 2013-10-04 10:45:42.670 3
Soccer Field 2013-10-04 09:28:42.670 4
Soccer Field 2013-10-04 09:38:42.670 2
Soccer Field 2013-10-04 09:58:42.670 1,3


and what I need is:


EventName Venue EventDate IDs
Football Arena 2013-10-04 09:58:42.670 1,2,3,4
Soccer Field 2013-10-04 09:58:42.670 1,2,3,4


I know, group on EventName and Venue instead of time but that's not possible as the Event names in my real life data are all different and I have no option but to group on Venue and DateTime (thus the min(EventName) to homogenize them).

What I don't know how to do is how to do something like:

group by Venue, fuzzy-time(EventDate +/- an hour). Oh, and if I really want the data to be correct I need the incorrect times to yield to the majority. If there are two identical times and one unique time I need the result to be the time the "majority" agrees on.

Ideas?

Erin
Post #1501671
Posted Friday, October 4, 2013 10:26 AM
Mr or Mrs. 500

Mr or Mrs. 500Mr or Mrs. 500Mr or Mrs. 500Mr or Mrs. 500Mr or Mrs. 500Mr or Mrs. 500Mr or Mrs. 500Mr or Mrs. 500

Group: General Forum Members
Last Login: Thursday, November 20, 2014 9:51 AM
Points: 534, Visits: 1,042
Sorry could you please correct your SQL which populates the table?
Post #1501675
Posted Friday, October 4, 2013 10:42 AM
Mr or Mrs. 500

Mr or Mrs. 500Mr or Mrs. 500Mr or Mrs. 500Mr or Mrs. 500Mr or Mrs. 500Mr or Mrs. 500Mr or Mrs. 500Mr or Mrs. 500

Group: General Forum Members
Last Login: Friday, November 21, 2014 12:57 PM
Points: 546, Visits: 1,062
(8 row(s) affected)


Works just fine on my machine. Anyone else having difficulties with the SQL?
Post #1501681
Posted Friday, October 4, 2013 10:52 AM
Mr or Mrs. 500

Mr or Mrs. 500Mr or Mrs. 500Mr or Mrs. 500Mr or Mrs. 500Mr or Mrs. 500Mr or Mrs. 500Mr or Mrs. 500Mr or Mrs. 500

Group: General Forum Members
Last Login: Thursday, November 20, 2014 9:51 AM
Points: 534, Visits: 1,042
I had to change few things but try this:

create table #groupings (
ID int,
EventName varchar(255),
Venue varchar(255),
EventDate datetime
)

insert into #groupings

select 1,'Football','Arena','2013-10-04 09:58:42.670'
UNion
select 2,'Football','Arena','2013-10-04 09:58:42.670'
union
select 3,'Football','Arena','2013-10-04 10:45:42.670'
union
select 4,'Football','Arena','2013-10-04 09:58:42.670'
union
select 1,'Soccer','Field','2013-10-04 09:58:42.670'
union
select 2,'Soccer','Field','2013-10-04 09:38:42.670'
union
select 3,'Soccer','Field','2013-10-04 09:58:42.670'
union
select 4,'Soccer','Field','2013-10-04 09:28:42.670'



select EventName, Venue, min(EventDate),
substring((select ',' + cast(ID as varchar)
from #groupings g1
where g1.EventName= g2.EventName
order by g1.EventName
for xml path ('')),2, 1000) [Students]
from #groupings g2
group by EventName, Venue
Post #1501686
Posted Friday, October 4, 2013 10:59 AM
Mr or Mrs. 500

Mr or Mrs. 500Mr or Mrs. 500Mr or Mrs. 500Mr or Mrs. 500Mr or Mrs. 500Mr or Mrs. 500Mr or Mrs. 500Mr or Mrs. 500

Group: General Forum Members
Last Login: Friday, November 21, 2014 12:57 PM
Points: 546, Visits: 1,062
A valiant effort, DVA2007, but you'll note in my first post that I cannot group on Event Name because they are disparate (sorry I mislead you with my test data).

For example, you've got:


Football
Raiders vs Broncos
Pigskin huddle
Football

as event names instead of the generic 'football' that I've put in place there. Thus the reason for using the min() function.

More ideas.
Post #1501691
Posted Friday, October 4, 2013 11:05 AM


SSChampion

SSChampionSSChampionSSChampionSSChampionSSChampionSSChampionSSChampionSSChampionSSChampionSSChampion

Group: General Forum Members
Last Login: Today @ 3:54 PM
Points: 13,221, Visits: 12,699
Erin Ramsay (10/4/2013)
More ideas.


Normalize your data and then you aren't fighting this. Honestly the biggest challenge you are facing is the lack of normalization.

That being said you have a real challenge here. You might be able to use the logic from today's article about grouping islands of contiguous dates. I am pretty swamped at the moment but if nobody else jumps in to help I can try to take a look at this early next week.

http://www.sqlservercentral.com/articles/T-SQL/71550/


_______________________________________________________________

Need help? Help us help you.

Read the article at http://www.sqlservercentral.com/articles/Best+Practices/61537/ for best practices on asking questions.

Need to split a string? Try Jeff Moden's splitter.

Cross Tabs and Pivots, Part 1 – Converting Rows to Columns
Cross Tabs and Pivots, Part 2 - Dynamic Cross Tabs
Understanding and Using APPLY (Part 1)
Understanding and Using APPLY (Part 2)
Post #1501693
Posted Friday, October 4, 2013 11:08 AM
Mr or Mrs. 500

Mr or Mrs. 500Mr or Mrs. 500Mr or Mrs. 500Mr or Mrs. 500Mr or Mrs. 500Mr or Mrs. 500Mr or Mrs. 500Mr or Mrs. 500

Group: General Forum Members
Last Login: Friday, November 21, 2014 12:57 PM
Points: 546, Visits: 1,062
Nevermind... if I flatten the hour out of the datetime I can do it. Thanks to SQLPrincess for the co-solution!!
Post #1501697
Posted Friday, October 4, 2013 11:13 AM
Mr or Mrs. 500

Mr or Mrs. 500Mr or Mrs. 500Mr or Mrs. 500Mr or Mrs. 500Mr or Mrs. 500Mr or Mrs. 500Mr or Mrs. 500Mr or Mrs. 500

Group: General Forum Members
Last Login: Friday, November 21, 2014 12:57 PM
Points: 546, Visits: 1,062
You know, Sean, if I could normalize the data I would but it's third party data and I don't have that option.. good thought though.
Post #1501698
Posted Sunday, October 6, 2013 1:07 PM
Ten Centuries

Ten CenturiesTen CenturiesTen CenturiesTen CenturiesTen CenturiesTen CenturiesTen CenturiesTen Centuries

Group: General Forum Members
Last Login: Wednesday, November 19, 2014 2:45 PM
Points: 1,080, Visits: 3,170
I know this is a few days late, but I wanted to have a try at this. Here is what I came up with. Does it do what you require?
with countDates as (
select venue,
eventdate,
dateadd(hour, -1, eventdate) beforedate,
dateadd(hour, 1, eventdate) afterdate,
count(*) over (partition by venue, eventdate) repeated
from #groupings
)
select
eventname,
venue,
ca.eventdate
from #groupings g
cross apply (
select top 1 eventdate
from countDates c
where g.venue = c.venue and
g.eventdate between beforedate and afterdate
order by repeated desc) ca
group by
eventname,
venue,
ca.eventdate;

Post #1501948
Posted Sunday, October 6, 2013 6:48 PM


Hall of Fame

Hall of FameHall of FameHall of FameHall of FameHall of FameHall of FameHall of FameHall of FameHall of Fame

Group: General Forum Members
Last Login: Today @ 4:53 AM
Points: 3,422, Visits: 5,368
It appears I'm also late to the party but perhaps this is another alternative:

SELECT EventName, Venue, EventDate
,IDs=STUFF((
SELECT ', ' + CAST(ID AS VARCHAR(5))
FROM #groupings b
WHERE a.EventName = b.EventName AND a.Venue = b.Venue
ORDER BY ID
FOR XML PATH('')
), 1, 2, '')
FROM
(
SELECT ID, EventName, Venue, rc, EventDate
,mr=MAX(rc) OVER (PARTITION BY EventName, Venue)
FROM
(
SELECT ID, EventName, Venue, EventDate
,rc=COUNT(*) OVER (PARTITION BY EventName, Venue, EventDate)
FROM #groupings
) a
) a
WHERE mr=rc
GROUP BY EventName, Venue, EventDate;





My mantra: No loops! No CURSORs! No RBAR! Hoo-uh!

My thought question: Have you ever been told that your query runs too fast?

My advice:
INDEXing a poor-performing query is like putting sugar on cat food. Yeah, it probably tastes better but are you sure you want to eat it?
The path of least resistance can be a slippery slope. Take care that fixing your fixes of fixes doesn't snowball and end up costing you more than fixing the root cause would have in the first place.


Need to UNPIVOT? Why not CROSS APPLY VALUES instead?
Since random numbers are too important to be left to chance, let's generate some!
Learn to understand recursive CTEs by example.
Splitting strings based on patterns can be fast!
Post #1501978
« Prev Topic | Next Topic »

Add to briefcase

Permissions Expand / Collapse