Group on predominant fuzzy time

  • Ok, this will get you started with sample data:

    create table #groupings (

    ID int,

    EventName varchar(255),

    Venue varchar(255),

    EventDate datetime

    )

    insert into #groupings

    values (1,'Football','Arena','2013-10-04 09:58:42.670'),

    (2,'Football','Arena','2013-10-04 09:58:42.670'),

    (3,'Football','Arena','2013-10-04 10:45:42.670'),

    (4,'Football','Arena','2013-10-04 09:58:42.670'),

    (1,'Soccer','Field','2013-10-04 09:58:42.670'),

    (2,'Soccer','Field','2013-10-04 09:38:42.670'),

    (3,'Soccer','Field','2013-10-04 09:58:42.670'),

    (4,'Soccer','Field','2013-10-04 09:28:42.670')

    select min(EventName) EventName, Venue, EventDate,

    stuff((select ',' + cast(ID as varchar) from #groupings where Venue = g.venue and Eventdate = g.eventdate for xml path ('')),1,1,'') IDs

    from #groupings g

    group by Venue, EventDate

    order by venue, eventdate

    drop table #groupings

    This gives me:

    EventName VenueEventDate IDs

    FootballArena 2013-10-04 09:58:42.6701,2,4

    FootballArena 2013-10-04 10:45:42.6703

    SoccerField 2013-10-04 09:28:42.6704

    SoccerField 2013-10-04 09:38:42.6702

    SoccerField 2013-10-04 09:58:42.6701,3

    and what I need is:

    EventName VenueEventDate IDs

    FootballArena 2013-10-04 09:58:42.6701,2,3,4

    SoccerField 2013-10-04 09:58:42.6701,2,3,4

    I know, group on EventName and Venue instead of time but that's not possible as the Event names in my real life data are all different and I have no option but to group on Venue and DateTime (thus the min(EventName) to homogenize them).

    What I don't know how to do is how to do something like:

    group by Venue, fuzzy-time(EventDate +/- an hour). Oh, and if I really want the data to be correct I need the incorrect times to yield to the majority. If there are two identical times and one unique time I need the result to be the time the "majority" agrees on.

    Ideas?

    Erin

  • Sorry could you please correct your SQL which populates the table?

  • (8 row(s) affected)

    Works just fine on my machine. Anyone else having difficulties with the SQL?

  • I had to change few things but try this:

    create table #groupings (

    ID int,

    EventName varchar(255),

    Venue varchar(255),

    EventDate datetime

    )

    insert into #groupings

    select 1,'Football','Arena','2013-10-04 09:58:42.670'

    UNion

    select 2,'Football','Arena','2013-10-04 09:58:42.670'

    union

    select 3,'Football','Arena','2013-10-04 10:45:42.670'

    union

    select 4,'Football','Arena','2013-10-04 09:58:42.670'

    union

    select 1,'Soccer','Field','2013-10-04 09:58:42.670'

    union

    select 2,'Soccer','Field','2013-10-04 09:38:42.670'

    union

    select 3,'Soccer','Field','2013-10-04 09:58:42.670'

    union

    select 4,'Soccer','Field','2013-10-04 09:28:42.670'

    select EventName, Venue, min(EventDate),

    substring((select ',' + cast(ID as varchar)

    from #groupings g1

    where g1.EventName= g2.EventName

    order by g1.EventName

    for xml path ('')),2, 1000) [Students]

    from #groupings g2

    group by EventName, Venue

  • A valiant effort, DVA2007, but you'll note in my first post that I cannot group on Event Name because they are disparate (sorry I mislead you with my test data).

    For example, you've got:

    Football

    Raiders vs Broncos

    Pigskin huddle

    Football

    as event names instead of the generic 'football' that I've put in place there. Thus the reason for using the min() function.

    More ideas.

  • Erin Ramsay (10/4/2013)

    More ideas.

    Normalize your data and then you aren't fighting this. Honestly the biggest challenge you are facing is the lack of normalization.

    That being said you have a real challenge here. You might be able to use the logic from today's article about grouping islands of contiguous dates. I am pretty swamped at the moment but if nobody else jumps in to help I can try to take a look at this early next week.

    http://www.sqlservercentral.com/articles/T-SQL/71550/[/url]

    _______________________________________________________________

    Need help? Help us help you.

    Read the article at http://www.sqlservercentral.com/articles/Best+Practices/61537/ for best practices on asking questions.

    Need to split a string? Try Jeff Modens splitter http://www.sqlservercentral.com/articles/Tally+Table/72993/.

    Cross Tabs and Pivots, Part 1 – Converting Rows to Columns - http://www.sqlservercentral.com/articles/T-SQL/63681/
    Cross Tabs and Pivots, Part 2 - Dynamic Cross Tabs - http://www.sqlservercentral.com/articles/Crosstab/65048/
    Understanding and Using APPLY (Part 1) - http://www.sqlservercentral.com/articles/APPLY/69953/
    Understanding and Using APPLY (Part 2) - http://www.sqlservercentral.com/articles/APPLY/69954/

  • Nevermind... if I flatten the hour out of the datetime I can do it. Thanks to SQLPrincess for the co-solution!!

  • You know, Sean, if I could normalize the data I would but it's third party data and I don't have that option.. good thought though.

  • I know this is a few days late, but I wanted to have a try at this. Here is what I came up with. Does it do what you require?

    with countDates as (

    select venue,

    eventdate,

    dateadd(hour, -1, eventdate) beforedate,

    dateadd(hour, 1, eventdate) afterdate,

    count(*) over (partition by venue, eventdate) repeated

    from #groupings

    )

    select

    eventname,

    venue,

    ca.eventdate

    from #groupings g

    cross apply (

    select top 1 eventdate

    from countDates c

    where g.venue = c.venue and

    g.eventdate between beforedate and afterdate

    order by repeated desc) ca

    group by

    eventname,

    venue,

    ca.eventdate;

  • It appears I'm also late to the party but perhaps this is another alternative:

    SELECT EventName, Venue, EventDate

    ,IDs=STUFF((

    SELECT ', ' + CAST(ID AS VARCHAR(5))

    FROM #groupings b

    WHERE a.EventName = b.EventName AND a.Venue = b.Venue

    ORDER BY ID

    FOR XML PATH('')

    ), 1, 2, '')

    FROM

    (

    SELECT ID, EventName, Venue, rc, EventDate

    ,mr=MAX(rc) OVER (PARTITION BY EventName, Venue)

    FROM

    (

    SELECT ID, EventName, Venue, EventDate

    ,rc=COUNT(*) OVER (PARTITION BY EventName, Venue, EventDate)

    FROM #groupings

    ) a

    ) a

    WHERE mr=rc

    GROUP BY EventName, Venue, EventDate;


    My mantra: No loops! No CURSORs! No RBAR! Hoo-uh![/I]

    My thought question: Have you ever been told that your query runs too fast?

    My advice:
    INDEXing a poor-performing query is like putting sugar on cat food. Yeah, it probably tastes better but are you sure you want to eat it?
    The path of least resistance can be a slippery slope. Take care that fixing your fixes of fixes doesn't snowball and end up costing you more than fixing the root cause would have in the first place.

    Need to UNPIVOT? Why not CROSS APPLY VALUES instead?[/url]
    Since random numbers are too important to be left to chance, let's generate some![/url]
    Learn to understand recursive CTEs by example.[/url]
    [url url=http://www.sqlservercentral.com/articles/St

Viewing 10 posts - 1 through 9 (of 9 total)

You must be logged in to reply to this topic. Login to reply