Getting rid of duplicate records

  • Hi All,

    I have a table that stores schedule events. One event can be scheduled more than one time in one location, but could be in multiple sites.
    For example, Location A could have 20 sites.
    One event could be scheduled for dates between 8/3/2017-8/4/2017 on LocationA, Site1 and Site2
    Same event could be scheduled for dates between 8/5/2017 - 8/6/2017 on LocationA, Site1 and Site2

    The data in the table looks like this., which is not what I would do if I have created the DB.
    ScheduleID | Location | StartDate | EndDate | Site
    1              LocationA | 8/3/2017 | 8/4/2017 | Site1
    2              LocationA | 8/3/2017 | 8/4/2017 | Site2
    3              LocationA | 8/5/2017 | 8/6/2017 | Site1
    4             LocationA  | 8/5/2017 | 8/6/2017 | Site2

    Now I have 4 records for the same event. In the schedule I only display two events. I'm using C# to get rid of the duplicates, which is not a very elegant way to do this. Besides that, it's causing performance issues since there could be hundred of events.

    How can I write a query that just gives me two rows, so I don't have to create a child table to store just the Sites referenced by the ScheduleID?

    Thanks in advance
    NC

  • natividad.castro - Thursday, August 3, 2017 6:18 AM

    Hi All,

    I have a table that stores schedule events. One event can be scheduled more than one time in one location, but could be in multiple sites.
    For example, Location A could have 20 sites.
    One event could be scheduled for dates between 8/3/2017-8/4/2017 on LocationA, Site1 and Site2
    Same event could be scheduled for dates between 8/5/2017 - 8/6/2017 on LocationA, Site1 and Site2

    The data in the table looks like this., which is not what I would do if I have created the DB.
    ScheduleID | Location | StartDate | EndDate | Site
    1              LocationA | 8/3/2017 | 8/4/2017 | Site1
    2              LocationA | 8/3/2017 | 8/4/2017 | Site2
    3              LocationA | 8/5/2017 | 8/6/2017 | Site1
    4             LocationA  | 8/5/2017 | 8/6/2017 | Site2

    Now I have 4 records for the same event. In the schedule I only display two events. I'm using C# to get rid of the duplicates, which is not a very elegant way to do this. Besides that, it's causing performance issues since there could be hundred of events.

    How can I write a query that just gives me two rows, so I don't have to create a child table to store just the Sites referenced by the ScheduleID?

    Thanks in advance
    NC

    select Location, StartDate, EndDate, Site
    from table
    group by Location, StartDate, EndDate, Site

    If you haven't even tried to resolve your issue, please don't expect the hard-working volunteers here to waste their time providing links to answers which you could easily have found yourself.

  • I'm  actually not entirely sure what you want as a result set here. What expected result set are you looking for? To show that the event runs from 20170803 - 20170806 or just not include the site? Either of those would generate 2 rows.

    Thom~

    Excuse my typos and sometimes awful grammar. My fingers work faster than my brain does.
    Larnu.uk

  • ;WITH MySampleData([ScheduleID],[Location],[StartDate],[EndDate],[Site])
    AS
    (
    SELECT '1','LocationA',CONVERT(date,'8/3/2017'),CONVERT(date,'8/4/2017'),'Site1' UNION ALL
    SELECT '2','LocationA','8/3/2017','8/4/2017','Site2' UNION ALL
    SELECT '3','LocationA','8/5/2017','8/6/2017','Site1' UNION ALL
    SELECT '4','LocationA','8/5/2017','8/6/2017','Site2'
    )

    select Location, StartDate, EndDate, MIN(Site) AS Site
    from MySampleData
    group by Location, StartDate, EndDate

    Lowell


    --help us help you! If you post a question, make sure you include a CREATE TABLE... statement and INSERT INTO... statement into that table to give the volunteers here representative data. with your description of the problem, we can provide a tested, verifiable solution to your question! asking the question the right way gets you a tested answer the fastest way possible!

  • Thanks all for your reply.
    Group by won't work you're still pulling all four records.

    The second option, I would have to do it for every single record. Just imagine doing 1, 2, 1000, etc. and comparing every single value.
    Is there a way to do this dynamically, meaning specifying the column name instead of the value of the column?

    Thanks

  • This looks like it would be a good fit for a CTE using a partition.

    ;
    WITH DEDUPE_CTE AS (
    select
    Location
    ,StartDate
    ,EndDate
    ,Site
    ,ROW_NUMBER() OVER(PARTITION BY Location, StartDate, EndDate ORDER BY Location, StartDate, EndDate, Site) AS SELECT_CRITERIA
    from MySampleData
    )
    SELECT *
    FROM DEDUPE_CTE
    WHERE SELECT_CRITERIA = 1

  • This looks like it would be a good fit for a CTE using a partition.

    ;
    WITH DEDUPE_CTE AS (
    select
    Location
    ,StartDate
    ,EndDate
    ,Site
    ,ROW_NUMBER() OVER(PARTITION BY Location, StartDate, EndDate ORDER BY Location, StartDate, EndDate, Site) AS SELECT_CRITERIA
    from MySampleData
    )
    SELECT *
    FROM DEDUPE_CTE
    WHERE SELECT_CRITERIA = 1

  • natividad.castro - Thursday, August 3, 2017 3:55 PM

    Thanks all for your reply.
    Group by won't work you're still pulling all four records.

    The second option, I would have to do it for every single record. Just imagine doing 1, 2, 1000, etc. and comparing every single value.
    Is there a way to do this dynamically, meaning specifying the column name instead of the value of the column?

    Thanks

    So post what the output should be for the data that you have in your original post because Lowell's code produces only 2 rows as requested.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • I'm with Jeff on this one, because technically, you have no duplicates in your sample data:

    ScheduleID | Location | StartDate | EndDate | Site
    1 LocationA | 8/3/2017 | 8/4/2017 | Site1
    2 LocationA | 8/3/2017 | 8/4/2017 | Site2
    3 LocationA | 8/5/2017 | 8/6/2017 | Site1
    4 LocationA | 8/5/2017 | 8/6/2017 | Site2

    Yes, you have the same event, but it's at two different sites each time, so those aren't really duplicates.
    Now, if your sample data looked like this:

    ScheduleID | Location | StartDate | EndDate | Site
    1 LocationA | 8/3/2017 | 8/4/2017 | Site1
    2 LocationA | 8/3/2017 | 8/4/2017 | Site1
    3 LocationA | 8/5/2017 | 8/6/2017 | Site2
    4 LocationA | 8/5/2017 | 8/6/2017 | Site2

    You'd have duplicates (the same event, twice in the same time period and the same location.)

    So, please, post an example of what you expect / want the output to look like, because my first thought was you have no duplicates.

  • Thank you all for taking the time to answer this.

    This is what ended doing:


    SELECT Location,startdate,enddate,
    SiteList = (SELECT ',' + Site  FROM tblsiteSchedule group by site
        FOR XML PATH('') )
    FROM tblSchedule GROUP by Location,startdate,enddate

    Thanks

  • natividad.castro - Wednesday, August 9, 2017 2:13 PM

    Thank you all for taking the time to answer this.

    This is what ended doing:


    SELECT Location,startdate,enddate,
    SiteList = (SELECT ',' + Site  FROM tblsiteSchedule group by site
        FOR XML PATH('') )
    FROM tblSchedule GROUP by Location,startdate,enddate

    Thanks

    Can't wait for you to have a site with an ampersand in it. 😉

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

Viewing 11 posts - 1 through 10 (of 10 total)

You must be logged in to reply to this topic. Login to reply