Removing duplicate data

  • Hi,
    We areinitially storing weekly total hours worked and later changed the criteria tostore daily hours worked. when this change happened, the developer stored dailyhours and weekly totals for few months. I have to remove the rows where overlaphappened like the data shown below:

    EmplIDStartDate EndDate Hoursworked
    1 1/2/2017 1/2/2017 7
    1 1/3/2017 1/3/2017 7
    1 1/4/2017 1/4/2017 7
    1 1/5/2017 1/5/2017 7
    1 1/6/2017 1/6/2017 7
    1 1/1/2017 1/7/2017 35
    1 1/8/2017 1/8/2017 7
    1 1/9/2017 1/9/2017 7
    1 1/10/2017 1/10/2017 7
    1 1/11/2017 1/11/2017 7
    1 1/12/2017 1/12/2017 7
    1 1/13/2017 1/13/2017 7

    forthe week of 1/1 to 1/7, I have daily rows from Mon-Fri (1/2-1/6) and a row forthe week Sun-Sat(1/1-1/7) and the following week I have just daily rows. If Iuse  this qry ,

    SELECT emplid,stardt,enddt,hoursworked

    FROM

       Table1

    WHERE

    stardt<>enddt;

     

    Idon't get the second week rows,

     

     

    and If I use second qry,

     

    DELETE

       Table1

    WHERE

      stardt=enddt;

     

    I am going to delete all daily hours inaddition to problem rows.  I have to keepeither daily rows or weekly total row for weeks that have both values.
    Hope I explained the problem clearly.

  • You'll be more likely to get help if you can post your query in a form that makes it easier for prospective helpers to use.  Read article:

    Forum Etiquette: How to post data/code on a forum to get the best help

    With many requests for help, posting proper CREATE TABLE and INSERT statements, as well as ensuring you are consistent in your column names may avoid some discouragement.

    Relating to the question itself, a bit more information is needed to determine the simplest solution.
    1. Is there a specific date on which the daily records started for all employees? 
    2. Did the start of the daily record keeping coincide with the start of a week for weekly record keeping?  If it didn't, then you may need to discard the first few daily records to avoid double counting.

    Forum Etiquette: How to post data/code on a forum to get the best helpMake sure that you include code in the appropriate IFCode tags, e.g. <your code here>. You can find the IFCode tags under the INSERT options when you are writing a post.

  • you should also ensure that you provide your dates in a format that anyone can use. Your dates wouldn't work my end as there's not 13 months in the year.

    The data you've given us is very limited. There's only one employee and is this reflective of your true DDL (do you really not have an ID column)? We'd need a much bigger data set to really test this, and proper DDL.

    Until you provide such, this is one solution I've come up with. As you can see, however, this relies on your having an ID column:
    USE DevTestDB;
    GO

    CREATE TABLE #EmpHours (HoursID INT IDENTITY(1,1),
            EmpID INT,
            StartDate DATE,
            EndDate DATE,
            HoursWorked INT);
    GO

    INSERT INTO #EmpHours
    VALUES (1, '20170102', '20170102', 7),
       (1, '20170103', '20170103', 7),
       (1, '20170104', '20170104', 7),
       (1, '20170105', '20170105', 7),
       (1, '20170106', '20170106', 7),
       (1, '20170101', '20170107', 35),
       (1, '20170108', '20170108', 7),
       (1, '20170109', '20170109', 7),
       (1, '20170110', '20170110', 7),
       (1, '20170111', '20170111', 7),
       (1, '20170112', '20170112', 7),
       (1, '20170113', '20170113', 7);
    GO

    SELECT *
    FROM #EmpHours;

    WITH Weekly AS(
      SELECT *
      FROM #EmpHours EH
      WHERE EH.StartDate != EH.EndDate),
    Duplicates AS (
      SELECT EH.HoursID
      FROM #EmpHours EH
       JOIN Weekly W ON EH.EmpID = W.EmpID AND EH.StartDate BETWEEN W.StartDate AND W.EndDate
      WHERE EH.StartDate = EH.EndDate)
    DELETE FROM #EmpHours
    WHERE HoursID IN (SELECT sq.HoursID FROM Duplicates sq);

    SELECT *
    FROM #EmpHours;

    GO
    DROP TABLE #EmpHours;

    Thom~

    Excuse my typos and sometimes awful grammar. My fingers work faster than my brain does.
    Larnu.uk

  • Okay, here goes with some setup code. I've added in a couple of earlier records with just weekly data.

    CREATE TABLE [dbo].[HoursWorked](
        [EmpID] [INT] NOT NULL,
        [StartDate] [DATE] NOT NULL,
        [EndDate] [DATE] NOT NULL,
        [HoursWorked] [INT] NOT NULL
    ) ON [PRIMARY]

    GO

    INSERT dbo.HoursWorked
    (
        EmpID ,
        StartDate ,
        EndDate ,
        HoursWorked
    )
    VALUES 
        (1,'25-Dec-2016','31-Dec-2016',21),
        (1,'18-Dec-2016','24-Dec-2016',21),
        (1,'02-Jan-2017','02-Jan-2017',7),
        (1,'03-Jan-2017','03-Jan-2017',7),
        (1,'04-Jan-2017','04-Jan-2017',7),
        (1,'05-Jan-2017','05-Jan-2017',7),
        (1,'06-Jan-2017','06-Jan-2017',7),
        (1,'01-Jan-2017','07-Jan-2017',35),
        (1,'08-Jan-2017','08-Jan-2017',7),
        (1,'09-Jan-2017','09-Jan-2017',7),
        (1,'10-Jan-2017','10-Jan-2017',7),
        (1,'11-Jan-2017','11-Jan-2017',7),
        (1,'12-Jan-2017','12-Jan-2017',7),
        (1,'13-Jan-2017','13-Jan-2017',7);

    SELECT * FROM dbo.HoursWorked ORDER BY StartDate, EndDate;

    The following will remove daily records where weekly records exist.


    WITH weekly AS
    ( SELECT * FROM dbo.HoursWorked WHERE EndDate > StartDate )
    DELETE daily
    FROM dbo.HoursWorked daily
        JOIN weekly ON daily.EmpID = weekly.EmpID
                    AND daily.StartDate = daily.EndDate
                    AND daily.StartDate BETWEEN weekly.StartDate AND weekly.EndDate
                    AND daily.EndDate BETWEEN weekly.StartDate AND weekly.EndDate ;

    SELECT * FROM dbo.HoursWorked ORDER BY StartDate, EndDate;

    Forum Etiquette: How to post data/code on a forum to get the best helpMake sure that you include code in the appropriate IFCode tags, e.g. <your code here>. You can find the IFCode tags under the INSERT options when you are writing a post.

  • Thom A - Friday, January 27, 2017 2:31 AM

    you should also ensure that you provide your dates in a format that anyone can use. Your dates wouldn't work my end as there's not 13 months in the year.

    The data you've given us is very limited. There's only one employee and is this reflective of your true DDL (do you really not have an ID column)? We'd need a much bigger data set to really test this, and proper DDL.

    Until you provide such, this is one solution I've come up with. As you can see, however, this relies on your having an ID column:
    USE DevTestDB;
    GO

    CREATE TABLE #EmpHours (HoursID INT IDENTITY(1,1),
            EmpID INT,
            StartDate DATE,
            EndDate DATE,
            HoursWorked INT);
    GO

    INSERT INTO #EmpHours
    VALUES (1, '20170102', '20170102', 7),
       (1, '20170103', '20170103', 7),
       (1, '20170104', '20170104', 7),
       (1, '20170105', '20170105', 7),
       (1, '20170106', '20170106', 7),
       (1, '20170101', '20170107', 35),
       (1, '20170108', '20170108', 7),
       (1, '20170109', '20170109', 7),
       (1, '20170110', '20170110', 7),
       (1, '20170111', '20170111', 7),
       (1, '20170112', '20170112', 7),
       (1, '20170113', '20170113', 7);
    GO

    SELECT *
    FROM #EmpHours;

    WITH Weekly AS(
      SELECT *
      FROM #EmpHours EH
      WHERE EH.StartDate != EH.EndDate),
    Duplicates AS (
      SELECT EH.HoursID
      FROM #EmpHours EH
       JOIN Weekly W ON EH.EmpID = W.EmpID AND EH.StartDate BETWEEN W.StartDate AND W.EndDate
      WHERE EH.StartDate = EH.EndDate)
    DELETE FROM #EmpHours
    WHERE HoursID IN (SELECT sq.HoursID FROM Duplicates sq);

    SELECT *
    FROM #EmpHours;

    GO
    DROP TABLE #EmpHours;

    Thom A - Friday, January 27, 2017 2:31 AM

    you should also ensure that you provide your dates in a format that anyone can use. Your dates wouldn't work my end as there's not 13 months in the year.

    The data you've given us is very limited. There's only one employee and is this reflective of your true DDL (do you really not have an ID column)? We'd need a much bigger data set to really test this, and proper DDL.

    Until you provide such, this is one solution I've come up with. As you can see, however, this relies on your having an ID column:
    USE DevTestDB;
    GO

    CREATE TABLE #EmpHours (HoursID INT IDENTITY(1,1),
            EmpID INT,
            StartDate DATE,
            EndDate DATE,
            HoursWorked INT);
    GO

    INSERT INTO #EmpHours
    VALUES (1, '20170102', '20170102', 7),
       (1, '20170103', '20170103', 7),
       (1, '20170104', '20170104', 7),
       (1, '20170105', '20170105', 7),
       (1, '20170106', '20170106', 7),
       (1, '20170101', '20170107', 35),
       (1, '20170108', '20170108', 7),
       (1, '20170109', '20170109', 7),
       (1, '20170110', '20170110', 7),
       (1, '20170111', '20170111', 7),
       (1, '20170112', '20170112', 7),
       (1, '20170113', '20170113', 7);
    GO

    SELECT *
    FROM #EmpHours;

    WITH Weekly AS(
      SELECT *
      FROM #EmpHours EH
      WHERE EH.StartDate != EH.EndDate),
    Duplicates AS (
      SELECT EH.HoursID
      FROM #EmpHours EH
       JOIN Weekly W ON EH.EmpID = W.EmpID AND EH.StartDate BETWEEN W.StartDate AND W.EndDate
      WHERE EH.StartDate = EH.EndDate)
    DELETE FROM #EmpHours
    WHERE HoursID IN (SELECT sq.HoursID FROM Duplicates sq);

    SELECT *
    FROM #EmpHours;

    GO
    DROP TABLE #EmpHours;

    Thank you so much for the response Thom. Sorry, this was my first post and did not follow the guide lines specified in the post you mentioned. Thank you for the link and will make sure to follow it. I have tried Collin's solution and it worked. Thanks again for your help

  • colin.frame - Friday, January 27, 2017 2:40 AM

    Okay, here goes with some setup code. I've added in a couple of earlier records with just weekly data.

    CREATE TABLE [dbo].[HoursWorked](
        [EmpID] [INT] NOT NULL,
        [StartDate] [DATE] NOT NULL,
        [EndDate] [DATE] NOT NULL,
        [HoursWorked] [INT] NOT NULL
    ) ON [PRIMARY]

    GO

    INSERT dbo.HoursWorked
    (
        EmpID ,
        StartDate ,
        EndDate ,
        HoursWorked
    )
    VALUES 
        (1,'25-Dec-2016','31-Dec-2016',21),
        (1,'18-Dec-2016','24-Dec-2016',21),
        (1,'02-Jan-2017','02-Jan-2017',7),
        (1,'03-Jan-2017','03-Jan-2017',7),
        (1,'04-Jan-2017','04-Jan-2017',7),
        (1,'05-Jan-2017','05-Jan-2017',7),
        (1,'06-Jan-2017','06-Jan-2017',7),
        (1,'01-Jan-2017','07-Jan-2017',35),
        (1,'08-Jan-2017','08-Jan-2017',7),
        (1,'09-Jan-2017','09-Jan-2017',7),
        (1,'10-Jan-2017','10-Jan-2017',7),
        (1,'11-Jan-2017','11-Jan-2017',7),
        (1,'12-Jan-2017','12-Jan-2017',7),
        (1,'13-Jan-2017','13-Jan-2017',7);

    SELECT * FROM dbo.HoursWorked ORDER BY StartDate, EndDate;

    The following will remove daily records where weekly records exist.


    WITH weekly AS
    ( SELECT * FROM dbo.HoursWorked WHERE EndDate > StartDate )
    DELETE daily
    FROM dbo.HoursWorked daily
        JOIN weekly ON daily.EmpID = weekly.EmpID
                    AND daily.StartDate = daily.EndDate
                    AND daily.StartDate BETWEEN weekly.StartDate AND weekly.EndDate
                    AND daily.EndDate BETWEEN weekly.StartDate AND weekly.EndDate ;

    SELECT * FROM dbo.HoursWorked ORDER BY StartDate, EndDate;

    Thank you so much for the response Colin.    I am so sorry, this was my first post and did not follow the guide lines specified in the post you mentioned. Thank you for the link and will make sure to follow it. Your solution worked for me. Thanks a lot and I appreciate all your help

  • @Saig1417 - no problem, live and learn.  Thom's use of a temporary table for test data is better practice than mine, and Jeff's How to Post Guide (linked by Thom A and me separately) also includes a useful IF OBJECTID(...) NOT NULL statement so the code can be re-run.

    Forum Etiquette: How to post data/code on a forum to get the best helpMake sure that you include code in the appropriate IFCode tags, e.g. <your code here>. You can find the IFCode tags under the INSERT options when you are writing a post.

  • colin.frame - Friday, January 27, 2017 7:16 AM

    @Saig1417 - no problem, live and learn.  Thom's use of a temporary table for test data is better practice than mine, and Jeff's How to Post Guide (linked by Thom A and me separately) also includes a useful IF OBJECTID(...) NOT NULL statement so the code can be re-run.

    Sure! I will follow your both suggestions. Thanks again!

Viewing 8 posts - 1 through 7 (of 7 total)

You must be logged in to reply to this topic. Login to reply