Date logic

  • Hi,

    Below is the sample data to play with and i need to use the partition by to split the record based on hour difference.

    Declare @Sample table(TestDate datetime, Id int);

    insert into @Sample

    SELECT '2017-01-07 07:28:01.917' , 1

    UNION ALL

    SELECT '2017-01-07 07:28:05.210' , 1

    UNION ALL

    SELECT '2017-01-07 14:06:21.840' , 1

    UNION ALL

    SELECT '2017-01-07 14:06:24.227' , 1

    UNION ALL

    SELECT '2017-01-07 07:28:03.023' , 1

    UNION ALL

    SELECT '2017-01-07 14:06:22.440' , 1

    UNION ALL

    SELECT '2017-01-07 07:28:07.477' , 1

    UNION ALL

    SELECT '2017-01-07 14:06:28.180' , 1'

    TestDate RN

    2017-01-07 07:28:01.917 1

    2017-01-07 07:28:05.210 1

    2017-01-07 14:06:21.840 2

    2017-01-07 14:06:24.227 2

    2017-01-07 07:28:03.023 1

    2017-01-07 14:06:22.440 2

    2017-01-07 07:28:07.477 1

    2017-01-07 14:06:28.180 2

    Any sample query to achieve this please

  • Maybe something like this:

    SELECT *,

    DATEADD(HH, DATEDIFF(HH, '2017', TestDate), '2017') PartitionUsed,

    DENSE_RANK() OVER(ORDER BY DATEADD(HH, DATEDIFF(HH, '2017', TestDate), '2017')) RN

    FROM @Sample;

    Luis C.
    General Disclaimer:
    Are you seriously taking the advice and code from someone from the internet without testing it? Do you at least understand it? Or can it easily kill your server?

    How to post data/code on a forum to get the best help: Option 1 / Option 2
  • Hi Luis,

    thanks for the reply and i would like to put my requirement in detail. please note i have updated my sample on the first post.

    select * from @Sample order by TestDate

    please consider the first four records inserted with in same hour. if the testdate falls with in same hours then that's not duplicate. else duplicate.

    in my case below are the duplicates

    2017-01-07 14:06:21.840

    2017-01-07 14:06:22.440

    2017-01-07 14:06:24.227

    2017-01-07 14:06:28.180

    how do i get this duplicate records based on same hour data. any sugestion please

    Also there is a possiblility 2016-12-31 23.55 , 2016-12-31 23.56, 2017-01-01 00.55. these are not duplicate. We need to tacke this as well

  • What´s the problem with the code I posted?

    Luis C.
    General Disclaimer:
    Are you seriously taking the advice and code from someone from the internet without testing it? Do you at least understand it? Or can it easily kill your server?

    How to post data/code on a forum to get the best help: Option 1 / Option 2
  • as like my sample i have 100*1000 records in my table and dense rank puts incremental number. basically i need the query to delete the duplicate records. any help please

  • I'm lost on your requirements.

  • Sure, i will explain again.

    I have a table and right now i have duplicates on the table. Only way i can clean the duplicates based on id, RecordInsertedDateTime.

    The most of the records gets created based with in 30 minutes. so i need to find the duplicate records based on created time intraval.

    Sample code:

    DECLARE @Sample TABLE (

    TestDate DATETIME

    ,Id INT

    );

    INSERT INTO @Sample

    SELECT '2017-01-07 07:28:01.917'

    ,1

    UNION ALL

    SELECT '2017-01-07 07:28:05.210'

    ,1

    UNION ALL

    SELECT '2017-01-07 14:06:21.840'

    ,1

    UNION ALL

    SELECT '2017-01-07 14:06:24.227'

    ,1

    UNION ALL

    SELECT '2017-01-07 07:28:03.023'

    ,1

    UNION ALL

    SELECT '2017-01-07 14:06:22.440'

    ,1

    UNION ALL

    SELECT '2017-01-07 07:28:07.477'

    ,1

    UNION ALL

    SELECT '2017-01-07 14:06:28.180'

    ,1

    UNION ALL

    SELECT '2017-01-07 16:06:22.440'

    ,2

    UNION ALL

    SELECT '2017-01-07 16:28:07.477'

    ,2

    UNION ALL

    SELECT '2017-01-07 01:06:28.180'

    ,2

    union all

    SELECT '2017-01-07 01:06:30.180', 2

    on my above sample the below records created with in few seconds. but i don't want to go with seconds, so i kept 1 hours as criteria. So the first records inserted on 07:28:01.917. So (07:28:01.917 + 1 hour ) is my criteria

    For Id = 1

    2017-01-07 07:28:01.917

    2017-01-07 07:28:05.210

    2017-01-07 07:28:07.477

    2017-01-07 07:28:03.023

    Duplicate Records:

    2017-01-07 14:06:22.440

    2017-01-07 14:06:28.180

    2017-01-07 14:06:21.840

    2017-01-07 14:06:24.227

    But if you see the above duplicated records got inserted from 14:06:22.440 for the same Id 1

    so need to delete these records.

    SELECT *,

    DATEADD(HH, DATEDIFF(HH, '2017', TestDate), '2017') PartitionUsed,

    DENSE_RANK() OVER(ORDER BY id, DATEADD(HH, DATEDIFF(HH, '2017', TestDate), '2017')) RN

    FROM @Sample;

    if i use Luis solution the RN has to be 1 for non duplicate record and 2 for duplicate records for the same Id but i am getting incremented RN and am struggling on removing the duplicates.

    Is my requirement clear? Any help please

  • How and why are they duplicates? All I see is a DateTime value and an ID. There is no real logic to state why they are duplicated just because they are in a different hour of the same day. Would they be duplicates if the date were different?

    This looks like a case of missing information unless I am missing something.

  • Is this more what you're looking for? This will keep the first set of records within the earliest hour for each Id on one day.

    WITH TEMP_CTE AS(

    SELECT *,

    DATEADD(HH, DATEDIFF(HH, '2017', TestDate), '2017') PartitionUsed,

    DENSE_RANK() OVER(PARTITION BY id, DATEADD(day, DATEDIFF(day, '2017', TestDate), '2017') ORDER BY id, DATEADD(HH, DATEDIFF(HH, '2017', TestDate), '2017') ASC) RN

    FROM @Sample

    )

    DELETE FROM TEMP_CTE WHERE RN > 1

  • The problem is that the definition of "duplicate" here is not well defined. Specifically, you want to count times as duplicates if they fall within a specific threshold of time. For convenience sake, let's say an hour is the threshold.

    Say we having the following events and times

    event_id time

    1 12:00

    2 12:45

    3 13:30

    Clearly 1 and 2 fall within the threshold, so they should be duplicates, and 2 and 3 fall within the threshold, so they should be duplicates, but 1 and 3 fall outside of the threshold, so they should not be duplicates, but they are both duplicates of 2, so they should be duplicates.

    So the question is whether the duplication is defined based on the first instance or the last instance in the group. The answer to that question determines how we determine which times are even within each group. In one instance, we want to use a packing intervals approach, and in the other, we want to use a gaps and islands approach. We can't tell you which approach to use based on the data that you have given us.

    Drew

    J. Drew Allen
    Business Intelligence Analyst
    Philadelphia, PA

  • thanks everyone for your time on this post and i could manage in different way. I just took the first created record time and added needed intraval(newdate) and kept in another temp table. then wrote a query to compared the inserteddate <= newdate (from temp table). Am all set. thank you and appreciated.

  • Quick thought, you can probably simplify the calculation, something like this maybe?

    😎

    INSERT INTO @Sample

    SELECT '2017-01-07 07:28:01.917',1 UNION ALL

    SELECT '2017-01-07 07:28:05.210',1 UNION ALL

    SELECT '2017-01-07 14:06:21.840',1 UNION ALL

    SELECT '2017-01-07 14:06:24.227',1 UNION ALL

    SELECT '2017-01-07 07:28:03.023',1 UNION ALL

    SELECT '2017-01-07 14:06:22.440',1 UNION ALL

    SELECT '2017-01-07 07:28:07.477',1 UNION ALL

    SELECT '2017-01-07 14:06:28.180',1 UNION ALL

    SELECT '2017-01-07 16:06:22.440',2 UNION ALL

    SELECT '2017-01-07 16:28:07.477',2 UNION ALL

    SELECT '2017-01-07 01:06:28.180',2 UNION ALL

    SELECT '2017-01-07 01:06:30.180',2;

    SELECT

    S.Id

    ,S.TestDate

    ,DENSE_RANK() OVER (PARTITION BY S.Id ORDER BY FLOOR(DATEDIFF(MINUTE,0,S.TestDate) / 30.0) ASC) AS DRNK

    FROM @Sample S;

    Output

    Id TestDate DRNK

    --- ----------------------- -----

    1 2017-01-07 07:28:01.917 1

    1 2017-01-07 07:28:05.210 1

    1 2017-01-07 07:28:07.477 1

    1 2017-01-07 07:28:03.023 1

    1 2017-01-07 14:06:22.440 2

    1 2017-01-07 14:06:28.180 2

    1 2017-01-07 14:06:21.840 2

    1 2017-01-07 14:06:24.227 2

    2 2017-01-07 01:06:28.180 1

    2 2017-01-07 01:06:30.180 1

    2 2017-01-07 16:06:22.440 2

    2 2017-01-07 16:28:07.477 2

  • KGJ-Dev (1/9/2017)


    Hi Luis,

    thanks for the reply and i would like to put my requirement in detail. please note i have updated my sample on the first post.

    select * from @Sample order by TestDate

    please consider the first four records inserted with in same hour. if the testdate falls with in same hours then that's not duplicate. else duplicate.

    in my case below are the duplicates

    2017-01-07 14:06:21.840

    2017-01-07 14:06:22.440

    2017-01-07 14:06:24.227

    2017-01-07 14:06:28.180

    how do i get this duplicate records based on same hour data. any sugestion please

    Also there is a possiblility 2016-12-31 23.55 , 2016-12-31 23.56, 2017-01-01 00.55. these are not duplicate. We need to tacke this as well

    SELECT 'unique records to leave', *, ROW_NUMBER() OVER (ORDER BY s.TestDate)

    FROM @Sample s

    WHERE NOT EXISTS (SELECT *

    FROM @Sample s2

    WHERE s2.TestDate > DATEADD(ss, -30, s.TestDate)

    AND s2.TestDate < s.TestDate

    )

    SELECT 'duplicates to remove', *

    FROM @Sample s

    WHERE EXISTS (SELECT *

    FROM @Sample s2

    WHERE s2.TestDate > DATEADD(ss, -30, s.TestDate)

    AND s2.TestDate < s.TestDate

    )

    _____________
    Code for TallyGenerator

  • Sergiy - Tuesday, January 10, 2017 6:29 PM

    KGJ-Dev (1/9/2017)


    Hi Luis, thanks for the reply and i would like to put my requirement in detail. please note i have updated my sample on the first post.select * from @Sample order by TestDateplease consider the first four records inserted with in same hour. if the testdate falls with in same hours then that's not duplicate. else duplicate.in my case below are the duplicates2017-01-07 14:06:21.8402017-01-07 14:06:22.4402017-01-07 14:06:24.2272017-01-07 14:06:28.180how do i get this duplicate records based on same hour data. any sugestion please Also there is a possiblility 2016-12-31 23.55 , 2016-12-31 23.56, 2017-01-01 00.55. these are not duplicate. We need to tacke this as well

    SELECT 'unique records to leave', *, ROW_NUMBER() OVER (ORDER BY s.TestDate)FROM @Sample sWHERE NOT EXISTS (SELECT * FROM @Sample s2 WHERE s2.TestDate > DATEADD(ss, -30, s.TestDate)AND s2.TestDate < s.TestDate )SELECT 'duplicates to remove', * FROM @Sample sWHERE EXISTS (SELECT * FROM @Sample s2 WHERE s2.TestDate > DATEADD(ss, -30, s.TestDate)AND s2.TestDate < s.TestDate )

    Thanks a lot guys for the different solutions. much appreciated.

Viewing 14 posts - 1 through 13 (of 13 total)

You must be logged in to reply to this topic. Login to reply