Need support for a query (SQL Server 2012): find duplicate values

  • Hello all,

    I need some support, please.

    I have a SQL table with a few hundred million rows and about 20 columns.
    Not all columns are needed for the query.

    I currently use this script:

    WITH MYCTE AS (
      SELECT DuplicateKey1
            , DuplicateKey2 --optional
            , COUNT(*) AS X
      FROM MyTable
      GROUP BY DuplicateKey1, DuplicateKey2
      HAVING COUNT(*) > 1
    )
    SELECT E.*
    FROM MyTable E
    JOIN MYCTE cte
      ON E.DuplicateKey1 = cte.DuplicateKey1
     AND E.DuplicateKey2 = cte.DuplicateKey2
    ORDER BY E.DuplicateKey1, E.DuplicateKey2, CreatedAt;

    However, I have the following conditions:

    The table is a transaction log with the columns terminal, date, time,
    amount, transaction_ID, tracenumber, and customer group.

    There are cases where the same amount was billed twice on the same
    terminal at practically the same date/time; the times may differ by a
    few seconds.

    Example:

    Terminal   Date         Time       Amount   Transaction_ID   Customer group
    123        01.05.2017   12:00:30   10.00    00123            Business
    123        01.05.2017   12:00:35   10.00    00124            Business
    ----> double entry
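
    For illustration, here is a minimal sketch of the kind of check I am
    after, using LAG (available in SQL Server 2012). The table name
    dbo.TransactionLog, the exact column names, and the 60-second threshold
    are assumptions and would need adjusting to the real table:

    -- Assumed definition of "a few seconds"
    DECLARE @ThresholdSeconds int = 60;

    WITH Ordered AS (
      SELECT Terminal, Amount, Transaction_ID, Customer_group,
             CAST([Date] AS datetime) + CAST([Time] AS datetime) AS Date_Time,
             -- Previous transaction on the same terminal for the same amount and customer group
             LAG(CAST([Date] AS datetime) + CAST([Time] AS datetime)) OVER
                 (PARTITION BY Terminal, Amount, Customer_group
                  ORDER BY [Date], [Time]) AS Prev_Date_Time
      FROM dbo.TransactionLog  -- assumed table name
    )
    SELECT Terminal, Date_Time, Amount, Transaction_ID, Customer_group
    FROM Ordered
    WHERE Prev_Date_Time IS NOT NULL
      AND DATEDIFF(SECOND, Prev_Date_Time, Date_Time) <= @ThresholdSeconds
    ORDER BY Terminal, Date_Time;

    Note that this lists only the later row of each suspected pair, not both.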

  • Based on the very limited data we have, here's one idea:
    CREATE TABLE #Sample
      (terminal int,
      [Date] date,
      [time] time,
      Amount decimal(8,2),
      Transaction_ID varchar(6),
      Customer_group varchar(20));
    GO

    INSERT INTO #Sample
    VALUES
      (123,'20170501','12:00:30',10.00,'000123','Business'),
      (123,'20170501','12:00:35',10.00,'000124','Business'),
      --More sample data would be nice
      (125,'20170501','12:05:52',100.00,'000125','Business'),
      (125,'20170501','12:05:58',120.00,'000126','Business'), --Not a duplicate
      (124,'20170501','12:10:59',12.00,'000127','Business'),
      (124,'20170501','12:11:10',12.00,'000128','Business'); --Another duplicate

    GO

    SELECT *
    FROM #Sample;
    GO

    WITH Datetimes AS (
      --Format the data to a date time data type and add the date and time together
      SELECT terminal, CAST([Date] AS datetime) + CAST([Time] AS datetime) AS Date_Time,
         CAST([Date] AS datetime) AS [Date], [Time],
        Amount, Transaction_ID, Customer_group
      FROM #Sample),
    Ranges AS (
       --Set some max/min ranges. You can change this; however, I have set it to +/- 1 minute.
      SELECT *,
        DATEADD(MINUTE, DATEDIFF(MINUTE, Date, Date_Time), [Date]) As LowerCheck,
        DATEADD(MINUTE, DATEDIFF(MINUTE, Date, Date_Time) +1, [Date]) As UpperCheck
      FROM Datetimes)
    --Look for duplicates
    SELECT R.terminal, R.Amount, R.Date_Time, R.Transaction_ID, R.LowerCheck, R.UpperCheck
    FROM Ranges R
    WHERE (SELECT COUNT(*)
       FROM Ranges sq
       WHERE sq.Amount = R.Amount
       AND sq.terminal = R.terminal
       AND (sq.UpperCheck BETWEEN R.LowerCheck AND R.UpperCheck
        OR sq.LowerCheck BETWEEN R.LowerCheck AND R.UpperCheck)) > 1;

    GO
    DROP TABLE #Sample;
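
    To run the same check against the real log table instead of #Sample,
    something along these lines should work. The table name
    dbo.TransactionLog is an assumption, and with a few hundred million rows
    you will want to restrict the date range being checked (and ideally have
    an index covering terminal, Amount and the date/time columns):

    --Same range-overlap check as above, against an assumed permanent table,
    --restricted to one day to keep the scan manageable.
    WITH Datetimes AS (
      SELECT terminal,
             CAST([Date] AS datetime) + CAST([Time] AS datetime) AS Date_Time,
             CAST([Date] AS datetime) AS [Date],
             [Time], Amount, Transaction_ID, Customer_group
      FROM dbo.TransactionLog          -- assumed table name
      WHERE [Date] = '20170501'        -- limit the window being checked
    ),
    Ranges AS (
      SELECT *,
             DATEADD(MINUTE, DATEDIFF(MINUTE, [Date], Date_Time), [Date]) AS LowerCheck,
             DATEADD(MINUTE, DATEDIFF(MINUTE, [Date], Date_Time) + 1, [Date]) AS UpperCheck
      FROM Datetimes
    )
    --Look for duplicates
    SELECT R.terminal, R.Amount, R.Date_Time, R.Transaction_ID
    FROM Ranges R
    WHERE (SELECT COUNT(*)
           FROM Ranges sq
           WHERE sq.Amount = R.Amount
             AND sq.terminal = R.terminal
             AND (sq.UpperCheck BETWEEN R.LowerCheck AND R.UpperCheck
              OR sq.LowerCheck BETWEEN R.LowerCheck AND R.UpperCheck)) > 1;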

    Thom~

    Excuse my typos and sometimes awful grammar. My fingers work faster than my brain does.
    Larnu.uk

  • Thom A - Tuesday, July 4, 2017 4:49 AM


    Hello,
    Thank you, I will try that.

    Thanks for the effort.

    Greetings from Cologne
