July 4, 2017 at 3:32 am
Hello all,
need some support please
Have an SQL table with some hundred million lines and about 20 columns.
Not all columns are needed for the query.
Use this script:
With MYCTE as (
SELECT DuplicateKey1
, DuplicateKey2 --optional
, Count (*) X
FROM MyTable
Group by DuplicateKey1, DuplicateKey2
Having count (*)> 1
)
SELECT E. *
FROM MyTable E
JOIN MYCTE cte
ON E.DuplicateKey1 = cte.DuplicateKey1
AND E.DuplicateKey2 = cte.DuplicateKey2
ORDER BY E.DuplicateKey1, E.DuplicateKey2, CreatedAt
However, I have the following conditions:
It is a log file
Terminals, date, time, amount, transaction_ID, tracenumber, and customer group
Now there are cases where
The same amount was billed twice per terminal on the same date / time.
The time may differ by a few seconds.
Example:
Terminal Date Time Amount Transaction_ID Customer group
123 01.05.2017 12:00:30 10.00 00123 Business
123 01.05.2017 12:00:35 10.00 00124 Business ----> Double entry
July 4, 2017 at 4:49 am
Based on the very limited data we have. One idea:CREATE TABLE #Sample
(terminal int,
[Date] date,
[time] time,
Amount decimal(8,2),
Transaction_ID varchar(6),
Customer_group varchar(20));
GO
INSERT INTO #Sample
VALUES
(123,'20170501','12:00:30',10.00,'000123','Business'),
(123,'20170501','12:00:35',10.00,'000124','Business'),
--More sample data would be nice
(125,'20170501','12:05:52',100.00,'000125','Business'),
(125,'20170501','12:05:58',120.00,'000126','Business'), --Not a duplicate
(124,'20170501','12:10:59',12.00,'000127','Business'),
(124,'20170501','12:11:10',12.00,'000128','Business'); --Another Duplciate
GO
SELECT *
FROM #Sample;
GO
WITH Datetimes AS (
--Format the data to a date time data type and add the date and time together
SELECT terminal, CAST([Date] AS datetime) + CAST([Time] AS datetime) AS Date_Time,
CAST([Date] AS datetime) AS [Date], [Time],
Amount, Transaction_ID, Customer_group
FROM #Sample),
Ranges AS (
-Set some max/min ranges. you can change this, however, have set it to +/- 1 minute.
SELECT *,
DATEADD(MINUTE, DATEDIFF(MINUTE, Date, Date_Time), [Date]) As LowerCheck,
DATEADD(MINUTE, DATEDIFF(MINUTE, Date, Date_Time) +1, [Date]) As UpperCheck
FROM Datetimes)
--Look for duplicates
SELECT R.terminal, R.Amount, R.Date_Time, R.Transaction_ID, R.LowerCheck, R.UpperCheck
FROM Ranges R
WHERE (SELECT COUNT(*)
FROM Ranges sq
WHERE sq.Amount = R.Amount
AND sq.terminal = R.terminal
AND (sq.UpperCheck BETWEEN R.LowerCheck AND R.UpperCheck
OR sq.LowerCheck BETWEEN R.LowerCheck AND R.UpperCheck)) > 1;
GO
DROP TABLE #Sample;
Thom~
Excuse my typos and sometimes awful grammar. My fingers work faster than my brain does.
Larnu.uk
July 4, 2017 at 5:19 am
Thom A - Tuesday, July 4, 2017 4:49 AMBased on the very limited data we have. One idea:CREATE TABLE #Sample
(terminal int,
[Date] date,
[time] time,
Amount decimal(8,2),
Transaction_ID varchar(6),
Customer_group varchar(20));
GOINSERT INTO #Sample
VALUES
(123,'20170501','12:00:30',10.00,'000123','Business'),
(123,'20170501','12:00:35',10.00,'000124','Business'),
--More sample data would be nice
(125,'20170501','12:05:52',100.00,'000125','Business'),
(125,'20170501','12:05:58',120.00,'000126','Business'), --Not a duplicate
(124,'20170501','12:10:59',12.00,'000127','Business'),
(124,'20170501','12:11:10',12.00,'000128','Business'); --Another DuplciateGO
SELECT *
FROM #Sample;
GOWITH Datetimes AS (
--Format the data to a date time data type and add the date and time together
SELECT terminal, CAST([Date] AS datetime) + CAST([Time] AS datetime) AS Date_Time,
CAST([Date] AS datetime) AS [Date], [Time],
Amount, Transaction_ID, Customer_group
FROM #Sample),
Ranges AS (
-Set some max/min ranges. you can change this, however, have set it to +/- 1 minute.
SELECT *,
DATEADD(MINUTE, DATEDIFF(MINUTE, Date, Date_Time), [Date]) As LowerCheck,
DATEADD(MINUTE, DATEDIFF(MINUTE, Date, Date_Time) +1, [Date]) As UpperCheck
FROM Datetimes)
--Look for duplicates
SELECT R.terminal, R.Amount, R.Date_Time, R.Transaction_ID, R.LowerCheck, R.UpperCheck
FROM Ranges R
WHERE (SELECT COUNT(*)
FROM Ranges sq
WHERE sq.Amount = R.Amount
AND sq.terminal = R.terminal
AND (sq.UpperCheck BETWEEN R.LowerCheck AND R.UpperCheck
OR sq.LowerCheck BETWEEN R.LowerCheck AND R.UpperCheck)) > 1;GO
DROP TABLE #Sample;
Hello
Thank you will try
Thank you for the effort
Greetings from Cologne
Viewing 3 posts - 1 through 3 (of 3 total)
You must be logged in to reply to this topic. Login to reply