removing duplicates

  • Hello,

    I appeal for help on a sql method to eliminate values ​​considered as "duplicate" on each input rows in a table.

    In summary here is the schematics:

    A table that collects data "Service" 40 orders (services) possible by invoice

    N°Invoice - Service1 - Service2 - Service3 - Service4- Service5 - up to 40 .... Etc...

    000001 Null VAA VAA SFG SFG

    000002 YBB YBB SFU Null Null

    000003 Null Null SAE SFU SFU

    000004 VDP VDP VBB VBB VAA

    etc ....

    The goal is to get this following table B

    N°invoice - Service1 - Service2 - Service3 - Service4 - Service5 ........ etc ..

    000001 VAA Null Null Null SFG

    000002 Null Null Null YBB SFU

    000003 Null Null Null SAE SFU

    000004 VDP VBB Null Null VAA

    Invoices must be addressed once to the service because we just need one validation by services for all no matter if it is mentioned 2, 5 or 10 times on the invoice.

    The brutal solution that I found : (ex: to test the update service3, we have to look in a temp table the 2 previous values):

    with temp as (select N°facture,service1 as serv1, service2 as serv2) from table A

    insert into table B (N°facture,Service3)

    select a.N°Invoice, a.Service3 as serv3

    from TableA a

    Left Join temp

    on temp.N°Invoice=a.N°facture

    Where serv1<>a.service3 And serv2<>a.service3

    The problem with this method is that you test the 40 services one by one .. that becomes undrinkable writing from the 6th or 7th already ...:blink:

    I think there must be a faster method (rownum..)

  • In a word, Normalize your table. Then this is stupid easy. Repeating fields are going to make your life miserable.

  • As previously stated, you need to normalize your table. If you can't change the schema, you still need to code the normalization and denormalize again. Here's a sample on how to do it. It will be easier to complete it than your previous option.

    CREATE TABLE #Sample(

    Invoice char(6),

    Service1 char(6), Service2 char(6), Service3 char(6), Service4 char(6), Service5 char(6))

    INSERT INTO #Sample

    VALUES

    ('000001', Null, 'VAA', 'VAA', 'SFG', 'SFG'),

    ('000002', 'YBB', 'YBB', 'SFU', Null, Null),

    ('000003', Null, Null, 'SAE', 'SFU', 'SFU'),

    ('000004', 'VDP', 'VDP', 'VBB', 'VBB', 'VAA');

    WITH CTE AS(

    SELECT Invoice,

    MIN(ServiceID) ServiceID,

    Value

    FROM #Sample

    CROSS APPLY (SELECT 1, Service1 UNION ALL

    SELECT 2, Service2 UNION ALL

    SELECT 3, Service3 UNION ALL

    SELECT 4, Service4 UNION ALL

    SELECT 5, Service5)x(ServiceID,Value)

    GROUP BY Invoice, Value

    )

    SELECT Invoice,

    MAX( CASE WHEN ServiceID = 1 THEN Value END) Service1,

    MAX( CASE WHEN ServiceID = 2 THEN Value END) Service2,

    MAX( CASE WHEN ServiceID = 3 THEN Value END) Service3,

    MAX( CASE WHEN ServiceID = 4 THEN Value END) Service4,

    MAX( CASE WHEN ServiceID = 5 THEN Value END) Service5

    FROM CTE

    GROUP BY Invoice;

    GO

    DROP TABLE #Sample

    Luis C.
    General Disclaimer:
    Are you seriously taking the advice and code from someone from the internet without testing it? Do you at least understand it? Or can it easily kill your server?

    How to post data/code on a forum to get the best help: Option 1 / Option 2
  • You can prevent duplication of service codes by leveraging an INSTEAD OF INSERT, UPDATE trigger where each service column is conditionally populated depending on whether a preceeding service column has the same code. For example:

    CREATE TABLE MyInvoice

    (

    Invoice char(6) not null primary key

    , Service1 char(6) null

    , Service2 char(6) null

    , Service3 char(6) null

    , Service4 char(6) null

    , Service5 char(6) null

    );

    GO

    CREATE TRIGGER MyInvoiceTrigger on MyInvoice

    INSTEAD OF INSERT

    AS

    BEGIN

    INSERT INTO MyInvoice SELECT

    Invoice

    , Service1

    , case when Service2

    not in (isnull(Service1,''))

    then Service2 else null end

    , case when Service3

    not in (isnull(Service1,''),isnull(Service2,''))

    then Service3 else null end

    , case when Service4

    not in (isnull(Service1,''),isnull(Service2,''),isnull(Service3,''))

    then Service4 else null end

    , case when Service5

    not in (isnull(Service1,''),isnull(Service2,''),isnull(Service3,''),isnull(Service4,''))

    then Service5 else null end

    FROM inserted;

    END;

    GO

    insert into MyInvoice values ('001','A','B',null,null,null);

    insert into MyInvoice values ('002','A','B','A','C',null);

    insert into MyInvoice values ('003','A','B',null,'C','A');

    select * from MyInvoice;

    Invoice Service1 Service2 Service3 Service4 Service5

    001 A B NULL NULL NULL

    002 A B NULL C NULL

    003 A B NULL C NULL

    "Do not seek to follow in the footsteps of the wise. Instead, seek what they sought." - Matsuo Basho

  • Many thanks for your reply.

    And yes it s not possible to normalize as the data come from pdf captured areas from invoices.

    Each invoices could have 40 services it s depend how many purchase order are booked on it.

    And the ’null’ result correspond to a normal situation .

    The service code show a discrepencie between the PO and the invoice.

    So the manager need to received it and approved . So if he had 10 discrepencies he just need to received the document once.

    That why i need to remouving duplicates by invoices...

Viewing 5 posts - 1 through 4 (of 4 total)

You must be logged in to reply to this topic. Login to reply