Remove duplicates before inserting into destination

  • I have stage table with column"canpaignname"

    Values something like this :

    "onedaycampaign2016-08-20

    onedaycampaign2016-08-21

    onedaycampaign2016-08-22

    Todaycampaign2016-08-20

    Todaycampaign2016-08-20

    From the campaignname column I extracted " onedaycampaign" and "2016-08-22" ( date part) seperately into a reference table

    My destination table should have only "name " part from campaignname column and key attached to it

    Example :

    name. Key

    Onedaycampaign. 1

    Twodaycampaign. 2

    How do we do it? Please help.

    Right now I get duplicates in destination as my reference table has duplicates as j just extracted and palaced in reference table. I need to look up on reference table to insert into destination from stage .

  • I'd start with a SELECT DISTINCT on the staging table.

    How do your reference tables look like?

    Can you give some sample data?

    Need an answer? No, you need a question
    My blog at https://sqlkover.com.
    MCSE Business Intelligence - Microsoft Data Platform MVP

  • First, with 253 posts, you should know by now to post readily consumable data to get better answers quicker. Please read and heed the first link under "Helpful Links" in my signature line below for how to do that for all future posts to give your post the best chances for being answered quickly and with tested code.

    I'll do it once for you.

    --===== Simulates your source table

    SELECT d.SomeValue

    INTO #YourTable

    FROM (

    SELECT 'onedaycampaign2016-08-20' UNION ALL

    SELECT 'onedaycampaign2016-08-21' UNION ALL

    SELECT 'onedaycampaign2016-08-22' UNION ALL

    SELECT 'Todaycampaign2016-08-20' UNION ALL

    SELECT 'Todaycampaign2016-08-20' UNION ALL

    SELECT 'Todaycampaign2016-08-21' --Dupe campaign name but different date

    )

    ;

    Notice the last line of the test data above. In the name of bullet-proofing your code, what do you want to do when that eventuality occurs?

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • My source has duplicate. Campaignnames appened with sendate dupes.

    So my example also contains dupes. In my reference tables , which comes from view over stage/source table has just campaign name column and date column derived from campaignname string.

    So when I try to load data from stage to destination using lookup on reference , I end up inserting dupes

  • komal145 (9/21/2016)


    My source has duplicate. Campaignnames appened with sendate dupes.

    So my example also contains dupes. In my reference tables , which comes from view over stage/source table has just campaign name column and date column derived from campaignname string.

    So when I try to load data from stage to destination using lookup on reference , I end up inserting dupes

    Understood but my question is a little different. Looking again at the commented line I have in the test data setup I wrote, can you ever have a duplicate Campaign Name with a different date than the other dupe(s)?

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Yes. Finally i am able to remove dupes by just getting distinct values from view and inserting into destination.

  • komal145 (9/21/2016)


    Yes. Finally i am able to remove dupes by just getting distinct values from view and inserting into destination.

    What did you do about the question I asked? Did you solve that, as well?

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • The campaignname with date appeneded is duplicates . so , yes to your question.

    example from my table:

    Day10ReminderCampaign2016_09_16_01:43:05

    Day10ReminderCampaign2016_09_17_01:44:01

    Day10ReminderCampaign2016_09_18_01:43:59

    Day10ReminderCampaign2016_09_19_01:45:42

    Day10ReminderCampaign2016_09_20_01:22:18

    Day10ReminderCampaign2016_09_16_01:43:05

    Day10ReminderCampaign2016_09_16_01:43:05

    Day3ReminderCampaign2016_09_16_02:24:16

    Day3ReminderCampaign2016_09_16_02:24:16

  • komal145 (9/21/2016)


    The campaignname with date appeneded is duplicates . so , yes to your question.

    example from my table:

    Day10ReminderCampaign2016_09_16_01:43:05

    Day10ReminderCampaign2016_09_17_01:44:01

    Day10ReminderCampaign2016_09_18_01:43:59

    Day10ReminderCampaign2016_09_19_01:45:42

    Day10ReminderCampaign2016_09_20_01:22:18

    Day10ReminderCampaign2016_09_16_01:43:05

    Day10ReminderCampaign2016_09_16_01:43:05

    Day3ReminderCampaign2016_09_16_02:24:16

    Day3ReminderCampaign2016_09_16_02:24:16

    So the eventual problem hasn't been solved. There's no guarantee that two identically named Campaigns won't occur with different dates.

    Anyhow... can you post the code you ended up with? It may help someone else in the future. Thanks.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • It would be good to also post the definition of the destination table. We could see clearly what it defines as a duplicate.

    ----------------------------------------------------

Viewing 10 posts - 1 through 9 (of 9 total)

You must be logged in to reply to this topic. Login to reply