Getting duplicate values evenif used distinct

  • Hi All,

    I used distinct in my query but then also I am getting almost 100K of duplicate records, below is the detail of count.

    Expected count : 1769894

    From select query I am getting : 13883717

    The contact_ID should be unique and we shouldnt have duplicate contact ID, I used 4-5 tables to join.

    Below is the query,

    drop table CnI_Test;

    create table CnI_Test as

    select distinct

    ml.account_id, ml.account_name, ml.NAMED_ACCOUNT_IND, ml.NAMED_ACCOUNT_DESCRIPTION, ml.site_id, ml.org_name,

    ml.contact_id , ml.FIRST_NAME, ml.LAST_NAME,

    ML.JOB_AREA_CD, ML.JOB_AREA_CD_desc,ML.job_level_cd, ml.job_level_cd_desc,

    ML.MKT_REGION_NAME, ML.OVERALL_SALES_REGION, ML.SALES_REGION, ML.ISO_COUNTRY_CD_DESC, ML.TERRITORY,ml.CITY,ML.STATE_PROV_COUNTY,

    ml.MKTG_CAPABLE_FLAG, ML.MKTG_CAPABLE_EMAIL_FLAG, ml.MKTG_CAPABLE_PHONE_FLAG, ml.MKTG_CAPABLE_MAIL_FLAG, ML.LAST_DT_CONTACT_ACTIVITY,

    ML.JOB_TITLE,ml.AI_DATE_ADDED,

    CASE WHEN ML.MKTG_CAPABLE_PHONE_FLAG = 'Y' THEN ML.PHONE_COUNTRY_CD ELSE NULL END AS PHONE_COUNTRY_CD,

    CASE WHEN ML.MKTG_CAPABLE_PHONE_FLAG = 'Y' THEN ML.CONTACT_PHONE_NUMBER ELSE NULL END AS CONTACT_PHONE_NUMBER,

    CASE WHEN ML.MKTG_CAPABLE_PHONE_FLAG = 'Y' THEN ML.CONTACT_MOBILE_PHONE ELSE NULL END AS CONTACT_MOBILE_PHONE,

    CASE WHEN ML.MKTG_CAPABLE_PHONE_FLAG = 'Y' THEN ML.ORG_PHONE_NUMBER ELSE NULL END AS ORG_PHONE_NUMBER,

    CASE WHEN ML.MKTG_CAPABLE_EMAIL_FLAG = 'Y' THEN ML.EMAIL_ADDRESS ELSE NULL END AS EMAIL_ADDRESS,

    CASE WHEN ML.MKTG_CAPABLE_MAIL_FLAG = 'Y' THEN ML.ADDRESS_LINE_1 ELSE NULL END AS ADDRESS_LINE_1,

    CASE WHEN ML.MKTG_CAPABLE_MAIL_FLAG = 'Y' THEN ML.ADDRESS_LINE_2 ELSE NULL END AS ADDRESS_LINE_2,

    CASE WHEN ML.MKTG_CAPABLE_MAIL_FLAG = 'Y' THEN ML.ADDRESS_LINE_3 ELSE NULL END AS ADDRESS_LINE_3,

    CASE WHEN ML.MKTG_CAPABLE_MAIL_FLAG = 'Y' THEN ML.ADDRESS_LINE_4 ELSE NULL END AS ADDRESS_LINE_4,

    no_of_csu_job_areas.job_area as csu_job_areas,

    Owner_1.csu as CSU_DRIVER,

    SUBSTR(Owner_1.own,1,1) as CSU_Owner,

    SUBSTR(Owner_1.own,2,1) as CSU_Interest,

    Owner_2.NCV_DRIVER as NCV_DRIVER,

    SUBSTR(Owner_2.own,1,1) as NCV_Owner,

    SUBSTR(Owner_2.own,2,1) as NCV_Interest,

    NVL(ep.ACCOUNT_CURRENCY, 'Unmatched') AS ACCOUNT_SEGMENT ,

    NVL(g.now_in_gov,'No') AS now_in_gov ,

    NVL(g.release_from_gov_date,'N/A') AS release_from_gov_date,

    NVL(g.days_to_release_from_gov,'N/A') AS days_to_release_from_gov,

    sa.sales_req_not_to_call_account

    from M_LIST ML

    left join MRTROLLUP.CSU_JOB_AREAS no_of_csu_job_areas on (no_of_csu_job_areas.contact_id = ml.contact_id )

    LEFT JOIN mrtcustomer.ent_parent ep ON ep.ACCOUNT_ID = ml.ACCOUNT_ID

    LEFT JOIN (SELECT DISTINCT upper(trim(email_address)) AS email_address FROM REPOR01.BLOCKED_EMAIL WHERE email_address IS NOT NULL) blocked ON blocked.email_address = upper(trim(ml.email_address))

    left join REPOR01.MART_CSU_CONTACT Owner_1 on Owner_1.contact_id = ml.contact_id

    left join REPOR01.MART_NCV_CONTACT Owner_2 on Owner_2.contact_id = ml.contact_id

    left join (select distinct email_address,

    case when segmentation_last_run_date >= current_date - 15 then 'Yes' else 'No' end as now_in_gov,

    case when segmentation_last_run_date >= current_date - 15 then to_char(segmentation_last_run_date + 15) else 'N/A' end as release_from_gov_date,

    case when segmentation_last_run_date >= current_date - 15

    then to_char(15 - (to_date(to_char(current_date,'YYYYMMDD'),'YYYYMMDD') - to_date(to_char(segmentation_last_run_date,'YYYYMMDD'),'YYYYMMDD')))

    else 'N/A' end as days_to_release_from_gov

    from

    (select trim(lower(email_address)) as email_address, max(segmentation_last_run_date) as segmentation_last_run_date

    from mrtaprimoadm.ca_main_comms_log

    where email_address is not null

    group by trim(lower(email_address)))

    ) g on g.email_address = trim(lower(ml.email_address))

    left join sfdcdata.sfdc_accounts sa ON sa.enterprise_id = ml.account_id

    left join sfdcdata.sfdc_accounts sa2 on ml.sfdc_account_id = sa2.sfdc_account_id

    ;

    Can anyone please tell me how can I get expected 1769894 count.

    M_List table is having 1769894 contacts.

    Thanks,

    Abhi

  • DISTINCT means to get rows that are unique across the entire row, not just one column. You have "duplicates" but if you look close you will find that at least one column across all those columns have different values.

    _______________________________________________________________

    Need help? Help us help you.

    Read the article at http://www.sqlservercentral.com/articles/Best+Practices/61537/ for best practices on asking questions.

    Need to split a string? Try Jeff Modens splitter http://www.sqlservercentral.com/articles/Tally+Table/72993/.

    Cross Tabs and Pivots, Part 1 – Converting Rows to Columns - http://www.sqlservercentral.com/articles/T-SQL/63681/
    Cross Tabs and Pivots, Part 2 - Dynamic Cross Tabs - http://www.sqlservercentral.com/articles/Crosstab/65048/
    Understanding and Using APPLY (Part 1) - http://www.sqlservercentral.com/articles/APPLY/69953/
    Understanding and Using APPLY (Part 2) - http://www.sqlservercentral.com/articles/APPLY/69954/

  • jaiswalabhishek22 (9/11/2013)


    ...Can anyone please tell me how can I get expected 1769894 count...

    Begin with the query below. Uncomment a single table source at a time, and run through all of them. It's likely that more than one table source is returning more than one row per row of M_LIST.

    select COUNT(*)

    from M_LIST ML

    --left join MRTROLLUP.CSU_JOB_AREAS no_of_csu_job_areas on (no_of_csu_job_areas.contact_id = ml.contact_id )

    --LEFT JOIN mrtcustomer.ent_parent ep ON ep.ACCOUNT_ID = ml.ACCOUNT_ID

    /*

    LEFT JOIN (

    SELECT DISTINCT upper(trim(email_address)) AS email_address

    FROM REPOR01.BLOCKED_EMAIL

    WHERE email_address IS NOT NULL

    ) blocked ON blocked.email_address = upper(trim(ml.email_address))

    */

    --left join REPOR01.MART_CSU_CONTACT Owner_1 on Owner_1.contact_id = ml.contact_id

    --left join REPOR01.MART_NCV_CONTACT Owner_2 on Owner_2.contact_id = ml.contact_id

    /*

    left join (

    select distinct email_address,

    case when segmentation_last_run_date >= current_date - 15 then 'Yes' else 'No' end as now_in_gov,

    case when segmentation_last_run_date >= current_date - 15 then to_char(segmentation_last_run_date + 15) else 'N/A' end as release_from_gov_date,

    case when segmentation_last_run_date >= current_date - 15

    then to_char(15 - (to_date(to_char(current_date,'YYYYMMDD'),'YYYYMMDD') - to_date(to_char(segmentation_last_run_date,'YYYYMMDD'),'YYYYMMDD')))

    else 'N/A' end as days_to_release_from_gov

    from (

    select

    trim(lower(email_address)) as email_address,

    max(segmentation_last_run_date) as segmentation_last_run_date

    from mrtaprimoadm.ca_main_comms_log

    where email_address is not null

    group by trim(lower(email_address))

    )

    ) g on g.email_address = trim(lower(ml.email_address))

    */

    --left join sfdcdata.sfdc_accounts sa ON sa.enterprise_id = ml.account_id

    --left join sfdcdata.sfdc_accounts sa2 on ml.sfdc_account_id = sa2.sfdc_account_id

    “Write the query the simplest way. If through testing it becomes clear that the performance is inadequate, consider alternative query forms.” - Gail Shaw

    For fast, accurate and documented assistance in answering your questions, please read this article.
    Understanding and using APPLY, (I) and (II) Paul White
    Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden

  • I see from the "create table as" and to "to_char" that you're using Oracle. Sean is exactly right and DISTINCT works essentially the same in Oracle as in SQL Server.

    I don't know what version you're using, but here's a link to the 10g docs on DISTINCT. http://download.oracle.com/docs/cd/B19306_01/server.102/b14200/statements_10002.htm#sthref9766

  • I need data from all different table with the condition match in join.

  • jaiswalabhishek22 (9/11/2013)


    I need data from all different table with the condition match in join.

    And when those values differ they are no longer duplicates. That is what distinct means.

    _______________________________________________________________

    Need help? Help us help you.

    Read the article at http://www.sqlservercentral.com/articles/Best+Practices/61537/ for best practices on asking questions.

    Need to split a string? Try Jeff Modens splitter http://www.sqlservercentral.com/articles/Tally+Table/72993/.

    Cross Tabs and Pivots, Part 1 – Converting Rows to Columns - http://www.sqlservercentral.com/articles/T-SQL/63681/
    Cross Tabs and Pivots, Part 2 - Dynamic Cross Tabs - http://www.sqlservercentral.com/articles/Crosstab/65048/
    Understanding and Using APPLY (Part 1) - http://www.sqlservercentral.com/articles/APPLY/69953/
    Understanding and Using APPLY (Part 2) - http://www.sqlservercentral.com/articles/APPLY/69954/

  • AS Sean said DISTINCT doesn't means that if you get distinct values in some columns then it will be distinct for the rows...

    can you please provide a sample result set from your output so that we can tell you why your getting duplicate rows....

    _______________________________________________________________
    To get quick answer follow this link:
    http://www.sqlservercentral.com/articles/Best+Practices/61537/

Viewing 7 posts - 1 through 6 (of 6 total)

You must be logged in to reply to this topic. Login to reply