September 11, 2013 at 7:47 am
Hi All,
I used distinct in my query but then also I am getting almost 100K of duplicate records, below is the detail of count.
Expected count : 1769894
From select query I am getting : 13883717
The contact_ID should be unique and we shouldnt have duplicate contact ID, I used 4-5 tables to join.
Below is the query,
drop table CnI_Test;
create table CnI_Test as
select distinct
ml.account_id, ml.account_name, ml.NAMED_ACCOUNT_IND, ml.NAMED_ACCOUNT_DESCRIPTION, ml.site_id, ml.org_name,
ml.contact_id , ml.FIRST_NAME, ml.LAST_NAME,
ML.JOB_AREA_CD, ML.JOB_AREA_CD_desc,ML.job_level_cd, ml.job_level_cd_desc,
ML.MKT_REGION_NAME, ML.OVERALL_SALES_REGION, ML.SALES_REGION, ML.ISO_COUNTRY_CD_DESC, ML.TERRITORY,ml.CITY,ML.STATE_PROV_COUNTY,
ml.MKTG_CAPABLE_FLAG, ML.MKTG_CAPABLE_EMAIL_FLAG, ml.MKTG_CAPABLE_PHONE_FLAG, ml.MKTG_CAPABLE_MAIL_FLAG, ML.LAST_DT_CONTACT_ACTIVITY,
ML.JOB_TITLE,ml.AI_DATE_ADDED,
CASE WHEN ML.MKTG_CAPABLE_PHONE_FLAG = 'Y' THEN ML.PHONE_COUNTRY_CD ELSE NULL END AS PHONE_COUNTRY_CD,
CASE WHEN ML.MKTG_CAPABLE_PHONE_FLAG = 'Y' THEN ML.CONTACT_PHONE_NUMBER ELSE NULL END AS CONTACT_PHONE_NUMBER,
CASE WHEN ML.MKTG_CAPABLE_PHONE_FLAG = 'Y' THEN ML.CONTACT_MOBILE_PHONE ELSE NULL END AS CONTACT_MOBILE_PHONE,
CASE WHEN ML.MKTG_CAPABLE_PHONE_FLAG = 'Y' THEN ML.ORG_PHONE_NUMBER ELSE NULL END AS ORG_PHONE_NUMBER,
CASE WHEN ML.MKTG_CAPABLE_EMAIL_FLAG = 'Y' THEN ML.EMAIL_ADDRESS ELSE NULL END AS EMAIL_ADDRESS,
CASE WHEN ML.MKTG_CAPABLE_MAIL_FLAG = 'Y' THEN ML.ADDRESS_LINE_1 ELSE NULL END AS ADDRESS_LINE_1,
CASE WHEN ML.MKTG_CAPABLE_MAIL_FLAG = 'Y' THEN ML.ADDRESS_LINE_2 ELSE NULL END AS ADDRESS_LINE_2,
CASE WHEN ML.MKTG_CAPABLE_MAIL_FLAG = 'Y' THEN ML.ADDRESS_LINE_3 ELSE NULL END AS ADDRESS_LINE_3,
CASE WHEN ML.MKTG_CAPABLE_MAIL_FLAG = 'Y' THEN ML.ADDRESS_LINE_4 ELSE NULL END AS ADDRESS_LINE_4,
no_of_csu_job_areas.job_area as csu_job_areas,
Owner_1.csu as CSU_DRIVER,
SUBSTR(Owner_1.own,1,1) as CSU_Owner,
SUBSTR(Owner_1.own,2,1) as CSU_Interest,
Owner_2.NCV_DRIVER as NCV_DRIVER,
SUBSTR(Owner_2.own,1,1) as NCV_Owner,
SUBSTR(Owner_2.own,2,1) as NCV_Interest,
NVL(ep.ACCOUNT_CURRENCY, 'Unmatched') AS ACCOUNT_SEGMENT ,
NVL(g.now_in_gov,'No') AS now_in_gov ,
NVL(g.release_from_gov_date,'N/A') AS release_from_gov_date,
NVL(g.days_to_release_from_gov,'N/A') AS days_to_release_from_gov,
sa.sales_req_not_to_call_account
from M_LIST ML
left join MRTROLLUP.CSU_JOB_AREAS no_of_csu_job_areas on (no_of_csu_job_areas.contact_id = ml.contact_id )
LEFT JOIN mrtcustomer.ent_parent ep ON ep.ACCOUNT_ID = ml.ACCOUNT_ID
LEFT JOIN (SELECT DISTINCT upper(trim(email_address)) AS email_address FROM REPOR01.BLOCKED_EMAIL WHERE email_address IS NOT NULL) blocked ON blocked.email_address = upper(trim(ml.email_address))
left join REPOR01.MART_CSU_CONTACT Owner_1 on Owner_1.contact_id = ml.contact_id
left join REPOR01.MART_NCV_CONTACT Owner_2 on Owner_2.contact_id = ml.contact_id
left join (select distinct email_address,
case when segmentation_last_run_date >= current_date - 15 then 'Yes' else 'No' end as now_in_gov,
case when segmentation_last_run_date >= current_date - 15 then to_char(segmentation_last_run_date + 15) else 'N/A' end as release_from_gov_date,
case when segmentation_last_run_date >= current_date - 15
then to_char(15 - (to_date(to_char(current_date,'YYYYMMDD'),'YYYYMMDD') - to_date(to_char(segmentation_last_run_date,'YYYYMMDD'),'YYYYMMDD')))
else 'N/A' end as days_to_release_from_gov
from
(select trim(lower(email_address)) as email_address, max(segmentation_last_run_date) as segmentation_last_run_date
from mrtaprimoadm.ca_main_comms_log
where email_address is not null
group by trim(lower(email_address)))
) g on g.email_address = trim(lower(ml.email_address))
left join sfdcdata.sfdc_accounts sa ON sa.enterprise_id = ml.account_id
left join sfdcdata.sfdc_accounts sa2 on ml.sfdc_account_id = sa2.sfdc_account_id
;
Can anyone please tell me how can I get expected 1769894 count.
M_List table is having 1769894 contacts.
Thanks,
Abhi
September 11, 2013 at 8:33 am
DISTINCT means to get rows that are unique across the entire row, not just one column. You have "duplicates" but if you look close you will find that at least one column across all those columns have different values.
_______________________________________________________________
Need help? Help us help you.
Read the article at http://www.sqlservercentral.com/articles/Best+Practices/61537/ for best practices on asking questions.
Need to split a string? Try Jeff Modens splitter http://www.sqlservercentral.com/articles/Tally+Table/72993/.
Cross Tabs and Pivots, Part 1 – Converting Rows to Columns - http://www.sqlservercentral.com/articles/T-SQL/63681/
Cross Tabs and Pivots, Part 2 - Dynamic Cross Tabs - http://www.sqlservercentral.com/articles/Crosstab/65048/
Understanding and Using APPLY (Part 1) - http://www.sqlservercentral.com/articles/APPLY/69953/
Understanding and Using APPLY (Part 2) - http://www.sqlservercentral.com/articles/APPLY/69954/
September 11, 2013 at 8:49 am
jaiswalabhishek22 (9/11/2013)
...Can anyone please tell me how can I get expected 1769894 count...
Begin with the query below. Uncomment a single table source at a time, and run through all of them. It's likely that more than one table source is returning more than one row per row of M_LIST.
select COUNT(*)
from M_LIST ML
--left join MRTROLLUP.CSU_JOB_AREAS no_of_csu_job_areas on (no_of_csu_job_areas.contact_id = ml.contact_id )
--LEFT JOIN mrtcustomer.ent_parent ep ON ep.ACCOUNT_ID = ml.ACCOUNT_ID
/*
LEFT JOIN (
SELECT DISTINCT upper(trim(email_address)) AS email_address
FROM REPOR01.BLOCKED_EMAIL
WHERE email_address IS NOT NULL
) blocked ON blocked.email_address = upper(trim(ml.email_address))
*/
--left join REPOR01.MART_CSU_CONTACT Owner_1 on Owner_1.contact_id = ml.contact_id
--left join REPOR01.MART_NCV_CONTACT Owner_2 on Owner_2.contact_id = ml.contact_id
/*
left join (
select distinct email_address,
case when segmentation_last_run_date >= current_date - 15 then 'Yes' else 'No' end as now_in_gov,
case when segmentation_last_run_date >= current_date - 15 then to_char(segmentation_last_run_date + 15) else 'N/A' end as release_from_gov_date,
case when segmentation_last_run_date >= current_date - 15
then to_char(15 - (to_date(to_char(current_date,'YYYYMMDD'),'YYYYMMDD') - to_date(to_char(segmentation_last_run_date,'YYYYMMDD'),'YYYYMMDD')))
else 'N/A' end as days_to_release_from_gov
from (
select
trim(lower(email_address)) as email_address,
max(segmentation_last_run_date) as segmentation_last_run_date
from mrtaprimoadm.ca_main_comms_log
where email_address is not null
group by trim(lower(email_address))
)
) g on g.email_address = trim(lower(ml.email_address))
*/
--left join sfdcdata.sfdc_accounts sa ON sa.enterprise_id = ml.account_id
--left join sfdcdata.sfdc_accounts sa2 on ml.sfdc_account_id = sa2.sfdc_account_id
For fast, accurate and documented assistance in answering your questions, please read this article.
Understanding and using APPLY, (I) and (II) Paul White
Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden
September 11, 2013 at 8:57 am
I see from the "create table as" and to "to_char" that you're using Oracle. Sean is exactly right and DISTINCT works essentially the same in Oracle as in SQL Server.
I don't know what version you're using, but here's a link to the 10g docs on DISTINCT. http://download.oracle.com/docs/cd/B19306_01/server.102/b14200/statements_10002.htm#sthref9766
September 11, 2013 at 10:05 am
I need data from all different table with the condition match in join.
September 11, 2013 at 10:08 am
jaiswalabhishek22 (9/11/2013)
I need data from all different table with the condition match in join.
And when those values differ they are no longer duplicates. That is what distinct means.
_______________________________________________________________
Need help? Help us help you.
Read the article at http://www.sqlservercentral.com/articles/Best+Practices/61537/ for best practices on asking questions.
Need to split a string? Try Jeff Modens splitter http://www.sqlservercentral.com/articles/Tally+Table/72993/.
Cross Tabs and Pivots, Part 1 – Converting Rows to Columns - http://www.sqlservercentral.com/articles/T-SQL/63681/
Cross Tabs and Pivots, Part 2 - Dynamic Cross Tabs - http://www.sqlservercentral.com/articles/Crosstab/65048/
Understanding and Using APPLY (Part 1) - http://www.sqlservercentral.com/articles/APPLY/69953/
Understanding and Using APPLY (Part 2) - http://www.sqlservercentral.com/articles/APPLY/69954/
September 11, 2013 at 11:36 pm
AS Sean said DISTINCT doesn't means that if you get distinct values in some columns then it will be distinct for the rows...
can you please provide a sample result set from your output so that we can tell you why your getting duplicate rows....
_______________________________________________________________
To get quick answer follow this link:
http://www.sqlservercentral.com/articles/Best+Practices/61537/
Viewing 7 posts - 1 through 6 (of 6 total)
You must be logged in to reply to this topic. Login to reply