SQL Clone
SQLServerCentral is supported by Redgate
 
Log in  ::  Register  ::  Not logged in
 
 
 


Most efficient way of not selecting data


Most efficient way of not selecting data

Author
Message
webskater
webskater
Hall of Fame
Hall of Fame (3.1K reputation)Hall of Fame (3.1K reputation)Hall of Fame (3.1K reputation)Hall of Fame (3.1K reputation)Hall of Fame (3.1K reputation)Hall of Fame (3.1K reputation)Hall of Fame (3.1K reputation)Hall of Fame (3.1K reputation)

Group: General Forum Members
Points: 3114 Visits: 524

CREATE TABLE #tblContacts
(
ContactID int,
Contact varchar(100)
)

GO

INSERT INTO #tblContacts (ContactID, Contact)
SELECT 1, 'Fred' UNION ALL
SELECT 2, 'Jim' UNION ALL
SELECT 3, 'Arthur'

GO

CREATE TABLE #tblEmails
(
EmailID int,
ContactID int,
EmailAddress varchar(100)
)

GO

INSERT INTO #tblEmails (EmailID, ContactID, EmailAddress)
SELECT 1, 1, 'fred@someorg.cam' UNION ALL
SELECT 2, 2, 'jim@goggle.cam' UNION ALL
SELECT 3, 3, 'Arthur@giggle.cam'

CREATE TABLE #tblNoEmail
(
EmailAddress varchar(100)
)

INSERT INTO #tblNoEmail(EmailAddress)
SELECT 'jim@goggle.cam'

GO

SELECT * FROM #tblContacts
INNER JOIN #tblEmails On #tblContacts.ContactID = #tblEmails.ContactID
WHERE #tblEmails.EmailAddress NOT IN (SELECT EmailAddress FROM #tblNoEmail)

GO

DROP TABLE #tblContacts
DROP TABLE #tblEmails
DROP TABLE #tblNoEmail



Jim has told us he does not want to be contacted by us but, for historical reasons, his contact record cannot be deleted. There are many dozens of places within a number of applications used by a number of different teams that can send an email to a list of people. I need to make sure no email is ever sent to his email address. I can't simply put a flag on the #tblEmails for reasons I won't bore you with. Apart from those reasons, I have been told to create #tblNoEmail and maintain a list of email addresses that must never be contacted. Whenever any email is sent from any of our applications (and there are about 20 teams using their own project planning systems over the same database) - I have to check any email address we are about to send an email to is not in 'tblNoEmail'

Something like ....


SELECT ContactID, Contact, EmailAddress
FROM #tblContacts
INNER JOIN #tblEmails On #tblContacts.ContactID = #tblEmails.ContactID
WHERE #tblEmails.EmailAddress NOT IN (SELECT EmailAddress FROM #tblNoEmail)


should do it (although there is a syntax error in my code above that I can't find and that statement above will not run for some reason ? ). But, given there are 200,000 contact records, about 500,000 records in #tblEmails and will be a few thousand in #tblNoEmail - what's the most efficient way of always making sure that any email in a list of email addresses I am about to send an email to will not contain any of the email addresses in #tblNoEmail



Phil Parkin
Phil Parkin
SSC Guru
SSC Guru (192K reputation)SSC Guru (192K reputation)SSC Guru (192K reputation)SSC Guru (192K reputation)SSC Guru (192K reputation)SSC Guru (192K reputation)SSC Guru (192K reputation)SSC Guru (192K reputation)

Group: General Forum Members
Points: 192270 Visits: 23740
Your query just needed aliases.
A fixed version, along with an alternative which may perform better (you should test, of course)

SELECT * FROM #tblContacts c
JOIN #tblEmails e ON c.ContactID = e.ContactID
WHERE e.EmailAddress NOT IN (SELECT EmailAddress FROM #tblNoEmail)

SELECT * FROM #tblContacts c
JOIN #tblEmails e ON c.ContactID = e.ContactID
WHERE NOT EXISTS (SELECT 1 FROM #tblNoEmail ne WHERE ne.EmailAddress = e.EmailAddress)



Help us to help you. For better, quicker and more-focused answers to your questions, consider following the advice in this link.

If the answer to your question can be found with a brief Google search, please perform the search yourself, rather than expecting one of the SSC members to do it for you.

Please surround any code or links you post with the appropriate IFCode formatting tags. It helps readability a lot.
Luis Cazares
Luis Cazares
SSC Guru
SSC Guru (146K reputation)SSC Guru (146K reputation)SSC Guru (146K reputation)SSC Guru (146K reputation)SSC Guru (146K reputation)SSC Guru (146K reputation)SSC Guru (146K reputation)SSC Guru (146K reputation)

Group: General Forum Members
Points: 146284 Visits: 22069
webskater - Wednesday, May 2, 2018 12:24 PM

CREATE TABLE #tblContacts
(
ContactID int,
Contact varchar(100)
)

GO

INSERT INTO #tblContacts (ContactID, Contact)
SELECT 1, 'Fred' UNION ALL
SELECT 2, 'Jim' UNION ALL
SELECT 3, 'Arthur'

GO

CREATE TABLE #tblEmails
(
EmailID int,
ContactID int,
EmailAddress varchar(100)
)

GO

INSERT INTO #tblEmails (EmailID, ContactID, EmailAddress)
SELECT 1, 1, 'fred@someorg.cam' UNION ALL
SELECT 2, 2, 'jim@goggle.cam' UNION ALL
SELECT 3, 3, 'Arthur@giggle.cam'


CREATE TABLE #tblNoEmail
(
EmailID int
);

INSERT INTO #tblNoEmail(EmailID)
SELECT 2;

GO

SELECT *
FROM #tblContacts AS c
INNER JOIN #tblEmails AS e ON c.ContactID = e.ContactID
WHERE e.EmailID NOT IN (SELECT n.EmailID FROM #tblNoEmail AS n);

GO

DROP TABLE #tblContacts
DROP TABLE #tblEmails
DROP TABLE #tblNoEmail



Jim has told us he does not want to be contacted by us but, for historical reasons, his contact record cannot be deleted. There are many dozens of places within a number of applications used by a number of different teams that can send an email to a list of people. I need to make sure no email is ever sent to his email address. I can't simply put a flag on the #tblEmails for reasons I won't bore you with. Apart from those reasons, I have been told to create #tblNoEmail and maintain a list of email addresses that must never be contacted. Whenever any email is sent from any of our applications (and there are about 20 teams using their own project planning systems over the same database) - I have to check any email address we are about to send an email to is not in 'tblNoEmail'

Something like ....


SELECT *
FROM #tblContacts AS c
INNER JOIN #tblEmails AS e ON c.ContactID = e.ContactID
WHERE e.EmailID NOT IN (SELECT n.EmailID FROM #tblNoEmail AS n);


should do it (although there is a syntax error in my code above that I can't find and that statement above will not run for some reason ? ). But, given there are 200,000 contact records, about 500,000 records in #tblEmails and will be a few thousand in #tblNoEmail - what's the most efficient way of always making sure that any email in a list of email addresses I am about to send an email to will not contain any of the email addresses in #tblNoEmail




I'd say that the most efficient way is to use the EmailID instead of the full address. You would also need proper indexes and Foreign Key constraints. Your code should also use table aliases and qualified names for the columns (using the table aliases). Other than that, I don't see much improvement. I did some changes on your code.


Luis C.
General Disclaimer:
Are you seriously taking the advice and code from someone from the internet without testing it? Do you at least understand it? Or can it easily kill your server?


How to post data/code on a forum to get the best help: Option 1 / Option 2
Jeffrey Williams 3188
Jeffrey Williams 3188
SSC Guru
SSC Guru (70K reputation)SSC Guru (70K reputation)SSC Guru (70K reputation)SSC Guru (70K reputation)SSC Guru (70K reputation)SSC Guru (70K reputation)SSC Guru (70K reputation)SSC Guru (70K reputation)

Group: General Forum Members
Points: 70126 Visits: 10690
Here is one alternative:


With validEmail (EmailAddress)
As (
Select te.EmailAddress
From #tblEmails te
Except
Select tne.EmailAddress
From #tblNoEmail tne
)
Select tc.ContactID
, tc.Contact
, te.EmailID
, te.ContactID
, te.EmailAddress
From #tblContacts tc
Inner Join #tblEmails te On te.EmailID = tc.ContactID
Inner Join validEmail ve On ve.EmailAddress = te.EmailAddress;


Test for performance, of course...

Another option would be to create the no email the table exactly the same as the email table. Then, you can perform an easy insert from the email table into the no email table using all columns.


With validEmail
As (
Select te.EmailID
, te.ContactID
, te.EmailAddress
From #tblEmails te
Except
Select tne.EmailID
, tne.ContactID
, tne.EmailAddress
From #tblNoEmail tne
)
Select tc.ContactID
, tc.Contact
, ve.EmailID
, ve.ContactID
, ve.EmailAddress
From #tblContacts tc
Inner Join validEmail ve On ve.ContactID = tc.ContactID;


To further simplify the changes you need to make - create a view to replace the email table using the second version above and replace #tblEmail with your view.

Jeffrey Williams
Problems are opportunities brilliantly disguised as insurmountable obstacles.

How to post questions to get better answers faster
Managing Transaction Logs

ScottPletcher
ScottPletcher
SSC Guru
SSC Guru (74K reputation)SSC Guru (74K reputation)SSC Guru (74K reputation)SSC Guru (74K reputation)SSC Guru (74K reputation)SSC Guru (74K reputation)SSC Guru (74K reputation)SSC Guru (74K reputation)

Group: General Forum Members
Points: 74340 Visits: 8878
As noted, use NOT EXISTS. Also, uniquely cluster the NoEmail table on the EmailAddress.


...
CREATE TABLE #tblNoEmail
(
EmailAddress varchar(100) PRIMARY KEY CLUSTERED
)
...
SELECT *
FROM #tblContacts C
INNER JOIN #tblEmails E ON C.ContactID = E.ContactID
WHERE NOT EXISTS (
SELECT 1
FROM #tblNoEmail NE
WHERE E.EmailAddress = NE.EmailAddress
)



SQL DBA,SQL Server MVP(07, 08, 09) Prosecutor James Blackburn, in closing argument in the Fatal Vision murders trial:
If in the future, you should cry a tear, cry one for them [the murder victims]. If in the future, you should say a prayer, say one for them. And if in the future, you should light a candle, light one for them.
webskater
webskater
Hall of Fame
Hall of Fame (3.1K reputation)Hall of Fame (3.1K reputation)Hall of Fame (3.1K reputation)Hall of Fame (3.1K reputation)Hall of Fame (3.1K reputation)Hall of Fame (3.1K reputation)Hall of Fame (3.1K reputation)Hall of Fame (3.1K reputation)

Group: General Forum Members
Points: 3114 Visits: 524
Thank you all for your answers. I'll try them on the real data tomorrow and see which performs the quickest.

A couple of other questions ... If I stick with #tblNoEmail as just having one column - EmailAddress - what sort of index should I put on it?

And I will need, wherever a list of email addresses is displayed in the front end, to highlight any email addresses that are in #tblNoEmail

I could do this ...(or similar using the CTE solution above)


SELECT e.EmailAddress, CASE WHEN ne.EmailAddress IS NOT NULL THEN 'No' ELSE 'Yes' END AS [CanUseEmailAddress] FROM #tblContacts c
JOIN #tblEmails e ON c.ContactID = e.ContactID
LEFT JOIN #tblNoEmail ne ON e.EmailAddress = ne.EmailAddress;


Is there a better way to do that?


Luis Cazares
Luis Cazares
SSC Guru
SSC Guru (146K reputation)SSC Guru (146K reputation)SSC Guru (146K reputation)SSC Guru (146K reputation)SSC Guru (146K reputation)SSC Guru (146K reputation)SSC Guru (146K reputation)SSC Guru (146K reputation)

Group: General Forum Members
Points: 146284 Visits: 22069
webskater - Wednesday, May 2, 2018 5:32 PM
Thank you all for your answers. I'll try them on the real data tomorrow and see which performs the quickest.

A couple of other questions ... If I stick with #tblNoEmail as just having one column - EmailAddress - what sort of index should I put on it?

And I will need, wherever a list of email addresses is displayed in the front end, to highlight any email addresses that are in #tblNoEmail

I could do this ...(or similar using the CTE solution above)


SELECT e.EmailAddress, CASE WHEN ne.EmailAddress IS NOT NULL THEN 'No' ELSE 'Yes' END AS [CanUseEmailAddress] FROM #tblContacts c
JOIN #tblEmails e ON c.ContactID = e.ContactID
LEFT JOIN #tblNoEmail ne ON e.EmailAddress = ne.EmailAddress;


Is there a better way to do that?



As mentioned before, the index on tblNoEmail should be a clustered index on the single column it has (which I still recommend to keep it as int to reduce the size of the data).
If you need to show the whole universe of emails including the ones on the "no list", that's a good way to do it. If you only need to show active emails, use NOT IN or NOT EXISTS.
I recommend to stay away from the CTE with EXCEPT because that needs to read the Emails table twice and that usually means that more resources are needed.


Luis C.
General Disclaimer:
Are you seriously taking the advice and code from someone from the internet without testing it? Do you at least understand it? Or can it easily kill your server?


How to post data/code on a forum to get the best help: Option 1 / Option 2
george_at_sql
george_at_sql
Right there with Babe
Right there with Babe (737 reputation)Right there with Babe (737 reputation)Right there with Babe (737 reputation)Right there with Babe (737 reputation)Right there with Babe (737 reputation)Right there with Babe (737 reputation)Right there with Babe (737 reputation)Right there with Babe (737 reputation)

Group: General Forum Members
Points: 737 Visits: 1271
webskater - Wednesday, May 2, 2018 5:32 PM
Thank you all for your answers. I'll try them on the real data tomorrow and see which performs the quickest.

A couple of other questions ... If I stick with #tblNoEmail as just having one column - EmailAddress - what sort of index should I put on it?

And I will need, wherever a list of email addresses is displayed in the front end, to highlight any email addresses that are in #tblNoEmail

I could do this ...(or similar using the CTE solution above)


SELECT e.EmailAddress, CASE WHEN ne.EmailAddress IS NOT NULL THEN 'No' ELSE 'Yes' END AS [CanUseEmailAddress] FROM #tblContacts c
JOIN #tblEmails e ON c.ContactID = e.ContactID
LEFT JOIN #tblNoEmail ne ON e.EmailAddress = ne.EmailAddress;


Is there a better way to do that?


That seems to be a good way to do based on your required output.

The simplest solution is often the best one especially in SQL, and also the query you have written is in an uncomplicated way, and therefore you do not have to worry about he best way the optimizer would implement the solution(SQL is a declarative language, i would rather not to specify the algorithm the database should choose to fetch me the data; rather i would keep things such as statistics up to date, and add indexes if it makes sense, add constraints etc anything that helps the optimizer to make smart decisions)

Some people worry about the join "order" in a query; the database would be smart enough to see the query you are requesting as a whole and decide which joins it should do first then the next etc.

sgmunson
sgmunson
SSC Guru
SSC Guru (73K reputation)SSC Guru (73K reputation)SSC Guru (73K reputation)SSC Guru (73K reputation)SSC Guru (73K reputation)SSC Guru (73K reputation)SSC Guru (73K reputation)SSC Guru (73K reputation)

Group: General Forum Members
Points: 73434 Visits: 6329
george_at_sql - Thursday, May 3, 2018 10:46 AM
webskater - Wednesday, May 2, 2018 5:32 PM
Thank you all for your answers. I'll try them on the real data tomorrow and see which performs the quickest.

A couple of other questions ... If I stick with #tblNoEmail as just having one column - EmailAddress - what sort of index should I put on it?

And I will need, wherever a list of email addresses is displayed in the front end, to highlight any email addresses that are in #tblNoEmail

I could do this ...(or similar using the CTE solution above)


SELECT e.EmailAddress, CASE WHEN ne.EmailAddress IS NOT NULL THEN 'No' ELSE 'Yes' END AS [CanUseEmailAddress] FROM #tblContacts c
JOIN #tblEmails e ON c.ContactID = e.ContactID
LEFT JOIN #tblNoEmail ne ON e.EmailAddress = ne.EmailAddress;


Is there a better way to do that?


That seems to be a good way to do based on your required output.

The simplest solution is often the best one especially in SQL, and also the query you have written is in an uncomplicated way, and therefore you do not have to worry about he best way the optimizer would implement the solution(SQL is a declarative language, i would rather not to specify the algorithm the database should choose to fetch me the data; rather i would keep things such as statistics up to date, and add indexes if it makes sense, add constraints etc anything that helps the optimizer to make smart decisions)

Some people worry about the join "order" in a query; the database would be smart enough to see the query you are requesting as a whole and decide which joins it should do first then the next etc.

True enough about trusting the optimizer, but remember that it's only as good as the statistics you keep....


Steve
‌(aka sgmunson)
Smile Smile Smile
Health & Nutrition
ScottPletcher
ScottPletcher
SSC Guru
SSC Guru (74K reputation)SSC Guru (74K reputation)SSC Guru (74K reputation)SSC Guru (74K reputation)SSC Guru (74K reputation)SSC Guru (74K reputation)SSC Guru (74K reputation)SSC Guru (74K reputation)

Group: General Forum Members
Points: 74340 Visits: 8878
sgmunson - Thursday, May 3, 2018 12:00 PM
george_at_sql - Thursday, May 3, 2018 10:46 AM
webskater - Wednesday, May 2, 2018 5:32 PM
Thank you all for your answers. I'll try them on the real data tomorrow and see which performs the quickest.

A couple of other questions ... If I stick with #tblNoEmail as just having one column - EmailAddress - what sort of index should I put on it?

And I will need, wherever a list of email addresses is displayed in the front end, to highlight any email addresses that are in #tblNoEmail

I could do this ...(or similar using the CTE solution above)


SELECT e.EmailAddress, CASE WHEN ne.EmailAddress IS NOT NULL THEN 'No' ELSE 'Yes' END AS [CanUseEmailAddress] FROM #tblContacts c
JOIN #tblEmails e ON c.ContactID = e.ContactID
LEFT JOIN #tblNoEmail ne ON e.EmailAddress = ne.EmailAddress;


Is there a better way to do that?


That seems to be a good way to do based on your required output.

The simplest solution is often the best one especially in SQL, and also the query you have written is in an uncomplicated way, and therefore you do not have to worry about he best way the optimizer would implement the solution(SQL is a declarative language, i would rather not to specify the algorithm the database should choose to fetch me the data; rather i would keep things such as statistics up to date, and add indexes if it makes sense, add constraints etc anything that helps the optimizer to make smart decisions)

Some people worry about the join "order" in a query; the database would be smart enough to see the query you are requesting as a whole and decide which joins it should do first then the next etc.

True enough about trusting the optimizer, but remember that it's only as good as the statistics you keep....

Well, but the optimizer has a pre-set order of operations in some cases, including the order in which it evaluates clauses, which means for some things how you write them can still have a dramatic effect on performance. And from direct experience, I can tell you that using [NOT] EXISTS rather than LEFT JOIN is one of those things which can have a big performance impact in certain cases.


SQL DBA,SQL Server MVP(07, 08, 09) Prosecutor James Blackburn, in closing argument in the Fatal Vision murders trial:
If in the future, you should cry a tear, cry one for them [the murder victims]. If in the future, you should say a prayer, say one for them. And if in the future, you should light a candle, light one for them.
Go


Permissions

You can't post new topics.
You can't post topic replies.
You can't post new polls.
You can't post replies to polls.
You can't edit your own topics.
You can't delete your own topics.
You can't edit other topics.
You can't delete other topics.
You can't edit your own posts.
You can't edit other posts.
You can't delete your own posts.
You can't delete other posts.
You can't post events.
You can't edit your own events.
You can't edit other events.
You can't delete your own events.
You can't delete other events.
You can't send private messages.
You can't send emails.
You can read topics.
You can't vote in polls.
You can't upload attachments.
You can download attachments.
You can't post HTML code.
You can't edit HTML code.
You can't post IFCode.
You can't post JavaScript.
You can post emoticons.
You can't post or upload images.

Select a forum








































































































































































SQLServerCentral


Search