|
|
|
SSChampion
        
Group: General Forum Members
Last Login: Today @ 8:47 AM
Points: 11,617,
Visits: 27,678
|
|
ok i'm not sure if this is doing a valid performance test or not;
i had initially thought that creating an object would be slower than regex, but that's not what i'm seeing. On my Dev machine with no load on it at all, but limited to a max memory of 1024 Meg, i'm getting results for 1 Million random rows of data in about 35 seconds for the .Net item, but the Regex seems to go for more than eight minutes; i wonder if it's because regex is not System.IDisposable?
does anyone see similar results, or am i missing something obvious? results:
-------------------------------------------------------------------------------- CLEAR CACHE SO NEITHER QUERY GETS AN ADVANTAGE -------------------------------------------------------------------------------- DBCC execution completed. If DBCC printed error messages, contact your system administrator. DBCC execution completed. If DBCC printed error messages, contact your system administrator. ========== BASELINE ==========
SQL Server Execution Times: CPU time = 125 ms, elapsed time = 526 ms. ================================================================================ -------------------------------------------------------------------------------- CLEAR CACHE SO NEITHER QUERY GETS AN ADVANTAGE -------------------------------------------------------------------------------- DBCC execution completed. If DBCC printed error messages, contact your system administrator. DBCC execution completed. If DBCC printed error messages, contact your system administrator. ========== Using Mail Object Function ==========
(1000000 row(s) affected)
SQL Server Execution Times: CPU time = 23665 ms, elapsed time = 30350 ms. ================================================================================ -------------------------------------------------------------------------------- CLEAR CACHE SO NEITHER QUERY GETS AN ADVANTAGE -------------------------------------------------------------------------------- DBCC execution completed. If DBCC printed error messages, contact your system administrator. DBCC execution completed. If DBCC printed error messages, contact your system administrator. ========== Using RegEx Function ========== --Gave up after Eight Plus minutes
this is what i used for testing: I generate some random data, that sometimes matches the expected format with @ and .com in it.
WITH cteBase AS ( --=== Gen enough 8 character random codes SELECT TOP 1000000 LEFT(NEWID(),8) AS RandomCode FROM Master.sys.All_Columns ac1 CROSS JOIN Master.sys.All_Columns ac2 ) , cteManip AS ( --=== Number the codes so we can select unique random codes SELECT RandomCode + CASE WHEN RandomCode LIKE '[A-Z]%' THEN '@' ELSE '' END + RandomCode + CASE WHEN RandomCode LIKE '[A-Z]%' THEN '.com' ELSE '' END AS RandomCode FROM cteBase
) --=== Select 1 million random codes SELECT identity(int,1,1) as ID,RandomCode INTO #MySampleEmails FROM cteManip DECLARE @HOLDER VARCHAR(20); PRINT REPLICATE('-',80); PRINT 'CLEAR CACHE SO NEITHER QUERY GETS AN ADVANTAGE'; PRINT REPLICATE('-',80); DBCC FREEPROCCACHE; DBCC DROPCLEANBUFFERS;
PRINT '========== BASELINE =========='; SET STATISTICS TIME ON; SELECT @HOLDER = COUNT(*) FROM #MySampleEmails; SET STATISTICS TIME OFF; PRINT REPLICATE('=',80);
PRINT REPLICATE('-',80); PRINT 'CLEAR CACHE SO NEITHER QUERY GETS AN ADVANTAGE'; PRINT REPLICATE('-',80); DBCC FREEPROCCACHE; DBCC DROPCLEANBUFFERS;
PRINT '========== Using Mail Object Function =========='; SET STATISTICS TIME ON; SELECT *,dbo.IsValidEmail1(RandomCode) As IsValidEmail into #deleteme1 FROM #MySampleEmails SET STATISTICS TIME OFF;
PRINT REPLICATE('=',80); PRINT REPLICATE('-',80); PRINT 'CLEAR CACHE SO NEITHER QUERY GETS AN ADVANTAGE'; PRINT REPLICATE('-',80); DBCC FREEPROCCACHE; DBCC DROPCLEANBUFFERS;
PRINT '========== Using RegEx Function =========='; SET STATISTICS TIME ON; SELECT *,dbo.IsValidEmail2(RandomCode) As IsValidEmail into #deleteme2 FROM #MySampleEmails
SET STATISTICS TIME OFF;
PRINT REPLICATE('=',80); PRINT REPLICATE('-',80); PRINT 'CLEAR CACHE SO NEITHER QUERY GETS AN ADVANTAGE'; PRINT REPLICATE('-',80); DBCC FREEPROCCACHE; DBCC DROPCLEANBUFFERS;
drop table #deleteme1 drop table #deleteme2 drop table #MySampleEmails
Lowell
--There is no spoon, and there's no default ORDER BY in sql server either. Actually, Common Sense is so rare, it should be considered a Superpower. --my son
|
|
|
|
|
SSCrazy Eights
        
Group: General Forum Members
Last Login: Today @ 8:44 AM
Points: 8,575,
Visits: 8,222
|
|
Lowell I dropped the million to pull only 1,000 and it is painfully obvious that the MailAddress method blows the doors off the regex version for performance.
Here is the output.
(1000 row(s) affected) -------------------------------------------------------------------------------- CLEAR CACHE SO NEITHER QUERY GETS AN ADVANTAGE -------------------------------------------------------------------------------- DBCC execution completed. If DBCC printed error messages, contact your system administrator. DBCC execution completed. If DBCC printed error messages, contact your system administrator. ========== BASELINE ==========
SQL Server Execution Times: CPU time = 0 ms, elapsed time = 0 ms. ================================================================================ -------------------------------------------------------------------------------- CLEAR CACHE SO NEITHER QUERY GETS AN ADVANTAGE -------------------------------------------------------------------------------- DBCC execution completed. If DBCC printed error messages, contact your system administrator. DBCC execution completed. If DBCC printed error messages, contact your system administrator. ========== Using Mail Object Function ==========
SQL Server Execution Times: CPU time = 62 ms, elapsed time = 74 ms.
(1000 row(s) affected) ================================================================================ -------------------------------------------------------------------------------- CLEAR CACHE SO NEITHER QUERY GETS AN ADVANTAGE -------------------------------------------------------------------------------- DBCC execution completed. If DBCC printed error messages, contact your system administrator. DBCC execution completed. If DBCC printed error messages, contact your system administrator. ========== Using RegEx Function ==========
SQL Server Execution Times: CPU time = 7379 ms, elapsed time = 7501 ms.
(1000 row(s) affected) ================================================================================ -------------------------------------------------------------------------------- CLEAR CACHE SO NEITHER QUERY GETS AN ADVANTAGE -------------------------------------------------------------------------------- DBCC execution completed. If DBCC printed error messages, contact your system administrator. DBCC execution completed. If DBCC printed error messages, contact your system administrator.
Until yesterday I had never had the regex code that I posted used in a CLR but I can't think of anything that would help it perform faster. Seems to me that the MailAddress version will perform better in addition to being more thoroughly tested. Although I did run this a lot of times with the following query .
select d1.RandomCode, d1.IsValidEmail, d2.IsValidEmail from #deleteme1 d1 left join #Deleteme2 d2 on d1.Randomcode = d2.randomCode where d1.IsValidEmail <> isnull(d2.IsValidEmail, 2)
select d1.RandomCode, d1.IsValidEmail, d2.IsValidEmail from #deleteme2 d2 left join #Deleteme1 d1 on d1.Randomcode = d2.randomCode where d2.IsValidEmail <> isnull(d1.IsValidEmail, 2)
I was wondering if I would find any of these random codes that returned different results but I didn't seem to have any luck finding any. I wonder if turning them into table valued CLR's would help performance over the scalar udf. I am swamped right now but this sounds like a fun project.
_______________________________________________________________
Need help? Help us help you.
Read the article at http://www.sqlservercentral.com/articles/Best+Practices/61537/ for best practices on asking questions.
Need to split a string? Try Jeff Moden's splitter.
Cross Tabs and Pivots, Part 1 – Converting Rows to Columns Cross Tabs and Pivots, Part 2 - Dynamic Cross Tabs
|
|
|
|
|
SSCertifiable
       
Group: General Forum Members
Last Login: Today @ 7:31 AM
Points: 6,703,
Visits: 11,731
|
|
I was really surprised at how poorly the RegEx did compared to MailAddress. I am now wondering what MailAddress uses to determine validity.
Here are a couple TVFs to go along with the SVFs:
using System; using System.Data; using System.Data.SqlClient; using System.Data.SqlTypes; using Microsoft.SqlServer.Server; using System.Text.RegularExpressions; using System.Collections; using System.Net.Mail;
public partial class UserDefinedFunctions { private static readonly Regex re = new Regex("^([0-9a-zA-Z]([-.\\w]*[0-9a-zA-Z])*@([0-9a-zA-Z][-\\w]*[0-9a-zA-Z]\\.)+[a-zA-Z]{2,9})$");
[Microsoft.SqlServer.Server.SqlFunction] public static SqlBoolean IsValidEmailRegEx(SqlString email) { return re.IsMatch(email.ToString()); }
[SqlFunction(FillRowMethodName = "EmailIsValidRegEx", TableDefinition = "EmailIsValid bit")] public static IEnumerable EmailValidityRegEx(SqlString emailAddress) { ArrayList EmailCollection = new ArrayList(); EmailCollection.Add(emailAddress); return EmailCollection; }
//FillRow method. The method name has been specified above as //a SqlFunction attribute property public static void EmailIsValidRegEx(object emailAddress, out SqlBoolean emailIsValid) { emailIsValid = re.IsMatch(emailAddress.ToString()); }
};
using System; using System.Data; using System.Data.SqlClient; using System.Data.SqlTypes; using Microsoft.SqlServer.Server; using System.Collections; using System.Net.Mail;
public partial class UserDefinedFunctions { [Microsoft.SqlServer.Server.SqlFunction] public static SqlBoolean IsValidEmailSystemNet(SqlString email) { try { System.Net.Mail.MailAddress ma = new System.Net.Mail.MailAddress(email.ToString()); return (SqlBoolean)true; } catch { return (SqlBoolean)false; } }
[SqlFunction(FillRowMethodName = "EmailIsValidSystemNet", TableDefinition = "EmailIsValid bit")] public static IEnumerable EmailValiditySystemNet(SqlString emailAddress) { ArrayList EmailCollection = new ArrayList(); EmailCollection.Add(emailAddress); return EmailCollection; }
//FillRow method. The method name has been specified above as //a SqlFunction attribute property public static void EmailIsValidSystemNet(object emailAddress, out SqlBoolean emailIsValid) { try { MailAddress ma = new MailAddress(emailAddress.ToString()); emailIsValid = (SqlBoolean)true; } catch { emailIsValid = (SqlBoolean)false; } }
};
On my lowly 32-bit 2.4 GHz Intel Core 2 Duo laptop w/4GB RAM the MailAddress SVF took the prize although I wonder if on a machine with more CPUs if there would be a tipping where the TVF might finish in a shorter amount of time:
SQL Server parse and compile time: CPU time = 0 ms, elapsed time = 0 ms.
SQL Server Execution Times: CPU time = 0 ms, elapsed time = 0 ms.
(1000 row(s) affected) -------------------------------------------------------------------------------- CLEAR CACHE SO NEITHER QUERY GETS AN ADVANTAGE -------------------------------------------------------------------------------- DBCC execution completed. If DBCC printed error messages, contact your system administrator. DBCC execution completed. If DBCC printed error messages, contact your system administrator. ========== BASELINE ==========
SQL Server Execution Times: CPU time = 0 ms, elapsed time = 0 ms. ================================================================================ -------------------------------------------------------------------------------- CLEAR CACHE SO NEITHER QUERY GETS AN ADVANTAGE -------------------------------------------------------------------------------- DBCC execution completed. If DBCC printed error messages, contact your system administrator. DBCC execution completed. If DBCC printed error messages, contact your system administrator. ========== Using Mail Object Function ==========
SQL Server Execution Times: CPU time = 16 ms, elapsed time = 61 ms.
(1000 row(s) affected) ================================================================================ -------------------------------------------------------------------------------- CLEAR CACHE SO NEITHER QUERY GETS AN ADVANTAGE -------------------------------------------------------------------------------- DBCC execution completed. If DBCC printed error messages, contact your system administrator. DBCC execution completed. If DBCC printed error messages, contact your system administrator. ========== Using RegEx Function ==========
SQL Server Execution Times: CPU time = 7145 ms, elapsed time = 7437 ms.
(1000 row(s) affected) ================================================================================ -------------------------------------------------------------------------------- CLEAR CACHE SO NEITHER QUERY GETS AN ADVANTAGE -------------------------------------------------------------------------------- DBCC execution completed. If DBCC printed error messages, contact your system administrator. DBCC execution completed. If DBCC printed error messages, contact your system administrator. ========== Using Table-Valued Mail Object Function ==========
SQL Server Execution Times: CPU time = 16 ms, elapsed time = 80 ms.
(1000 row(s) affected) ================================================================================ -------------------------------------------------------------------------------- CLEAR CACHE SO NEITHER QUERY GETS AN ADVANTAGE -------------------------------------------------------------------------------- DBCC execution completed. If DBCC printed error messages, contact your system administrator. DBCC execution completed. If DBCC printed error messages, contact your system administrator. ========== Using Table-Valued RegEx Function ==========
SQL Server Execution Times: CPU time = 7207 ms, elapsed time = 7429 ms.
(1000 row(s) affected) ================================================================================ -------------------------------------------------------------------------------- CLEAR CACHE SO NEITHER QUERY GETS AN ADVANTAGE -------------------------------------------------------------------------------- DBCC execution completed. If DBCC printed error messages, contact your system administrator. DBCC execution completed. If DBCC printed error messages, contact your system administrator.
USE test; SET STATISTICS TIME OFF;
IF OBJECT_ID(N'tempdb..#MySampleEmails') IS NOT NULL DROP TABLE #MySampleEmails; GO
WITH cteBase AS ( --=== Gen enough 8 character random codes SELECT TOP 1000 LEFT(NEWID(), 8) AS RandomCode FROM Master.sys.All_Columns ac1 CROSS JOIN Master.sys.All_Columns ac2 ), cteManip AS ( --=== Number the codes so we can select unique random codes SELECT RandomCode + CASE WHEN RandomCode LIKE '[A-Z]%' THEN '@' ELSE '' END + RandomCode + CASE WHEN RandomCode LIKE '[A-Z]%' THEN '.com' ELSE '' END AS RandomCode FROM cteBase ) --=== Select 1 million random codes SELECT IDENTITY( INT,1,1 ) AS ID, RandomCode INTO #MySampleEmails FROM cteManip;
---------------------------------------
IF OBJECT_ID(N'tempdb..#deleteme1') IS NOT NULL DROP TABLE #deleteme1; GO IF OBJECT_ID(N'tempdb..#deleteme2') IS NOT NULL DROP TABLE #deleteme2; GO IF OBJECT_ID(N'tempdb..#deleteme3') IS NOT NULL DROP TABLE #deleteme3; GO IF OBJECT_ID(N'tempdb..#deleteme4') IS NOT NULL DROP TABLE #deleteme4; GO
DECLARE @HOLDER VARCHAR(20); PRINT REPLICATE('-', 80); PRINT 'CLEAR CACHE SO NEITHER QUERY GETS AN ADVANTAGE'; PRINT REPLICATE('-', 80); DBCC FREEPROCCACHE; DBCC DROPCLEANBUFFERS;
PRINT '========== BASELINE =========='; SET STATISTICS TIME ON; SELECT @HOLDER = COUNT(*) FROM #MySampleEmails; SET STATISTICS TIME OFF; PRINT REPLICATE('=', 80);
PRINT REPLICATE('-', 80); PRINT 'CLEAR CACHE SO NEITHER QUERY GETS AN ADVANTAGE'; PRINT REPLICATE('-', 80); DBCC FREEPROCCACHE; DBCC DROPCLEANBUFFERS;
PRINT '========== Using Mail Object Function =========='; SET STATISTICS TIME ON; SELECT *, dbo.IsValidEmailSystemNet(RandomCode) AS IsValidEmail INTO #deleteme1 FROM #MySampleEmails OPTION (MAXDOP 0); SET STATISTICS TIME OFF;
PRINT REPLICATE('=', 80); PRINT REPLICATE('-', 80); PRINT 'CLEAR CACHE SO NEITHER QUERY GETS AN ADVANTAGE'; PRINT REPLICATE('-', 80); DBCC FREEPROCCACHE; DBCC DROPCLEANBUFFERS;
PRINT '========== Using RegEx Function =========='; SET STATISTICS TIME ON; SELECT *, dbo.IsValidEmailRegEx(RandomCode) AS IsValidEmail INTO #deleteme2 FROM #MySampleEmails;
SET STATISTICS TIME OFF;
PRINT REPLICATE('=', 80); PRINT REPLICATE('-', 80); PRINT 'CLEAR CACHE SO NEITHER QUERY GETS AN ADVANTAGE'; PRINT REPLICATE('-', 80); DBCC FREEPROCCACHE; DBCC DROPCLEANBUFFERS;
PRINT '========== Using Table-Valued Mail Object Function =========='; SET STATISTICS TIME ON; SELECT mse.ID, mse.RandomCode, x.EmailIsValid INTO #deleteme4 FROM #MySampleEmails mse CROSS APPLY dbo.EmailValiditySystemNet(mse.RandomCode) x;
SET STATISTICS TIME OFF;
PRINT REPLICATE('=', 80); PRINT REPLICATE('-', 80); PRINT 'CLEAR CACHE SO NEITHER QUERY GETS AN ADVANTAGE'; PRINT REPLICATE('-', 80); DBCC FREEPROCCACHE; DBCC DROPCLEANBUFFERS;
PRINT '========== Using Table-Valued RegEx Function =========='; SET STATISTICS TIME ON; SELECT mse.ID, mse.RandomCode, x.EmailIsValid INTO #deleteme3 FROM #MySampleEmails mse CROSS APPLY dbo.EmailValidityRegEx(mse.RandomCode) x;
SET STATISTICS TIME OFF;
PRINT REPLICATE('=', 80); PRINT REPLICATE('-', 80); PRINT 'CLEAR CACHE SO NEITHER QUERY GETS AN ADVANTAGE'; PRINT REPLICATE('-', 80); DBCC FREEPROCCACHE; DBCC DROPCLEANBUFFERS;
----------------------------------------------------------------------------- ----------------------------------------------------------------------------- IF OBJECT_ID(N'tempdb..#deleteme1') IS NOT NULL DROP TABLE #deleteme1; GO IF OBJECT_ID(N'tempdb..#deleteme2') IS NOT NULL DROP TABLE #deleteme2; GO IF OBJECT_ID(N'tempdb..#deleteme3') IS NOT NULL DROP TABLE #deleteme3; GO IF OBJECT_ID(N'tempdb..#deleteme4') IS NOT NULL DROP TABLE #deleteme4; GO IF OBJECT_ID(N'tempdb..#MySampleEmails') IS NOT NULL DROP TABLE #MySampleEmails; GO
__________________________________________________________________________________________________ There are no special teachers of virtue, because virtue is taught by the whole community. --Plato
Believe you can and you're halfway there. --Theodore Roosevelt
Everything Should Be Made as Simple as Possible, But Not Simpler --Albert Einstein
The significant problems we face cannot be solved at the same level of thinking we were at when we created them. --Albert Einstein
1 apple is not exactly 1/8 of 8 apples. Because there are no absolutely identical apples. --Giordy
|
|
|
|