Retrieve data between 2 '.'

  • Hi geniuses!

    Hi have a field (VALUE), which contains data like:

    MNN.helloworld

    GFF.goodbyeworld

    SSW.seeyoulaterworld.oops

    And I want to retrieve data when there's only one '.' in the data.

    Is this doable?

    Thanks

    Regards

  • --========================--

    --== CREATE SAMPLE DATA ==--

    --========================--

    SELECT yourData

    INTO #yourTable

    FROM (VALUES('MNN.helloworld'),('GFF.goodbyeworld'),('SSW.seeyoulaterworld.oops')

    )a(yourData);

    --========================--

    --== SOLUTION ==--

    --========================--

    SELECT *

    FROM #yourTable

    WHERE LEN(yourData)-1 = LEN(REPLACE(yourData COLLATE Latin1_General_BIN2,'.',''));


    Forever trying to learn
    My blog - http://www.cadavre.co.uk/
    For better, quicker answers on T-SQL questions, click on the following...http://www.sqlservercentral.com/articles/Best+Practices/61537/
    For better, quicker answers on SQL Server performance related questions, click on the following...http://www.sqlservercentral.com/articles/SQLServerCentral/66909/

  • Thanks

    It gives me an error: Argument data type ntext is invalid for argument 1 of len function

    Solutions?

    Regards

  • davdam8 (10/24/2012)


    Thanks

    It gives me an error: Argument data type ntext is invalid for argument 1 of len function

    Solutions?

    Regards

    That would be because you didn't supply DDL or readily consumable sample data, which caused me to assume that you were not using a deprecated type.

    You'll have to cast it to NVARCHAR or VARCHAR to use, e.g.

    SELECT *

    FROM #yourTable

    WHERE LEN(CAST(yourData AS NVARCHAR(MAX)))-1 = LEN(REPLACE(CAST(yourData AS NVARCHAR(MAX)) COLLATE Latin1_General_BIN2,'.',''));


    Forever trying to learn
    My blog - http://www.cadavre.co.uk/
    For better, quicker answers on T-SQL questions, click on the following...http://www.sqlservercentral.com/articles/Best+Practices/61537/
    For better, quicker answers on SQL Server performance related questions, click on the following...http://www.sqlservercentral.com/articles/SQLServerCentral/66909/

  • WOW!

    Works great!

    Thanks body!

    Regards

  • davdam8 (10/24/2012)


    WOW!

    Works great!

    Thanks body!

    Regards

    And if at all possible change your ntext to nvarchar(max). ntext is deprecated and complete PITA to work with.

    _______________________________________________________________

    Need help? Help us help you.

    Read the article at http://www.sqlservercentral.com/articles/Best+Practices/61537/ for best practices on asking questions.

    Need to split a string? Try Jeff Modens splitter http://www.sqlservercentral.com/articles/Tally+Table/72993/.

    Cross Tabs and Pivots, Part 1 – Converting Rows to Columns - http://www.sqlservercentral.com/articles/T-SQL/63681/
    Cross Tabs and Pivots, Part 2 - Dynamic Cross Tabs - http://www.sqlservercentral.com/articles/Crosstab/65048/
    Understanding and Using APPLY (Part 1) - http://www.sqlservercentral.com/articles/APPLY/69953/
    Understanding and Using APPLY (Part 2) - http://www.sqlservercentral.com/articles/APPLY/69954/

  • second option for the same requiment

    declare @temp table(t varchar(50))

    insert into @temp

    select * from

    (

    values

    ('a.p'),

    ('y.a.p'),

    ('k.p')

    )a (name)

    select *

    from @temp

    where CHARINDEX('.',t,CHARINDEX('.',t)+1)=0

  • Two solutions. . . this means performance check!! 🙂

    Here's 1 million rows of sample data: -

    SET NOCOUNT ON;

    IF object_id('tempdb..#testEnvironment') IS NOT NULL

    BEGIN

    DROP TABLE #testEnvironment;

    END;

    --1,000,000 Random rows of data

    SELECT IDENTITY(INT,1,1) AS ID, CAST(LEFT(yourData, LEN(yourData) - 1) AS NTEXT) AS yourData

    INTO #testEnvironment

    FROM (SELECT TOP 1000000

    REPLICATE(

    REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(

    REPLACE(NEWID(),'-','')

    ,'0',''),'9',''),'8',''),'7',''),'6',''),'5',''),'4',''),'3',''),'2',''),'1','') + '.',

    (ABS(CHECKSUM(NEWID())) % 2) + 2)

    FROM master.dbo.syscolumns sc1, master.dbo.syscolumns sc2, master.dbo.syscolumns sc3

    )a(yourData);

    DECLARE @HOLDER INT, @Duration CHAR(12), @StartTime DATETIME;

    SELECT @StartTime = GETDATE();

    SELECT @HOLDER = ID

    FROM #testEnvironment

    WHERE CHARINDEX('.', yourData, CHARINDEX('.', yourData) + 1) = 0;

    SELECT @Duration = CONVERT(CHAR(12),GETDATE()-@StartTime,114);

    RAISERROR('Duration for CHARINDEX = %s',0,1,@Duration) WITH NOWAIT;

    SELECT @StartTime = SYSDATETIME();

    SELECT @HOLDER = ID

    FROM #testEnvironment

    WHERE LEN(CAST(yourData AS NVARCHAR(MAX)))-1 = LEN(REPLACE(CAST(yourData AS NVARCHAR(MAX)) COLLATE Latin1_General_BIN2,'.',''));

    SELECT @Duration = CONVERT(CHAR(12),GETDATE()-@StartTime,114);

    RAISERROR('Duration for LEN = %s',0,1,@Duration) WITH NOWAIT;

    Duration for CHARINDEX = 00:00:10:660

    Duration for LEN = 00:00:06:327

    Second run (I recreated the test environment between each run):-

    DECLARE @HOLDER INT, @Duration CHAR(12), @StartTime DATETIME;

    SELECT @StartTime = SYSDATETIME();

    SELECT @HOLDER = ID

    FROM #testEnvironment

    WHERE LEN(CAST(yourData AS NVARCHAR(MAX)))-1 = LEN(REPLACE(CAST(yourData AS NVARCHAR(MAX)) COLLATE Latin1_General_BIN2,'.',''));

    SELECT @Duration = CONVERT(CHAR(12),GETDATE()-@StartTime,114);

    RAISERROR('Duration for LEN = %s',0,1,@Duration) WITH NOWAIT;

    SELECT @StartTime = GETDATE();

    SELECT @HOLDER = ID

    FROM #testEnvironment

    WHERE CHARINDEX('.', yourData, CHARINDEX('.', yourData) + 1) = 0;

    SELECT @Duration = CONVERT(CHAR(12),GETDATE()-@StartTime,114);

    RAISERROR('Duration for CHARINDEX = %s',0,1,@Duration) WITH NOWAIT;

    Duration for LEN = 00:00:10:927

    Duration for CHARINDEX = 00:00:05:900

    So the CHARINDEX function looks better, but both are pretty slow.

    Anyone got a better way?


    Forever trying to learn
    My blog - http://www.cadavre.co.uk/
    For better, quicker answers on T-SQL questions, click on the following...http://www.sqlservercentral.com/articles/Best+Practices/61537/
    For better, quicker answers on SQL Server performance related questions, click on the following...http://www.sqlservercentral.com/articles/SQLServerCentral/66909/

  • A mix of the two seems to help

    SELECT @HOLDER = ID

    FROM #testEnvironment

    WHERE CHARINDEX('.', yourData COLLATE Latin1_General_BIN2 , CHARINDEX('.', yourData COLLATE Latin1_General_BIN2) + 1) = 0;

    ____________________________________________________

    Deja View - The strange feeling that somewhere, sometime you've optimised this query before

    How to get the best help on a forum

    http://www.sqlservercentral.com/articles/Best+Practices/61537
  • Good idea Mark. I also added an ugly XML split to count the number of "items", then modified the test script to clear the cache between each piece of code to attempt to keep it fairer.

    SET NOCOUNT ON;

    IF object_id('tempdb..#testEnvironment') IS NOT NULL

    BEGIN

    DROP TABLE #testEnvironment;

    END;

    --1,000,000 Random rows of data

    SELECT IDENTITY(INT,1,1) AS ID, CAST(LEFT(yourData, LEN(yourData) - 1) AS NTEXT) AS yourData

    INTO #testEnvironment

    FROM (SELECT TOP 1000000

    REPLICATE(

    REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(

    REPLACE(NEWID(),'-','')

    ,'0',''),'9',''),'8',''),'7',''),'6',''),'5',''),'4',''),'3',''),'2',''),'1','') + '.',

    (ABS(CHECKSUM(NEWID())) % 2) + 2)

    FROM master.dbo.syscolumns sc1, master.dbo.syscolumns sc2, master.dbo.syscolumns sc3

    )a(yourData);

    DECLARE @HOLDER INT, @Duration CHAR(12), @StartTime DATETIME;

    DBCC FREEPROCCACHE;

    DBCC DROPCLEANBUFFERS;

    SELECT @StartTime = GETDATE();

    SELECT @HOLDER = ID

    FROM (SELECT ID, yourData, MAX(rn)

    FROM (SELECT ID, yourData, split.Part.value('text()[1]', 'VARCHAR(MAX)'), ROW_NUMBER() OVER(PARTITION BY yourData ORDER BY (SELECT NULL))

    FROM (SELECT CAST('<p>' + REPLACE(CAST(yourData AS VARCHAR(MAX)) COLLATE Latin1_General_BIN2,'.','</p><p>') + '</p>' AS XML),

    CAST(yourData AS VARCHAR(MAX)), ID

    FROM #testEnvironment) innerQ(xmlField, yourData, ID)

    CROSS APPLY innerQ.xmlField.nodes('p') split(Part)

    ) a(ID, yourData,splitData,rn)

    GROUP BY ID, yourData

    ) a(ID, yourData, rn)

    WHERE rn = 2;

    SELECT @Duration = CONVERT(CHAR(12),GETDATE()-@StartTime,114);

    RAISERROR('Duration for ugly xml split = %s',0,1,@Duration) WITH NOWAIT;

    DBCC FREEPROCCACHE;

    DBCC DROPCLEANBUFFERS;

    SELECT @StartTime = GETDATE();

    SELECT @HOLDER = ID

    FROM #testEnvironment

    WHERE CHARINDEX('.', yourData, CHARINDEX('.', yourData) + 1) = 0;

    SELECT @Duration = CONVERT(CHAR(12),GETDATE()-@StartTime,114);

    RAISERROR('Duration for CHARINDEX = %s',0,1,@Duration) WITH NOWAIT;

    DBCC FREEPROCCACHE;

    DBCC DROPCLEANBUFFERS;

    SELECT @StartTime = GETDATE();

    SELECT @HOLDER = ID

    FROM #testEnvironment

    WHERE LEN(CAST(yourData AS NVARCHAR(MAX)))-1 = LEN(REPLACE(CAST(yourData AS NVARCHAR(MAX)) COLLATE Latin1_General_BIN2,'.',''));

    SELECT @Duration = CONVERT(CHAR(12),GETDATE()-@StartTime,114);

    RAISERROR('Duration for LEN = %s',0,1,@Duration) WITH NOWAIT;

    DBCC FREEPROCCACHE;

    DBCC DROPCLEANBUFFERS;

    SELECT @StartTime = GETDATE();

    SELECT @HOLDER = ID

    FROM #testEnvironment

    WHERE CHARINDEX('.', yourData COLLATE Latin1_General_BIN2 , CHARINDEX('.', yourData COLLATE Latin1_General_BIN2) + 1) = 0;

    SELECT @Duration = CONVERT(CHAR(12),GETDATE()-@StartTime,114);

    RAISERROR('Duration for CHARINDEX with COLLATE = %s',0,1,@Duration) WITH NOWAIT;

    DBCC execution completed. If DBCC printed error messages, contact your system administrator.

    DBCC execution completed. If DBCC printed error messages, contact your system administrator.

    Duration for ugly xml split = 00:00:48:980

    DBCC execution completed. If DBCC printed error messages, contact your system administrator.

    DBCC execution completed. If DBCC printed error messages, contact your system administrator.

    Duration for CHARINDEX = 00:00:08:640

    DBCC execution completed. If DBCC printed error messages, contact your system administrator.

    DBCC execution completed. If DBCC printed error messages, contact your system administrator.

    Duration for LEN = 00:00:09:087

    DBCC execution completed. If DBCC printed error messages, contact your system administrator.

    DBCC execution completed. If DBCC printed error messages, contact your system administrator.

    Duration for CHARINDEX with COLLATE = 00:00:05:323

    So the CHARINDEX with COLLATE that Mark suggested is the fastest so far - whilst my ugly XML splitter proves its worth as considerably slower 😀


    Forever trying to learn
    My blog - http://www.cadavre.co.uk/
    For better, quicker answers on T-SQL questions, click on the following...http://www.sqlservercentral.com/articles/Best+Practices/61537/
    For better, quicker answers on SQL Server performance related questions, click on the following...http://www.sqlservercentral.com/articles/SQLServerCentral/66909/

  • How about this?

    SELECT *

    FROM #testenvironment

    WHERE PARSENAME(CAST(yourdata AS NVARCHAR(MAX)),3) IS NULL

  • I was so close, thanks to the QotD that made me remember the PARSENAME function.

    My code

    SELECT @HOLDER = ID

    FROM #testEnvironment

    WHERE PARSENAME( CAST(yourData AS NVARCHAR(MAX)),3) IS NULL

    AND PARSENAME(CAST(yourData AS NVARCHAR(MAX)), 2) IS NOT NULL

    Results using Cadavre's last test environment (without the XML). I deleted the "DBCC execution completed" comments.

    Duration for CHARINDEX = 00:00:18:363

    Duration for LEN = 00:00:09:347

    Duration for PARSENAME = 00:00:06:870

    Duration for COLLATE = 00:00:06:667

    Luis C.
    General Disclaimer:
    Are you seriously taking the advice and code from someone from the internet without testing it? Do you at least understand it? Or can it easily kill your server?

    How to post data/code on a forum to get the best help: Option 1 / Option 2
  • Davin21 (10/25/2012)


    How about this?

    SELECT *

    FROM #testenvironment

    WHERE PARSENAME(CAST(yourdata AS NVARCHAR(MAX)),3) IS NULL

    I had the same idea, but I added an extra condition because your query will return results with no period (or dot) and with more than 3.

    Luis C.
    General Disclaimer:
    Are you seriously taking the advice and code from someone from the internet without testing it? Do you at least understand it? Or can it easily kill your server?

    How to post data/code on a forum to get the best help: Option 1 / Option 2
  • Luis Cazares (10/25/2012)


    I was so close, thanks to the QotD that made me remember the PARSENAME function.

    My code

    SELECT @HOLDER = ID

    FROM #testEnvironment

    WHERE PARSENAME( CAST(yourData AS NVARCHAR(MAX)),3) IS NULL

    AND PARSENAME(CAST(yourData AS NVARCHAR(MAX)), 2) IS NOT NULL

    Results using Cadavre's last test environment (without the XML). I deleted the "DBCC execution completed" comments.

    Duration for CHARINDEX = 00:00:18:363

    Duration for LEN = 00:00:09:347

    Duration for PARSENAME = 00:00:06:870

    Duration for COLLATE = 00:00:06:667

    Very good, that was sort of the idea I was trying to implement with the XML splitter. Didn't think of PARSENAME (foolishly!).

    Also, we can get rid of the "DBCC execution completed" comments by adding "WITH NO_INFOMSGS" to the script, like so: -

    SET NOCOUNT ON;

    IF object_id('tempdb..#testEnvironment') IS NOT NULL

    BEGIN

    DROP TABLE #testEnvironment;

    END;

    --1,000,000 Random rows of data

    SELECT IDENTITY(INT,1,1) AS ID, CAST(LEFT(yourData, LEN(yourData) - 1) AS NTEXT) AS yourData

    INTO #testEnvironment

    FROM (SELECT TOP 1000000

    REPLICATE(

    REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(

    REPLACE(NEWID(),'-','')

    ,'0',''),'9',''),'8',''),'7',''),'6',''),'5',''),'4',''),'3',''),'2',''),'1','') + '.',

    (ABS(CHECKSUM(NEWID())) % 2) + 2)

    FROM master.dbo.syscolumns sc1, master.dbo.syscolumns sc2, master.dbo.syscolumns sc3

    )a(yourData);

    DECLARE @HOLDER INT, @Duration CHAR(12), @StartTime DATETIME;

    DBCC FREEPROCCACHE WITH NO_INFOMSGS;

    DBCC DROPCLEANBUFFERS WITH NO_INFOMSGS;

    SELECT @StartTime = GETDATE();

    SELECT @HOLDER = ID

    FROM (SELECT ID, yourData, MAX(rn)

    FROM (SELECT ID, yourData, split.Part.value('text()[1]', 'VARCHAR(MAX)'), ROW_NUMBER() OVER(PARTITION BY yourData ORDER BY (SELECT NULL))

    FROM (SELECT CAST('<p>' + REPLACE(CAST(yourData AS VARCHAR(MAX)) COLLATE Latin1_General_BIN2,'.','</p><p>') + '</p>' AS XML),

    CAST(yourData AS VARCHAR(MAX)), ID

    FROM #testEnvironment) innerQ(xmlField, yourData, ID)

    CROSS APPLY innerQ.xmlField.nodes('p') split(Part)

    ) a(ID, yourData,splitData,rn)

    GROUP BY ID, yourData

    ) a(ID, yourData, rn)

    WHERE rn = 2;

    SELECT @Duration = CONVERT(CHAR(12),GETDATE()-@StartTime,114);

    RAISERROR('Duration for ugly xml split = %s',0,1,@Duration) WITH NOWAIT;

    DBCC FREEPROCCACHE WITH NO_INFOMSGS;

    DBCC DROPCLEANBUFFERS WITH NO_INFOMSGS;

    SELECT @StartTime = GETDATE();

    SELECT @HOLDER = ID

    FROM #testEnvironment

    WHERE CHARINDEX('.', yourData, CHARINDEX('.', yourData) + 1) = 0;

    SELECT @Duration = CONVERT(CHAR(12),GETDATE()-@StartTime,114);

    RAISERROR('Duration for CHARINDEX = %s',0,1,@Duration) WITH NOWAIT;

    DBCC FREEPROCCACHE WITH NO_INFOMSGS;

    DBCC DROPCLEANBUFFERS WITH NO_INFOMSGS;

    SELECT @StartTime = GETDATE();

    SELECT @HOLDER = ID

    FROM #testEnvironment

    WHERE LEN(CAST(yourData AS NVARCHAR(MAX)))-1 = LEN(REPLACE(CAST(yourData AS NVARCHAR(MAX)) COLLATE Latin1_General_BIN2,'.',''));

    SELECT @Duration = CONVERT(CHAR(12),GETDATE()-@StartTime,114);

    RAISERROR('Duration for LEN = %s',0,1,@Duration) WITH NOWAIT;

    DBCC FREEPROCCACHE WITH NO_INFOMSGS;

    DBCC DROPCLEANBUFFERS WITH NO_INFOMSGS;

    SELECT @StartTime = GETDATE();

    SELECT @HOLDER = ID

    FROM #testEnvironment

    WHERE CHARINDEX('.', yourData COLLATE Latin1_General_BIN2, CHARINDEX('.', yourData COLLATE Latin1_General_BIN2) + 1) = 0;

    SELECT @Duration = CONVERT(CHAR(12),GETDATE()-@StartTime,114);

    RAISERROR('Duration for CHARINDEX with COLLATE = %s',0,1,@Duration) WITH NOWAIT;

    DBCC FREEPROCCACHE WITH NO_INFOMSGS;

    DBCC DROPCLEANBUFFERS WITH NO_INFOMSGS;

    SELECT @StartTime = GETDATE();

    SELECT @HOLDER = ID

    FROM #testEnvironment

    WHERE PARSENAME(CAST(yourData AS NVARCHAR(MAX)),3) IS NULL

    AND PARSENAME(CAST(yourData AS NVARCHAR(MAX)), 2) IS NOT NULL;

    SELECT @Duration = CONVERT(CHAR(12),GETDATE()-@StartTime,114);

    RAISERROR('Duration for PARSENAME = %s',0,1,@Duration) WITH NOWAIT;

    Duration for ugly xml split = 00:00:47:197

    Duration for CHARINDEX = 00:00:07:593

    Duration for LEN = 00:00:08:163

    Duration for CHARINDEX with COLLATE = 00:00:04:787

    Duration for PARSENAME = 00:00:05:193

    Luis Cazares (10/25/2012)


    Davin21 (10/25/2012)


    How about this?

    SELECT *

    FROM #testenvironment

    WHERE PARSENAME(CAST(yourdata AS NVARCHAR(MAX)),3) IS NULL

    I had the same idea, but I added an extra condition because your query will return results with no period (or dot) and with more than 3.

    I've got the agree with Luis, it's a good idea but it could return incorrect results.


    Forever trying to learn
    My blog - http://www.cadavre.co.uk/
    For better, quicker answers on T-SQL questions, click on the following...http://www.sqlservercentral.com/articles/Best+Practices/61537/
    For better, quicker answers on SQL Server performance related questions, click on the following...http://www.sqlservercentral.com/articles/SQLServerCentral/66909/

  • Hi

    I thought I would add in a LIKE query to see how that compared. It wasn't as bad as I thought it would be.

    SELECT @StartTime = GETDATE();

    SELECT @HOLDER = ID

    FROM #testEnvironment

    WHERE yourData like '%.%' and yourData not like '%.%.%';

    SELECT @Duration = CONVERT(CHAR(12),GETDATE()-@StartTime,114);

    RAISERROR('Duration for LIKE = %s',0,1,@Duration) WITH NOWAIT;

    DBCC FREEPROCCACHE WITH NO_INFOMSGS;

    DBCC DROPCLEANBUFFERS WITH NO_INFOMSGS;

    SELECT @StartTime = GETDATE();

    SELECT @HOLDER = ID

    FROM #testEnvironment

    WHERE yourData COLLATE Latin1_General_BIN2 like '%.%' and yourData COLLATE Latin1_General_BIN2 not like '%.%.%';

    SELECT @Duration = CONVERT(CHAR(12),GETDATE()-@StartTime,114);

    RAISERROR('Duration for LIKE = %s',0,1,@Duration) WITH NOWAIT;

    And got the following

    Duration for CHARINDEX = 00:00:09:103

    Duration for LEN = 00:00:10:790

    Duration for CHARINDEX with COLLATE = 00:00:06:507

    Duration for PARSENAME = 00:00:07:717

    Duration for LIKE = 00:00:10:703

    Duration for LIKE with COLLATE = 00:00:06:517

Viewing 15 posts - 1 through 15 (of 19 total)

You must be logged in to reply to this topic. Login to reply