Find column values with Pattern

  • Hi,

    How can I find all the rows of a database that have a particular pattern: for example: (INC000007615432) or with other words:

    Find all columns in a Database, where Column-Properties:

    * Len (15) = char (15)

    * Starts with "INC"

    * After INC following 12 num digits

    I have over 100 tables with many columns, and i can not take a look in every table. No Documentation, Columns have diffrent Names 🙁

    Regards

    Nicole 😉

  • info 58414 (11/14/2016)


    Hi,

    How can I find all the rows of a database that have a particular pattern: for example: (INC000007615432) or with other words:

    Find all columns in a Database, where Column-Properties:

    * Len (15) = char (15)

    * Starts with "INC"

    * After INC following 12 num digits

    I have over 100 tables with many columns, and i can not take a look in every table. No Documentation, Columns have diffrent Names 🙁

    Regards

    Nicole 😉

    Quick questions, do all the columns you are looking for have the same data type? Is the data consistent within those columns, that is will all rows in those columns be of the same pattern?

    😎

  • Yes, the Column-Value has the same Pattern,

    No, the Column Datatype can be diffrent

  • Nicole

    Those aren't properties of the column: they're properties of the individual values in the columns. Are you saying that you want to find all columns that have at least one value that satisfies those conditions?

    John

  • No, all Columns have the same Pattern:

    Example: INC000001234567

    * Len (15) = char (15)

    * Starts with "INC"

    * After INC following 12 num digits

  • OK, this query will look at all columns with data length of 15 or more and show the number of values with the requested pattern. Beware that the INFORMATION_SCHEMA views have been known not to report SCHEMA_NAME correctly; if that's a problem for you, use sys.columns instead. Please remove the spaces that I had to put in in order to get this post through my company's web sweeper.

    DEC LARE @SQL nvarchar(max)

    SELECT @SQL = STUFF(

    (SELECT

    'UNI ON ALL SELECT ''' + TABLE_SCHEMA + ''', ''' + TABLE_NAME + ''', ''' + COLUMN_NAME + ''', COUNT(*) FROM ' + TABLE_SCHEMA + '.' + TABLE_NAME

    + ' WHERE ' + COLUMN_NAME + ' LIKE ''INC[0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9]'' '

    FROM INFORMATION_SCHEMA.COLUMNS

    WHERE DATA_TYPE LIKE '%char'

    OR DATA_TYPE = 'text'

    AND COALESCE(CHARACTER_MAXIMUM_LENGTH,100) >= 16

    OR CHARACTER_MAXIMUM_LENGTH = -1 -- "max" data types

    FOR XML PATH (''))

    ,1

    ,10 -- remove the first UN ION ALL

    ,''

    )

    EXEC sys.sp_executesql @statement = @SQL

    John

  • John Mitchell-245523 (11/14/2016)


    OK, this query will look at all columns with data length of 15 or more and show the number of values with the requested pattern. Beware that the INFORMATION_SCHEMA views have been known not to report SCHEMA_NAME correctly; if that's a problem for you, use sys.columns instead. Please remove the spaces that I had to put in in order to get this post through my company's web sweeper.

    DEC LARE @SQL nvarchar(max)

    SELECT @SQL = STUFF(

    (SELECT

    'UNI ON ALL SELECT ''' + TABLE_SCHEMA + ''', ''' + TABLE_NAME + ''', ''' + COLUMN_NAME + ''', COUNT(*) FROM ' + TABLE_SCHEMA + '.' + TABLE_NAME

    + ' WHERE ' + COLUMN_NAME + ' LIKE ''INC[0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9]'' '

    FROM INFORMATION_SCHEMA.COLUMNS

    WHERE DATA_TYPE LIKE '%char'

    OR DATA_TYPE = 'text'

    AND COALESCE(CHARACTER_MAXIMUM_LENGTH,100) >= 16

    OR CHARACTER_MAXIMUM_LENGTH = -1 -- "max" data types

    FOR XML PATH (''))

    ,1

    ,10 -- remove the first UN ION ALL

    ,''

    )

    EXEC sys.sp_executesql @statement = @SQL

    John

    Careful, this query will fail if there are any XML columns in the database.

    😎

  • Here is a query I often use, adjusted to this requirements

    😎

    USE TEEST;

    GO

    SET NOCOUNT ON;

    DECLARE @PATTERN NVARCHAR(MAX) = N'INC[0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9]';

    DECLARE @SEARCH_TEMPLATE NVARCHAR(MAX) = N'

    UNION ALL

    SELECT

    ''{{@SCHEMA_NAME}}.{{@TABLE_NAME}}'' AS TABLE_NAME

    ,''{{@COLUMN_NAME}}'' AS COLUMN_NAME

    ,''{{@DATA_TYPE}}({{@MAX_LENGTH}})'' AS DATA_TYPE

    WHERE EXISTS (

    SELECT *

    FROM {{@SCHEMA_NAME}}.{{@TABLE_NAME}} X

    WHERE X.{{@COLUMN_NAME}} LIKE ''{{@PATTERN}}''

    )

    ';

    DECLARE @SEARCH_SQL NVARCHAR(MAX) =

    STUFF((

    SELECT

    REPLACE(

    REPLACE(

    REPLACE(

    REPLACE(

    REPLACE(

    REPLACE(@SEARCH_TEMPLATE,N'{{@SCHEMA_NAME}}',QUOTENAME(OBJECT_SCHEMA_NAME(STAB.object_id)))

    ,N'{{@TABLE_NAME}}' ,QUOTENAME(STAB.name))

    ,N'{{@COLUMN_NAME}}',QUOTENAME(SCOL.name))

    ,N'{{@DATA_TYPE}}' ,STYP.name)

    ,N'{{@PATTERN}}' ,@PATTERN)

    ,N'{{@MAX_LENGTH}}' ,CASE

    WHEN SCOL.max_length > 0 THEN CONVERT(NVARCHAR(12),SCOL.max_length,0)

    ELSE N'MAX'

    END)

    FROM sys.tables STAB

    INNER JOIN sys.columns SCOL

    INNER JOIN sys.types STYP

    ON SCOL.system_type_id = STYP.system_type_id

    ON STAB.object_id = SCOL.object_id

    WHERE STAB.type_desc = N'USER_TABLE'

    AND STYP.name LIKE N'%CHAR'

    AND SCOL.max_length NOT BETWEEN 0 AND 15

    FOR XML PATH(''),TYPE).value('(./text())[1]','NVARCHAR(MAX)'),1,11,N'');

    EXEC (@SEARCH_SQL);

  • Eirikur Eiriksson (11/14/2016)


    John Mitchell-245523 (11/14/2016)


    OK, this query will look at all columns with data length of 15 or more and show the number of values with the requested pattern. Beware that the INFORMATION_SCHEMA views have been known not to report SCHEMA_NAME correctly; if that's a problem for you, use sys.columns instead. Please remove the spaces that I had to put in in order to get this post through my company's web sweeper.

    DEC LARE @SQL nvarchar(max)

    SELECT @SQL = STUFF(

    (SELECT

    'UNI ON ALL SELECT ''' + TABLE_SCHEMA + ''', ''' + TABLE_NAME + ''', ''' + COLUMN_NAME + ''', COUNT(*) FROM ' + TABLE_SCHEMA + '.' + TABLE_NAME

    + ' WHERE ' + COLUMN_NAME + ' LIKE ''INC[0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9]'' '

    FROM INFORMATION_SCHEMA.COLUMNS

    WHERE DATA_TYPE LIKE '%char'

    OR DATA_TYPE = 'text'

    AND COALESCE(CHARACTER_MAXIMUM_LENGTH,100) >= 16

    OR CHARACTER_MAXIMUM_LENGTH = -1 -- "max" data types

    FOR XML PATH (''))

    ,1

    ,10 -- remove the first UN ION ALL

    ,''

    )

    EXEC sys.sp_executesql @statement = @SQL

    John

    Careful, this query will fail if there are any XML columns in the database.

    😎

    Surely not? It's only looking at %char and text columns. The worst that could happen is that there's an XML column that has plain (non-XML) data in it and that column gets overlooked, isn't it?

    John

  • John Mitchell-245523 (11/14/2016)


    Eirikur Eiriksson (11/14/2016)


    John Mitchell-245523 (11/14/2016)


    OK, this query will look at all columns with data length of 15 or more and show the number of values with the requested pattern. Beware that the INFORMATION_SCHEMA views have been known not to report SCHEMA_NAME correctly; if that's a problem for you, use sys.columns instead. Please remove the spaces that I had to put in in order to get this post through my company's web sweeper.

    DEC LARE @SQL nvarchar(max)

    SELECT @SQL = STUFF(

    (SELECT

    'UNI ON ALL SELECT ''' + TABLE_SCHEMA + ''', ''' + TABLE_NAME + ''', ''' + COLUMN_NAME + ''', COUNT(*) FROM ' + TABLE_SCHEMA + '.' + TABLE_NAME

    + ' WHERE ' + COLUMN_NAME + ' LIKE ''INC[0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9]'' '

    FROM INFORMATION_SCHEMA.COLUMNS

    WHERE DATA_TYPE LIKE '%char'

    OR DATA_TYPE = 'text'

    AND COALESCE(CHARACTER_MAXIMUM_LENGTH,100) >= 16

    OR CHARACTER_MAXIMUM_LENGTH = -1 -- "max" data types

    FOR XML PATH (''))

    ,1

    ,10 -- remove the first UN ION ALL

    ,''

    )

    EXEC sys.sp_executesql @statement = @SQL

    John

    Careful, this query will fail if there are any XML columns in the database.

    😎

    Surely not? It's only looking at %char and text columns. The worst that could happen is that there's an XML column that has plain (non-XML) data in it and that column gets overlooked, isn't it?

    John

    It will crap out as XML is incompatible with the LEN function. Easily fixed by adding parenthesis to the length portion of the where clause.

    😎

    WHERE DATA_TYPE LIKE '%char'

    OR DATA_TYPE = 'text'

    AND (COALESCE(CHARACTER_MAXIMUM_LENGTH,100) >= 16

    OR CHARACTER_MAXIMUM_LENGTH = -1)

  • Oh, I see, yes - having my ANDs and ORs in a muddle caused XML columns to be included. Good spot!

    John

  • Another thing to pay attention to is the cost of using COUNT, it implies that the whole table must be scanned and aggregated, much heavier on the database than using EXISTS, here are the aggregated stats from a small test database.

    😎

    Table Scan Count Logical Reads Physical Reads Read-Ahead Reads LOB Logical Reads LOB Physical Reads LOB Read-Ahead Reads

    COUNT 280 508,963 0 0 45,000 0 0

    EXISTS 196 68,415 0 0 45,000 0 0

  • Yes, although I assume this is a one-off and therefore performance might not be too important. Also, using EXISTS might not meet the requirement. Since we know that all values in the column will have the prescribed format, we want to be able to identify those columns that have only one or two such values. Of course, the foolproof solution would compare the total rows in the table with the number of rows with the prescribed format (and it gets even more complicated when you consider NULLs), but I implicitly left that as an exercise for the reader!

    John

Viewing 13 posts - 1 through 12 (of 12 total)

You must be logged in to reply to this topic. Login to reply