Finding Values With Numerous Columns

  • timothyawiseman (12/2/2008)


    ... I am not aware of it actually "failing" on a large table, but it can definitely be painfully slow on large tables

    Anything to do with LIKE forcing a table scan? Don't get me wrong. I like this solution as a way around certain problems. I'm looking at adding TEXT and NTEXT column types to your script. I have one of those projects where we store e-mails in a database. I want to do searches where certain words or phrases occur in either the subject, body, or response. They are looking at adding a comment column and your proc would just automatically start looking at that too.
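    On the TEXT/NTEXT question, a hedged note (table and column names below are made up): LIKE can be applied to text/ntext columns directly, but equality comparisons and many string functions cannot, so on SQL Server 2005+ a common workaround is casting to (N)VARCHAR(MAX) first.

    ```sql
    -- LIKE works against text/ntext columns as-is:
    SELECT * FROM dbo.Emails WHERE Body LIKE '%invoice%';

    -- Functions such as CHARINDEX reject text/ntext; cast first (SQL Server 2005+):
    SELECT * FROM dbo.Emails
    WHERE CHARINDEX('invoice', CAST(Body AS nvarchar(max))) > 0;
    ```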

    ATBCharles Kincaid

  • Madhivanan (12/2/2008)


    Just want to point out that ISNUMERIC and ISDATE are not reliable

    SELECT ISNUMERIC('12d5'),ISDATE('2000')

    You are right that they sometimes return positives on things that might not be intended as numbers or dates, but they seem pretty reliable in identifying what SQL will convert into dates or numbers. For instance, it interprets the 'd' as scientific notation, exactly like an 'e', and it takes 2000 as the year, defaulting to Jan 1.

    Still, that may let it return more than intended if not used carefully. Thank you for pointing it out.
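    To make the leniency concrete, and as a hedged aside (the pattern trick is standard T-SQL, not from the article): a LIKE pattern check can be stricter than ISNUMERIC when you only want plain digits.

    ```sql
    -- ISNUMERIC and ISDATE accept more than you might expect:
    SELECT ISNUMERIC('12d5') AS IsNum_12d5,   -- 1: 'd' is read as an exponent marker
           ISDATE('2000')    AS IsDate_2000;  -- 1: taken as the year 2000, Jan 1

    -- A stricter test for unsigned integers: reject any non-digit character.
    SELECT CASE WHEN '12d5' NOT LIKE '%[^0-9]%' THEN 1 ELSE 0 END AS IsDigitsOnly;  -- 0
    ```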

    ---
    Timothy A Wiseman
    SQL Blog: http://timothyawiseman.wordpress.com/

  • What about an approach that used a generic stored proc to dynamically create static code for a specific table?

    That is, the code could handle any table name passed to it, and optionally column(s) to exclude, and would generate code specifically for that table. You execute the resulting code to do the actual search.

    That way the code is specific and can be tuned, but the flexibility is there to handle any table, albeit at the cost of one extra step per table.
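    A rough sketch of that generator idea (the procedure and parameter names are hypothetical, built on INFORMATION_SCHEMA): it emits, rather than executes, a per-table search statement, so the output can be hand-tuned before running.

    ```sql
    -- Hypothetical sketch: print a tuned search statement for one table.
    CREATE PROCEDURE dbo.GenerateSearchSql
        @Schema sysname,
        @TableName sysname,
        @ExcludeColumns nvarchar(max) = NULL  -- comma-delimited columns to skip
    AS
    BEGIN
        DECLARE @sql nvarchar(max);
        SELECT @sql = COALESCE(@sql + CHAR(13) + 'UNION ALL' + CHAR(13), '')
            + 'SELECT ''' + COLUMN_NAME + ''' AS ContainingColumn, * FROM '
            + QUOTENAME(@Schema) + '.' + QUOTENAME(@TableName)
            + ' WHERE ' + QUOTENAME(COLUMN_NAME) + ' LIKE @Value'
        FROM INFORMATION_SCHEMA.COLUMNS
        WHERE TABLE_SCHEMA = @Schema
          AND TABLE_NAME = @TableName
          AND CHARINDEX(',' + COLUMN_NAME + ',',
                        ',' + ISNULL(@ExcludeColumns, '') + ',') = 0;
        PRINT @sql;  -- copy, tune, and execute the printed statement yourself
    END
    ```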

    SQL DBA, SQL Server MVP (07, 08, 09)
    Prosecutor James Blackburn, in closing argument in the Fatal Vision murders trial: "If in the future, you should cry a tear, cry one for them [the murder victims]. If in the future, you should say a prayer, say one for them. And if in the future, you should light a candle, light one for them."

  • I see this as useful (with some modification) for deciphering Microsoft SharePoint content databases, which are de-normalized to provide generic functionality.

    When I create a list in SharePoint and want to get at that list data outside of the SharePoint API, I create views, but I have to figure out what generic column was assigned to hold my data. This is a very tedious task, and if I move the list to another site I have to manually re-map the view.

    I can in theory use this type of script to find the columns by creating a control row with specific values, and then run this util to determine the column names.
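    That control-row technique might look something like this (the procedure name is assumed from the article; AllUserData is SharePoint's generic list-storage table, and the sentinel value is made up):

    ```sql
    -- After creating a list item whose Title is 'SENTINEL_TITLE_001' through
    -- the SharePoint UI, search the content database for that marker:
    EXEC dbo.SearchAllColumns          -- the article's proc; name assumed
         @Schema = 'dbo',
         @TableName = 'AllUserData',   -- SharePoint's generic list-storage table
         @Value = 'SENTINEL_TITLE_001';
    -- The ContainingColumn result (e.g. nvarchar3) tells you which generic
    -- column to map in your view.
    ```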

  • I enjoyed this article and will definitely use it. I work as a data analyst for an internal audit group, and we often come across data that we know relatively little about. SQL Server is one of the tools we use to handle high volume data, but it is a less-than-perfect audit tool, especially for text searching.

    Readers may also be interested in the Levenshtein Edit Distance algorithm, which is useful for finding inexact string matches:

    http://www.planet-source-code.com/vb/scripts/ShowCode.asp?txtCodeId=502&lngWId=5

    Thanks!
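    For reference, one straightforward (if slow) T-SQL sketch of Levenshtein distance, filling the full dynamic-programming matrix in a table variable; the function name is illustrative, and this is fine for short strings, not bulk comparisons.

    ```sql
    CREATE FUNCTION dbo.LevenshteinDistance (@s NVARCHAR(200), @t NVARCHAR(200))
    RETURNS INT
    AS
    BEGIN
        DECLARE @m INT, @n INT, @i INT, @j INT, @cost INT;
        SET @m = LEN(@s);
        SET @n = LEN(@t);
        IF @m = 0 RETURN @n;
        IF @n = 0 RETURN @m;

        DECLARE @d TABLE (i INT, j INT, v INT, PRIMARY KEY (i, j));

        -- Base cases: distance from the empty prefix is the prefix length.
        SET @i = 0;
        WHILE @i <= @m BEGIN INSERT INTO @d VALUES (@i, 0, @i); SET @i = @i + 1; END
        SET @j = 1;
        WHILE @j <= @n BEGIN INSERT INTO @d VALUES (0, @j, @j); SET @j = @j + 1; END

        SET @i = 1;
        WHILE @i <= @m
        BEGIN
            SET @j = 1;
            WHILE @j <= @n
            BEGIN
                -- Substitution cost: 0 when the characters match
                -- (comparison follows the database collation).
                SET @cost = CASE WHEN SUBSTRING(@s, @i, 1) = SUBSTRING(@t, @j, 1)
                                 THEN 0 ELSE 1 END;
                INSERT INTO @d (i, j, v)
                SELECT @i, @j, MIN(x.v)
                FROM (SELECT v + 1     AS v FROM @d WHERE i = @i - 1 AND j = @j      -- deletion
                      UNION ALL
                      SELECT v + 1        FROM @d WHERE i = @i     AND j = @j - 1    -- insertion
                      UNION ALL
                      SELECT v + @cost    FROM @d WHERE i = @i - 1 AND j = @j - 1    -- substitution
                     ) x;
                SET @j = @j + 1;
            END
            SET @i = @i + 1;
        END

        RETURN (SELECT v FROM @d WHERE i = @m AND j = @n);
    END
    ```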

  • Hey,

    This didn't work for me until I added a % at the end of the first dynamic SQL statement:

    SELECT @sql = CASE
        WHEN @sql IS NULL THEN
            'Select ''' + ColumnName + ''' as ContainingColumn, * From '
            + QUOTENAME(@Schema) + '.' + QUOTENAME(@TableName)
            + ' where ' + ColumnName + ' like ''' + @Value + '%'' '  -- % added here
        WHEN @sql IS NOT NULL THEN
            @sql + 'UNION ALL Select ''' + ColumnName + ''' as ContainingColumn, * From '
            + QUOTENAME(@Schema) + '.' + QUOTENAME(@TableName)
            + ' where ' + ColumnName + ' like ''' + @Value + ''' '
        END

    Doug

  • Douglas Osborne (12/3/2008)


    Hey,

    This didn't work for me until I added a % at the end of the first dynamic SQL statement:

    SELECT @sql = CASE
        WHEN @sql IS NULL THEN
            'Select ''' + ColumnName + ''' as ContainingColumn, * From '
            + QUOTENAME(@Schema) + '.' + QUOTENAME(@TableName)
            + ' where ' + ColumnName + ' like ''' + @Value + '%'' '  -- % added here
        WHEN @sql IS NOT NULL THEN
            @sql + 'UNION ALL Select ''' + ColumnName + ''' as ContainingColumn, * From '
            + QUOTENAME(@Schema) + '.' + QUOTENAME(@TableName)
            + ' where ' + ColumnName + ' like ''' + @Value + ''' '
        END

    Doug

    Adding the % directly in the dynamic SQL statement forces it to always look for anything that starts with the value passed in. The way I use it is to put the % in the value passed in, just like using LIKE directly in a SELECT statement. That way I can have it search for the beginning or ending or any combination.

    ---
    Timothy A Wiseman
    SQL Blog: http://timothyawiseman.wordpress.com/

  • Tim,

    Why use LIKE if you are not adding a %? Shouldn't you just use = instead?

    Doug

  • Douglas Osborne (12/3/2008)


    Tim,

    Why use LIKE if you are not adding a %? Shouldn't you just use = instead?

    Doug

    For flexibility, I wanted to be able to use a wildcard such as % or _, or not, depending on what I was searching for. Sometimes I had an exact value to search for and sometimes I was looking for a substring.
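    In other words, the caller controls the match mode (the procedure name here is assumed, and the table and value are made up):

    ```sql
    -- Exact match: no wildcard in the value
    EXEC dbo.SearchAllColumns @Schema = 'dbo', @TableName = 'Customers', @Value = 'Smith';

    -- Substring match: the caller supplies the wildcards
    EXEC dbo.SearchAllColumns @Schema = 'dbo', @TableName = 'Customers', @Value = '%Smith%';

    -- Single-character wildcard works too
    EXEC dbo.SearchAllColumns @Schema = 'dbo', @TableName = 'Customers', @Value = 'Sm_th';
    ```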

    ---
    Timothy A Wiseman
    SQL Blog: http://timothyawiseman.wordpress.com/

  • I appreciate this article and discussion.

    I have to do ad-hoc report queries off of a Sybase database created by another agency (let's call them the State). I have zero input on database design, practically no documentation, and only have read access to the data tables.

    I cringe every time my users ask for queries that require finding what "program" a client is in. Why? Because the State created a set of 8 columns all containing "program" data. Users can enter 3 characters into each column to indicate the set of programs the client has. A client can have anywhere from zero up to 8 programs. There is no order to the columns and more than 8 possible programs, though the client will never have more than 8 programs at once.

    What this leaves me with is this: when the users ask, "How many clients have program ABC?" I have to search all 8 columns. And of course, it is never as easy as that. They usually have queries like, "How many clients have any of the following programs: ABC, XFG, T3B, ... but not GTH or XYZ."
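    One way to phrase that kind of request over the eight columns (the table and column names here are made up; the poster's real schema is Sybase, but the predicate shape is the same either way, and this assumes empty program slots hold '' rather than NULL):

    ```sql
    -- "Any of ABC, XFG, T3B, but none of GTH or XYZ", checked across all 8 slots.
    SELECT COUNT(*) AS MatchingClients
    FROM dbo.Client
    WHERE (   Prog1 IN ('ABC', 'XFG', 'T3B') OR Prog2 IN ('ABC', 'XFG', 'T3B')
           OR Prog3 IN ('ABC', 'XFG', 'T3B') OR Prog4 IN ('ABC', 'XFG', 'T3B')
           OR Prog5 IN ('ABC', 'XFG', 'T3B') OR Prog6 IN ('ABC', 'XFG', 'T3B')
           OR Prog7 IN ('ABC', 'XFG', 'T3B') OR Prog8 IN ('ABC', 'XFG', 'T3B'))
      AND NOT (   Prog1 IN ('GTH', 'XYZ') OR Prog2 IN ('GTH', 'XYZ')
               OR Prog3 IN ('GTH', 'XYZ') OR Prog4 IN ('GTH', 'XYZ')
               OR Prog5 IN ('GTH', 'XYZ') OR Prog6 IN ('GTH', 'XYZ')
               OR Prog7 IN ('GTH', 'XYZ') OR Prog8 IN ('GTH', 'XYZ'))
    ```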

    So, why did the State set up the Sybase database that way? Don't they know anything about database design? I'm not privy to any of their conversations or programming problems. So, I don't know. But most of their database (while not a schema I would choose) is fairly workable and mostly normalized (though they did do that annoying phone thing talked about in the article). Why then set up denormalized data for something as vital as a client's program?

    My thoughts:

    1) The program data is part of a set of data that has to mirror/be merged with a legacy mainframe database. I have no doubt that the mainframe system is set up with the same 8 columns.

    2) But that doesn't explain why they wouldn't use programming to normalize the data in the database I access. All I can think of here is that they weren't thinking about querying the data and they were mostly concerned with saving time on programming the part that merges data between the mainframe and the normal database.

    I'll tell you, if it was me, I would have done the extra programming so that the data was normalized in the Sybase database. It is a terrible pain working with their data as-is!

    But as the article points out, when you don't have a choice, it is nice to have techniques to handle the situation. I just thought people might appreciate another real world example of this kind of problem. Cheers.

  • I just thought people might appreciate another real world example of this kind of problem

    We do, we do. It is strange how quickly one forgets this sort of horror. It is nice to be reminded. Thanks JJ B, thanks Timothy.

    Best wishes,
    Phil Factor
    Simple Talk

  • I just noticed something else with my "In" syntax. If you put the IN clause into a JOIN, you can use the LIKE against the original source table for a set based approach instead of concatenating.

    CREATE TABLE tblAccount (
        pkAccountId int,
        Acctnum varchar(20),
        fkOverDraftId1 int,
        fkOverDraftId2 int)

    SELECT main.Acctnum, qry.Acctnum
    FROM tblAccount main
    INNER JOIN tblAccount qry
        ON qry.pkAccountId IN (main.fkOverDraftId1, main.fkOverDraftId2)
    WHERE qry.Acctnum LIKE '%123%'
