finding occurences of select *

  • I'm thinking about how to identify occurences of select * in sql code (sp's, views and functions.)

    ideally I'd like to spot items such as

    select bob,*

    select a.*

    and exclude items such as

    1) ..exists (select * from...

    2) /* This is a great select */

    3) /***** This is the greatest select *****/

    Once done, I'd ideally implement this as a policy in PBM (but let's get the query working first!)

    To get this working properly would not be trivial...hence my post.

    Have any of you already done this? Any high level ideas about the best technique?

    Thanks for sharing.

    David McKinney.

  • That kind of complex pattern matching is going to be best done through regex functions. That means CLR, since that's MUCH better at regex than T-SQL.

    Once you've built the regex, it should be a simple matter of querying sys.sql_modules with it.

    But building it in the first place is going to be a non-trivial task. I'm inclined to suggest that you check with companies like Red Gate and ApexSQL and see if they already have something like that, or can build it into one of their DBA tools. If not, then hire a contractor to do it for you. ExpertsExchange might be appropriate.

    - Gus "GSquared", RSVP, OODA, MAP, NMVP, FAQ, SAT, SQL, DNA, RNA, UOI, IOU, AM, PM, AD, BC, BCE, USA, UN, CF, ROFL, LOL, ETC
    Property of The Thread

    "Nobody knows the age of the human race, but everyone agrees it's old enough to know better." - Anon

  • I agree with Gus, it's not all that easy; At first,i couldn't think of how to do it in a single query and thought i might using a cursor for this, which i think is just an admission of a lack of imagination.

    After i thought it through a little bit, i created a function that strips TSQL comments out of a string...you know, anything between /* */ or -- to vbCrLf ; that is going to generate my core CTE results

    the second part just looks for any old asterisk in the results....

    so this does not handle the IF EXISTS, but it does handle the bulk of the requirement i think.

    you might use this as a base SQL, the test if "EXISTS" occurs in the definition and also two or more "*"

    exists to limit the code to review further.

    CREATE FUNCTION StripComments(@definition varchar(max))

    RETURNS varchar(max)

    AS

    BEGIN

    declare @vbCrLf CHAR(2)

    SET @vbCrLf = CHAR(13) + CHAR(10)

    --'objective: strip out comments.

    --first loop is going to look for pairs of '/*' and '*/', and STUFF them with empty space.

    --second loop is going to look for sets of '--' and vbCrLf and STUFF them with empty space.

    --===== Replace all '/*' and '*/' pairs with nothing

    WHILE CHARINDEX('/*',@definition) > 0

    SELECT @definition = STUFF(@definition,

    CHARINDEX('/*',@definition),

    CHARINDEX('*/',@definition) - CHARINDEX('/*',@definition) + 2, --2 is the length of the search term

    '')

    --===== Replace all single line comments

    WHILE CHARINDEX('--',@definition) > 0

    AND CHARINDEX(@vbCrLf,@definition,CHARINDEX('--',@definition)) > CHARINDEX('--',@definition)

    SELECT @definition = STUFF(@definition,

    CHARINDEX('--',@definition),

    CHARINDEX(@vbCrLf,@definition,CHARINDEX('--',@definition)) - CHARINDEX('--',@definition) + 2,

    '')

    return @definition --you can now search this without false positives from comments.

    END

    GO

    With myDefinitions AS

    (

    select

    OBJECT_NAME(object_id) As ObName,

    dbo.StripComments(definition) As TheDef

    from sys.sql_modules

    )

    SELECT * FROM myDefinitions

    where TheDef like '%!*%' ESCAPE '!'

    Lowell


    --help us help you! If you post a question, make sure you include a CREATE TABLE... statement and INSERT INTO... statement into that table to give the volunteers here representative data. with your description of the problem, we can provide a tested, verifiable solution to your question! asking the question the right way gets you a tested answer the fastest way possible!

  • Would this sort of thing be useful?

    SELECT Isnull(sp.name, sw.name), sm.object_id

    FROM sys.sql_modules sm

    LEFT OUTER JOIN sys.procedures sp ON sp.object_id = sm.object_id

    LEFT OUTER JOIN sys.VIEWS sw ON sw.object_id = sm.object_id

    WHERE (definition LIKE '%SELECT *%' OR definition LIKE '%SELECT %*%' )

    AND definition NOT LIKE '%SELECT **%'

    -EDIT-

    Ah, read through Lowell's solution and realise I missed the point.


    Forever trying to learn
    My blog - http://www.cadavre.co.uk/
    For better, quicker answers on T-SQL questions, click on the following...http://www.sqlservercentral.com/articles/Best+Practices/61537/
    For better, quicker answers on SQL Server performance related questions, click on the following...http://www.sqlservercentral.com/articles/SQLServerCentral/66909/

  • the second part to test for EXISTs is not so easy;

    looking at our code base, we have a couple of different styles for people using EXISTS:

    IF exists(select specificColumnName

    ---

    IF exists(

    select * [...]

    the whitespace can cause variations in the capture....

    maybe remove all whitespace and spaces completely, and test for "exists(select*" might be a winning idea.

    --edit: thi sseems to have stripped out any EXISTS(SELECT*, leaving jsut definitions that really contain asterisks:

    declare @vbCrLf CHAR(2),

    @vbTab CHAR(1)

    SET @vbCrLf = CHAR(13) + CHAR(10)

    SET @vbTab = CHAR(9)

    With myDefinitions AS

    (

    select

    OBJECT_NAME(object_id) As ObName,

    REPLACE(

    REPLACE(

    REPLACE(dbo.StripComments(definition),

    @vbTab,''),

    @vbCrLf,''),

    space(1),'') As TheDef

    from sys.sql_modules

    ),

    TestNoExists As (

    SELECT *

    FROM myDefinitions

    --contains Exists

    WHERE CHARINDEX('exists(select*',TheDef) = 0

    UNION

    SELECT ObName, REPLACE(TheDef,'exists(select*','')

    FROM myDefinitions

    --contains Exists

    )

    SELECT * FROM TestNoExists

    where TheDef like '%!*%' ESCAPE '!'

    Lowell


    --help us help you! If you post a question, make sure you include a CREATE TABLE... statement and INSERT INTO... statement into that table to give the volunteers here representative data. with your description of the problem, we can provide a tested, verifiable solution to your question! asking the question the right way gets you a tested answer the fastest way possible!

  • Thanks, guys.

    I think I'll make a suggestion to Redgate as an enhancement to the refactoring functionality in sql prompt. It can already identify the * as it proposes the expand the column list....so I'm guessing the development would be relatively minor.

    In the meantime, I think I'll content myself with simple occurences of 'select *' which are not occurrences of 'exists (select *'.

    -> GSquared, I agree that a CLR function with Regex is almost certainly more suited to this kind of work than SQL -- I might give it a try.

    -> Lowell, I too was starting to write code for this, but I think it's going to take too long to get perfect. I like the code to strip out the comments though 🙂

  • i think i got it...try my updated post, which assumes that same function. does that get you where you wanted to be?

    Lowell


    --help us help you! If you post a question, make sure you include a CREATE TABLE... statement and INSERT INTO... statement into that table to give the volunteers here representative data. with your description of the problem, we can provide a tested, verifiable solution to your question! asking the question the right way gets you a tested answer the fastest way possible!

  • There are edge cases where Lowell's solution will return a false positive. For example, a commented-out-end-comment.

    /* This comment has been

    --*/modified

    by devs who weren't paying attention*/

    Reversing the sequence on the comment removal will fix that one, but may or may not be necessary.

    The final query will return queries that multiply values together, or use the asterisk character in string literals within the code. Lots of possibilities for false positives.

    If dealing with those by reviewing through human eyes is acceptable, then it'll probably suffice. If you're trying to automate something, it won't.

    You could use a standard string parsing routine (Numbers/Tally version) to handle the multi-comment situation, instead of a loop, if you want more efficiency. Those are limited on very long strings, which means they may or may not be functional on varchar(max) date like the definition column in sys.sql_modules. XML building usually works better on that, and would probably still be more efficient than a couple of loops. Especially if your devs are the rare types who comment a lot in long pieces of code.

    - Gus "GSquared", RSVP, OODA, MAP, NMVP, FAQ, SAT, SQL, DNA, RNA, UOI, IOU, AM, PM, AD, BC, BCE, USA, UN, CF, ROFL, LOL, ETC
    Property of The Thread

    "Nobody knows the age of the human race, but everyone agrees it's old enough to know better." - Anon

  • doh i didn't even think about if there were any math functions in the bodies; good catch; i'm not sure on the commented comments...

    Lowell


    --help us help you! If you post a question, make sure you include a CREATE TABLE... statement and INSERT INTO... statement into that table to give the volunteers here representative data. with your description of the problem, we can provide a tested, verifiable solution to your question! asking the question the right way gets you a tested answer the fastest way possible!

  • Depending on the the Dev tools your company uses, you could look at VS 2008/2010, as you get built in code analysis tools within the DB projects, and you can extend them using C# to cater for company specific coding rules.

    _________________________________________________________________________
    SSC Guide to Posting and Best Practices

  • Lowell (2/8/2011)


    doh i didn't even think about if there were any math functions in the bodies; good catch; i'm not sure on the commented comments...

    Commented-comments are an edge case and may or may not be applicable.

    I'm just in the habit of looking at code from the viewpoint of "can I break this?" Part of my QA hat.

    - Gus "GSquared", RSVP, OODA, MAP, NMVP, FAQ, SAT, SQL, DNA, RNA, UOI, IOU, AM, PM, AD, BC, BCE, USA, UN, CF, ROFL, LOL, ETC
    Property of The Thread

    "Nobody knows the age of the human race, but everyone agrees it's old enough to know better." - Anon

  • crazy thought....I don't suppose looking at query plans, instead of query defintions is going to get me anywhere useful? Any thoughts?

  • Won't get you what you need on this point, so far as I know.

    - Gus "GSquared", RSVP, OODA, MAP, NMVP, FAQ, SAT, SQL, DNA, RNA, UOI, IOU, AM, PM, AD, BC, BCE, USA, UN, CF, ROFL, LOL, ETC
    Property of The Thread

    "Nobody knows the age of the human race, but everyone agrees it's old enough to know better." - Anon

  • I believe I have what I'm looking for in why-you-cant-trust-sysdepends by Thomas LaRock.

    I found it thanks to a an article By Aaron Bertrand on PBM who directed me to todays Friday repeat here on SSC[/url].

    Crazy set of coincidences! Shame I'm not religious.

  • ...and if you can't be bothered to click all those links....

    SELECT so.name, schema_name(so.schema_id) FROM sys.sql_dependencies sd

    INNER JOIN sys.objects so on sd.object_Id=so.object_id

    where is_select_all=1

Viewing 15 posts - 1 through 15 (of 15 total)

You must be logged in to reply to this topic. Login to reply