Table/View breakdown of occurences

  • Hi guru's,

    I'm looking for an application that will read a SQL script (or text file) and provide a breakdown of the occurences of each word.

    The reason for this is that I wish to look at how many times views and tables are being referenced in a script.

    Any ideas?

    Rich

    Hope this helps,
    Rich

    [p]
    [/p]

  • Richard it's fairly straightforward, but involves a combination of techniques.

    if it's a file, you need to use bcp or bulk insert to get the text.

    you need to use a split function to parse the file into words.

    then it's simply a group by to review the results.

    here's a working example i just put together, which assumes you are bulk inserting a file:

    CREATE FUNCTION dbo.DelimitedSplit8K

    /*

    Revision History:

    Rev 00 - 20 Jan 2010 - Concept: Lynn Pettis and others.

    Redaction/Implementation: Jeff Moden

    - Base 10 redaction and reduction for CTE. (Total rewrite)

    Rev 01 - 13 Mar 2010 - Jeff Moden

    - Removed one additional concatenation and one subtraction from the SUBSTRING in the

    SELECT List for that tiny bit of extra speed.

    Rev 02 - 14 Apr 2010 - Jeff Moden

    - No code changes. Added CROSS APPLY usage example to the header, some additional credits,

    and extra documentation.

    Rev 03 - 18 Apr 2010 - Jeff Moden

    - No code changes. Added notes 7, 8, and 9 about certain "optimizations" that dont'

    actually work for this type of function.

    Rev 04 - 29 Jun 2010 - Jeff Moden

    - Added WITH SCHEMABINDING thanks to a note by Paul White. This prevents an unnecessary

    "Table Spool" when the function is used in an UPDATE statement even though the function

    makes no external references.

    ***************************************************************************************************/

    --===== Define I/O parameters

    (

    @pString VARCHAR(7999),

    @pDelimiter CHAR(1)

    )

    RETURNS TABLE

    WITH SCHEMABINDING

    AS

    RETURN

    --===== "Inline" CTE Driven "Tally Table” produces values up to

    -- 10,000... enough to cover VARCHAR(8000)

    WITH

    E1(N) AS ( --=== Create Ten 1's

    SELECT 1 UNION ALL SELECT 1 UNION ALL

    SELECT 1 UNION ALL SELECT 1 UNION ALL

    SELECT 1 UNION ALL SELECT 1 UNION ALL

    SELECT 1 UNION ALL SELECT 1 UNION ALL

    SELECT 1 UNION ALL SELECT 1 --10

    ),

    E2(N) AS (SELECT 1 FROM E1 a, E1 b), --100

    E4(N) AS (SELECT 1 FROM E2 a, E2 b), --10,000

    cteTally(N) AS (SELECT ROW_NUMBER() OVER (ORDER BY (SELECT N)) FROM E4)

    --===== Do the split

    SELECT ROW_NUMBER() OVER (ORDER BY N) AS ItemNumber,

    SUBSTRING(@pString, N, CHARINDEX(@pDelimiter, @pString + @pDelimiter, N) - N) AS Item

    FROM cteTally

    WHERE N < LEN(@pString) + 2

    AND SUBSTRING(@pDelimiter + @pString, N, 1) = @pDelimiter

    ;

    GO

    --assuming the file is on the C:\ drive and the SQL account has access to it.

    CREATE TABLE BULKWORDS(RAWDATA VARCHAR (8000))

    BULK INSERT BULKWORDS FROM 'c:\sp_find.sql'

    WITH (

    DATAFILETYPE = 'char',

    --tilde assumed to not exist in the file, so we get only one column

    FIELDTERMINATOR = '~',

    ROWTERMINATOR = '\n',

    FIRSTROW = 1

    )

    --preview the raw data?

    select * from BULKWORDS

    --create a destination table

    CREATE TABLE WordCounter(TheWord varchar(255) )

    --get the data

    INSERT INTO WordCounter

    select

    TheFunction.Item

    FROM BULKWORDS

    cross apply dbo.DelimitedSplit8K(RAWDATA,' ') TheFunction

    ----finally

    --look at the words

    SELECT TheWord, COUNT(TheWord) as TheCount

    FROM WordCounter

    --ignore whitespace that was captured

    WHERE LTRIM(TheWord) <> ''

    GROUP BY TheWord

    ORDER BY TheCount DESC

    Lowell


    --help us help you! If you post a question, make sure you include a CREATE TABLE... statement and INSERT INTO... statement into that table to give the volunteers here representative data. with your description of the problem, we can provide a tested, verifiable solution to your question! asking the question the right way gets you a tested answer the fastest way possible!

  • That's spot on.

    Thanks,

    Rich

    Hope this helps,
    Rich

    [p]
    [/p]

  • Nicely done, Lowell. 🙂

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

Viewing 4 posts - 1 through 3 (of 3 total)

You must be logged in to reply to this topic. Login to reply