Convert from cursor to set-oriented approach

  • I am converting some denormalized data to a normalized structure.  In the process I've run into a situation where multiple values exist in a single column (separated by spaces) and I need to convert that single column into multiple rows of data.  Now multiply that problem by 50 columns by 50k source rows.

    The solution I came up with requires cursors..  *waits for the boos to die down* ..  I'd love to simplify this to not require cursors at all, and many people have suggested in other threads that cursors are the devil and similar sentiments.

    Here's your chance!  Below is a test script that illustrates what I'm trying to do along with the user-function + cursor solution and proper output.  It works just fine with cursors and as this is a one-time conversion the performance is plenty fast enough.  Still, I'm always up for learning a better way, so feel free to suggest improvements, and thanks for any input!

    CREATE FUNCTION split_multi_column(@id varchar(16), @list varchar(8000), @delimiter varchar(1) = ',')

      RETURNS @tbl TABLE(

        my_id varchar(16),

        my_str varchar(80)) AS BEGIN

    /*

    split_multi_column

    Author: JT Lovell

    Based on functions written by Erland Sommarskog, SQL Server MVP from http://www.sommarskog.se/arrays-in-sql.html

    */

    DECLARE

      @pos      int,

      @textpos  int,

      @chunklen smallint,

      @tmpstr   varchar(8000),

      @leftover varchar(8000),

      @tmpval   varchar(8000)

    SET @textpos = 1

    SET @leftover = ''

    WHILE @textpos <= datalength(@list) BEGIN

      SET @chunklen = 4000 - datalength(@leftover)

      SET @tmpstr = @leftover + substring(@list, @textpos, @chunklen)

      SET @textpos = @textpos + @chunklen

      SET @pos = charindex(@delimiter, @tmpstr)

      WHILE @pos > 0 BEGIN

        SET @tmpval = ltrim(rtrim(left(@tmpstr, @pos - 1)))

        INSERT @tbl(my_id, my_str) VALUES(@id, @tmpval)

        SET @tmpstr = substring(@tmpstr, @pos + 1, len(@tmpstr))

        SET @pos = charindex(@delimiter, @tmpstr)

      END

      SET @leftover = @tmpstr

    END

    INSERT @tbl(my_id, my_str) VALUES (@id, ltrim(rtrim(@leftover)))

    RETURN

    END

    GO

    CREATE TABLE #source(

      source_id varchar(16),

      source_text varchar(80)

    )

    INSERT INTO #source(source_id,source_text)

    VALUES('1','Alpha Bravo Charlie Delta Echo')

    INSERT INTO #source(source_id,source_text)

    VALUES('2','Fox Golf Hotel India Juliet')

    CREATE TABLE #target(

      target_id integer IDENTITY(1,1) NOT NULL,

      source_id varchar(16),

      target_type varchar(20),

      target_text varchar(80)

    )

    DECLARE @pk varchar(16)

    DECLARE @txt varchar(80)

    /* normalize source data */

    DECLARE cur_row CURSOR FAST_FORWARD

      FOR

        SELECT s.source_id, s.source_text

        FROM #source s

    OPEN cur_row

    FETCH NEXT FROM cur_row INTO @pk, @txt

    /* loop through each source row */

    WHILE (@@FETCH_STATUS=0) BEGIN

      INSERT INTO #target(source_id, target_type, target_text)

      SELECT smc.my_id, 'Test', smc.my_str

      FROM split_multi_column(@pk, @txt, ' ') smc

      /* get next source row */

       FETCH NEXT FROM cur_row INTO @pk, @txt

    END /* cursor loop */

    /* close and deallocate cursor */

    CLOSE cur_row

    DEALLOCATE cur_row

    /* this output is exactly what I am looking for */

    SELECT * from #target

    /*

    output results:

    target_id   source_id        target_type          target_text                                                                     

    ----------- ---------------- -------------------- --------------------------------------------------------------------------------

    1           1                Test                 Alpha

    2           1                Test                 Bravo

    3           1                Test                 Charlie

    4           1                Test                 Delta

    5           1                Test                 Echo

    6           2                Test                 Fox

    7           2                Test                 Golf

    8           2                Test                 Hotel

    9           2                Test                 India

    10          2                Test                 Juliet

    */

  • Hi,

    You can accomplish the loop without cursors:-

    ------------------------------------------------------

    CREATE TABLE #source(

      source_id varchar(16),

      source_text varchar(80)

    )

    INSERT INTO #source(source_id,source_text)

    VALUES('1','Alpha Bravo Charlie Delta Echo')

    INSERT INTO #source(source_id,source_text)

    VALUES('2','Fox Golf Hotel India Juliet')

    CREATE TABLE #target(

      target_id integer IDENTITY(1,1) NOT NULL,

      source_id varchar(16),

      target_type varchar(20),

      target_text varchar(80)

    )

    DECLARE @txt varchar(80)

    DECLARE @MinPK varchar(16)

    DECLARE @MaxPK varchar(16)

    SET @MinPK = (SELECT MIN(s.source_id) FROM #source s)

    SET @MaxPK = (SELECT MAX(s.source_id) FROM #source s)

    /* loop through each source row */

    WHILE @MinPK <= @MaxPK

    BEGIN

      SET @txt = (SELECT s.source_text FROM #source s WHERE s.source_id = @MinPK)

      INSERT INTO #target(source_id, target_type, target_text)

      SELECT smc.my_id, 'Test', smc.my_str

      FROM split_multi_column(@MinPK, @txt, ' ') smc

      /* get next source row */

      SET @MinPK = (SELECT MIN(s.source_id) FROM #source s WHERE s.source_id > @MinPK)

    END /* loop */

    /* this output is exactly what I am looking for */

    SELECT * from #target

    drop table #source

    drop table #target

    ------------------------------------------------------

    Gives exactly the same result.

    The pros and cons of this approach would have to be explained by someone on a higher plane than me though

    Have fun

     

    Steve

    We need men who can dream of things that never were.

  • If you use a user defined function in this way then I believe that SQL will use an implicit cursor because the function has to work on a row by row basis anyway.

  • Any other non-cursor method seems to be an academic exercise. And looping around a select statement is a poor substitute for a cursor, though it may not use as many resources, I wouldn't expect it to perform very well.

    Here's a general idea of how to do it with a set based operation if the values in the combined text fields are from a known, limited set.

    You could do INSERTs with values from a JOIN using LIKE:

    INSERT INTO newtable SELECT ...,'KNOWN',... JOIN ... ON a.combined_value like '%KNOWN%';

    INSERT INTO newtable SELECT ...,'KNOWN2',... JOIN ... ON a.combined_value like '%KNOWN2%';

    INSERT INTO newtable SELECT ...,'KNOWN3',... JOIN ... ON a.combined_value like '%KNOWN3%';

  • The looping around the a select statement works quite well because the locks are only held for a short period of time.

  • Thanks for the replies!  I have a few questions though:

    Steve>The while loop versus a cursor operates in nearly the same way as the cursor.  Is it really more efficient or just an alternate approach?

    Peter> I take a similar approach for the columns that have a known set.  The ones I'm doing via cursors could be anything though.

    Thanks again!

  • Unfortunately, I didn't really answer the original question - it is just to give you an alternative approach.

    I am fairly sure they are very similar in terms of efficiency in this case.

    There are other cases where you can really improve your efficiency by reducing the number of selects / database calls within the loop. Depends on circumstances.... and - as David says, the Lock time is very minimal.

    Peter is closest to answering the 'set based approach question', just a pity he didn't code the answer fully...

    Have fun

    Steve

    We need men who can dream of things that never were.

  • Sorry. I would have but it seems that it was not pertinent to the original post (see JT's last post). Also, I'm neck deep in mapping a government housing inspection business process into the finite-state machine of a particular CRM system. My brain is turning to mud.

  • JT

    if looping is performing an expensive select, cursors are a great alternative. locking can exists while you walkthrough the cursor but if it allows you to take one pass only to the source table then things could get pretty interesting because those are the rare cases when the cursors could outperform the looping.

    HTH

     

     


    * Noel

  • Noel,

    That's refreshing to hear!  I think too many people throw out the baby with the bath water when it comes to cursors.  They have their place as long as they aren't being used where set-based approaches are better. 

    Do cursors still lock the source rows when using a fast_forward (read-only, forward-only) cursor?  I was under the impression that it didn't lock unless the FOR UPDATE keyword is used.  I may be mistaken though.

     

Viewing 10 posts - 1 through 9 (of 9 total)

You must be logged in to reply to this topic. Login to reply