Add Column to Staging table

  • Hi,

    The code below used to work fine.
    It is part of a stored procedure which bulk inserts data from a csv file.
    Suddenly, after editing, I keep on getting an error stating that RowNr does not exist.
    Any ideas?

    Cheers,
    Julian
    Netherlands


    /*##############################################
      ###                                               
      ### FIND DUPLICATE ROWS AND DELETE   
      ### THESE ROWS EXEPT ONE      
      ###              
      ############################################## */

      -- ADD COLUMN TO STORE ROW NUMBER
      --ALTER TABLE Bewoners_STAGING
      --ADD RowNr INT;
      
      SET @MYSQL = 'ALTER TABLE Bewoners_STAGING ADD RowNr INT;'
      EXEC (@MYSQL);

      -- ADD NUMBER OF ROWS TO FIELD RowNr
      UPDATE Bewoners_STAGING
      SET RowNr =  R.RowNr
              FROM(
              SELECT ROW_NUMBER () OVER (PARTITION BY zzpclient_id ORDER BY [zzpclient_id]) AS RowNr,
                  zzpclient_id
              FROM [Bewoners_STAGING]) R
      WHERE R.zzpclient_id = Bewoners_STAGING.zzpclient_id
      
      -- DELETE ROWS WHERE ROW NUMBER > 1
      DELETE FROM Bewoners_STAGING WHERE RowNr > 1

  • SQL must parse the code before it runs.  At parse time, the column "RowNr" indeed does not exist because the code that adds the column to the table hasn't run yet.

    You could use dynamic SQL to run the UPDATE as well.  I believe that would work.

    SQL DBA,SQL Server MVP(07, 08, 09) A socialist is someone who will give you the shirt off *someone else's* back.

  • You're making this much more complicated than it needs to be.  You're 1) adding a column, 2) updating the column, 3) deleting rows based on the column values, and presumably 4) dropping the column.  You only need one step: delete duplicates.

    WITH dupes_cte AS
    (
        SELECT ROW_NUMBER () OVER (PARTITION BY zzpclient_id ORDER BY [zzpclient_id]) AS RowNr,
            zzpclient_id
        FROM [Bewoners_STAGING]
    )

    DELETE dupes_cte
    WHERE RowNr > 1

    Also notice that your RowNr definition is  non-deterministic.  This may cause issues down the line.

    Drew

    J. Drew Allen
    Business Intelligence Analyst
    Philadelphia, PA

  • Drew and Scott,
    Thank you.
    Cheers,
    Julian


    USE xxx;
    GO
    WITH MYCTE AS
    (
    SELECT  COUNT (*) OVER (PARTITION BY zzpclient_id) AS [Rows],
            ROW_NUMBER () OVER (PARTITION BY zzpclient_id ORDER BY [zzpclient_id], [startdate] desc) AS [RowNr],
            *
    FROM    CLIENTENOVERZICHT
    )

    SELECT * FROM MYCTE
    WHERE RowNr = 1
    ORDER BY ROWS DESC, RowNr ASC

  • drew.allen - Wednesday, November 29, 2017 1:31 PM

    You're making this much more complicated than it needs to be.  You're 1) adding a column, 2) updating the column, 3) deleting rows based on the column values, and presumably 4) dropping the column.  You only need one step: delete duplicates.

    WITH dupes_cte AS
    (
        SELECT ROW_NUMBER () OVER (PARTITION BY zzpclient_id ORDER BY [zzpclient_id]) AS RowNr,
            zzpclient_id
        FROM [Bewoners_STAGING]
    )

    DELETE dupes_cte
    WHERE RowNr > 1

    Also notice that your RowNr definition is  non-deterministic.  This may cause issues down the line.

    Drew
    --------------------------------------------
    USE XXX;
    GO
    WITH MYCTE AS
    (
    SELECT 
            --COUNT (*) OVER (PARTITION BY zzpclient_id) AS [Rows],
            ROW_NUMBER () OVER (PARTITION BY zzpclient_id ORDER BY [zzpclient_id], [startdate] desc) AS [RowNrS],
            *
    FROM    testBewonersSTAGING
    )

    DELETE FROM MYCTE
    WHERE     RowNrS > 1

    Hi Drew,
    Now I see what you mean. Great!
    Q: why does the delete statement "do its work" on the Staging table?
    Cheers,
    Julian

  • JJR333 - Thursday, November 30, 2017 2:01 PM

    Hi Drew,
    Now I see what you mean. Great!
    Q: why does the delete statement "do its work" on the Staging table?
    Cheers,
    Julian

    This follows the rules of "updatable views" as either a view or cte don't have any data stores and simply reference a table (or group of tables), any DML will affect those underlying tables.

    Also, ROW_NUMBER is deterministic when the PARTITION and ORDER clauses refer to unique rows and not duplicates.

    Luis C.
    General Disclaimer:
    Are you seriously taking the advice and code from someone from the internet without testing it? Do you at least understand it? Or can it easily kill your server?

    How to post data/code on a forum to get the best help: Option 1 / Option 2
  • Thank you Luis!

  • Deletes are pretty expensive and they're fully logged even in the SIMPLE recovery model.  Why not just use the generated "dupe number" (row number) to process only the data you want?

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Thank you Jeff, have to look into that.
    Cheers,
    Julian

Viewing 9 posts - 1 through 8 (of 8 total)

You must be logged in to reply to this topic. Login to reply