Deleting duplicate records with a CTE

  • This code is modified from an example I saw.  The purpose is to delete duplicate records.

    WITH DupRecords AS
    (
    SELECT *, ROW_NUMBER() OVER (PARTITION BY fName, EmpID, Department, City ORDER BY ID) AS RowNumber
    FROM dbo.DeleteDuplicateRecords
    )

    DELETE
    FROM DupRecords
    WHERE RowNumber > 1

     

    The table with duplicates is called dbo.DeleteDuplicateRecords.  DupRecords is a CTE which is a temporary table right?  So the CTE would hold the results of the SELECT statement along with RowNumber as the last column.  Then the DELETE statement is applied to the CTE (temporary table) instead of the actual table with the duplicates.  Why is that the case?  Shouldn't the DELETE statement be applied to the actual table itself?

    When I query the dbo.DeleteDuplicateRecords table, the duplicates were still removed anyway, so I don't understand how the DELETE statement can be applied to a temporary table, yet it still affects the actual table with the duplicates.

     

  • A CTE is essentially just a materialised view of the data set, so it points to the actual underlying objects used.

    A CTE is not a temp table by any means.

    Might be worth reading up on CTEs vs Temp Tables, lots of good resources out in the ether on their differences.

  • The trick is in the name. People read common TABLE expression when they should read common table EXPRESSION. A CTE is not a table. It's just a query. There is no data movement other than that described by the query. There is not temporary data storage created other than what would be normally needed for any query such as hashes or similar stuff. Just remember that. It's not a table. It's an expression. There is no materialization of the query into any sort of temporary storage of any kind. It's a query. Nothing more or less.

    "The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood"
    - Theodore Roosevelt

    Author of:
    SQL Server Execution Plans
    SQL Server Query Performance Tuning

  • Ant-Green wrote:

    A CTE is essentially just a materialised view of the data set, so it points to the actual underlying objects used.

    This is what I suspected.  When you say a CTE is a materialized view of a data set, is that similar to how when you make a change to a view (virtual table/saved query) the underlying table is affected?

  • michael.leach2015 wrote:

    Ant-Green wrote:

    A CTE is essentially just a materialised view of the data set, so it points to the actual underlying objects used.

    This is what I suspected.  When you say a CTE is a materialized view of a data set, is that similar to how when you make a change to a view (virtual table/saved query) the underlying table is affected?

    But it's not. It's not materialized. That implies data movement. A CTE is just a query. There is no data movement apart from the definition of the query, like any other query. The underlying table is affected through whatever the query defines.

    "The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood"
    - Theodore Roosevelt

    Author of:
    SQL Server Execution Plans
    SQL Server Query Performance Tuning

Viewing 5 posts - 1 through 4 (of 4 total)

You must be logged in to reply to this topic. Login to reply