Deleting duplicate records with a CTE

  • michael.leach2015

    SSCommitted

    Points: 1593

    This code is modified from an example I saw.  The purpose is to delete duplicate records.

    WITH DupRecords AS
    (
    SELECT *, ROW_NUMBER() OVER (PARTITION BY fName, EmpID, Department, City ORDER BY ID) AS RowNumber
    FROM dbo.DeleteDuplicateRecords
    )

    DELETE
    FROM DupRecords
    WHERE RowNumber > 1

     

    The table with duplicates is called dbo.DeleteDuplicateRecords.  DupRecords is a CTE which is a temporary table right?  So the CTE would hold the results of the SELECT statement along with RowNumber as the last column.  Then the DELETE statement is applied to the CTE (temporary table) instead of the actual table with the duplicates.  Why is that the case?  Shouldn't the DELETE statement be applied to the actual table itself?

    When I query the dbo.DeleteDuplicateRecords table, the duplicates were still removed anyway, so I don't understand how the DELETE statement can be applied to a temporary table, yet it still affects the actual table with the duplicates.

     

  • anthony.green

    SSC Guru

    Points: 112515

    A CTE is essentially just a materialised view of the data set, so it points to the actual underlying objects used.

    A CTE is not a temp table by any means.

    Might be worth reading up on CTEs vs Temp Tables, lots of good resources out in the ether on their differences.

  • Grant Fritchey

    SSC Guru

    Points: 396703

    The trick is in the name. People read common TABLE expression when they should read common table EXPRESSION. A CTE is not a table. It's just a query. There is no data movement other than that described by the query. There is not temporary data storage created other than what would be normally needed for any query such as hashes or similar stuff. Just remember that. It's not a table. It's an expression. There is no materialization of the query into any sort of temporary storage of any kind. It's a query. Nothing more or less.

    ----------------------------------------------------
    The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood...
    Theodore Roosevelt

    The Scary DBA
    Author of: SQL Server 2017 Query Performance Tuning, 5th Edition and SQL Server Execution Plans, 3rd Edition
    Product Evangelist for Red Gate Software

  • michael.leach2015

    SSCommitted

    Points: 1593

    anthony.green wrote:

    A CTE is essentially just a materialised view of the data set, so it points to the actual underlying objects used.

    This is what I suspected.  When you say a CTE is a materialized view of a data set, is that similar to how when you make a change to a view (virtual table/saved query) the underlying table is affected?

  • Grant Fritchey

    SSC Guru

    Points: 396703

    michael.leach2015 wrote:

    anthony.green wrote:

    A CTE is essentially just a materialised view of the data set, so it points to the actual underlying objects used.

    This is what I suspected.  When you say a CTE is a materialized view of a data set, is that similar to how when you make a change to a view (virtual table/saved query) the underlying table is affected?

    But it's not. It's not materialized. That implies data movement. A CTE is just a query. There is no data movement apart from the definition of the query, like any other query. The underlying table is affected through whatever the query defines.

    ----------------------------------------------------
    The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood...
    Theodore Roosevelt

    The Scary DBA
    Author of: SQL Server 2017 Query Performance Tuning, 5th Edition and SQL Server Execution Plans, 3rd Edition
    Product Evangelist for Red Gate Software

Viewing 5 posts - 1 through 5 (of 5 total)

You must be logged in to reply to this topic. Login to reply