• riyaziq (11/26/2014)


    Koen Verbeeck (11/26/2014)


    riyaziq (11/25/2014)


    As Lowell pointed out, you are inserting duplicate primary key values into the table which is not permitted.

    You need to add a 'Sort' component in your Data Flow task. Right-click the 'Sort' component, choose Edit, and check all the fields that form your primary key. Make sure to also check the "Remove rows with duplicate sort values" option at the bottom. E.g., if your primary key is "EmployeeID", select only "EmployeeID" and check the option to remove duplicates. If your PK is a combination of two or more fields, such as "FirstName" and "LastName", check both fields and the option to remove duplicates.

    Good luck!

    Remember that the Sort component is a blocking component and will read all data into memory before producing even a single row of output.

    This is a no-go for large data sets.

    Koen Verbeeck,

    I just implemented a Data Flow task with a Sort component on a large dataset and, as you pointed out, it can be slow. Could you tell me the ideal way of dealing with this situation? I am thinking of this approach: remove the primary key constraint so that the table accepts duplicate values, and then remove the duplicates with a separate Execute SQL Task:

    ALTER TABLE [Table Name] ADD AUTOID INT IDENTITY(1,1)

    DELETE FROM [Table Name]
    WHERE AUTOID NOT IN (SELECT MIN(AUTOID)
                         FROM [Table Name]
                         GROUP BY [Key Columns])

    This deletes every row whose AUTOID is not the smallest within its group of duplicates (grouping by the columns that should be unique, here [Key Columns]); in other words, all copies except the first one.

    This will also not work if there is already an identity on the table.

    The ideal scenario is that your source is a database; then you can remove the duplicates in the source query using ROW_NUMBER() OVER (PARTITION BY keycolumn ORDER BY somecolumn).
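    A minimal sketch of such a source query, assuming a hypothetical staging table dbo.Employees in which EmployeeID should be unique and a LoadDate column decides which duplicate to keep (table and column names are placeholders):

    WITH Ranked AS
    (
        SELECT *,
               ROW_NUMBER() OVER (PARTITION BY EmployeeID
                                  ORDER BY LoadDate DESC) AS rn -- 1 = most recent row per EmployeeID
        FROM dbo.Employees
    )
    SELECT EmployeeID, FirstName, LastName, LoadDate
    FROM Ranked
    WHERE rn = 1; -- keep only one row per EmployeeID

    Because the database does the deduplication, you can drop the Sort component entirely and keep the data flow non-blocking.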

    Need an answer? No, you need a question
    My blog at https://sqlkover.com.
    MCSE Business Intelligence - Microsoft Data Platform MVP