Removing Redundant Records from Table

  • I hope my question posed to you fits the forums format. If not a apologize in advance

    We are using SQL server 2008 R2 version 10.50.1765.0.

    I have a table containing 4 fields. Level1, Level2, Level2 and Level4. The other columns are there for explanatory reasons.

    Line#Level1Level2Level3Level4Goal

    1102111X

    2102766

    3102766102767X

    4105668

    5105668105670

    6105668105670105671 X

    7105668

    8105668106390

    9105668106390106391 X

    10102511

    11102511102513

    12102511102513102525

    13102511102513102525103535X

    14102511

    15102511102513

    16102511102513102525

    17102511102513102525103566X

    I would like to exclude the redundant records from this table.

    Line 1 is not redundant since the Level1 item number is unique to the table

    Line2 is redundant since the Level1 item can also found on line 3.

    Line 4 and Line 5 are redundant since Level1 item on Line 4 and Level 1 item and Level 2 item on Line 5 can also be found on Line 6.

    The lines marked with an X in the Goal column are the records I would like to extract.

    Any help would be greatly appreciated.

    Regards

    Marc

  • Ugly but it works:

    DROP TABLE #Sample

    CREATE TABLE #Sample ([Line#] int, Level1 int, Level2 int, Level3 int, Level4 int, Goal CHAR(1))

    INSERT INTO #Sample ([Line#], Level1, Level2, Level3, Level4, Goal)

    SELECT 1, 102111,NULL,NULL,NULL, 'X' UNION ALL

    SELECT 2, 102766,NULL,NULL,NULL, NULL UNION ALL

    SELECT 3, 102766,102767, NULL,NULL, 'X' UNION ALL

    SELECT 4, 105668,NULL,NULL,NULL, NULL UNION ALL

    SELECT 5, 105668,105670, NULL,NULL, NULL UNION ALL

    SELECT 6, 105668,105670, 105671, NULL, 'X' UNION ALL

    SELECT 7, 105668,NULL,NULL,NULL, NULL UNION ALL

    SELECT 8, 105668,106390, NULL,NULL, NULL UNION ALL

    SELECT 9, 105668,106390, 106391, NULL, 'X' UNION ALL

    SELECT 10, 102511,NULL,NULL,NULL, NULL UNION ALL

    SELECT 11, 102511,102513, NULL,NULL, NULL UNION ALL

    SELECT 12, 102511,102513, 102525, NULL, NULL UNION ALL

    SELECT 13, 102511,102513, 102525, 103535, 'X' UNION ALL

    SELECT 14, 102511,NULL,NULL,NULL, NULL UNION ALL

    SELECT 15, 102511,102513, NULL,NULL, NULL UNION ALL

    SELECT 16, 102511,102513, 102525,NULL, NULL UNION ALL

    SELECT 17, 102511,102513, 102525, 103566, 'X'

    SELECT s.*

    FROM #Sample s

    WHERE (0 = 1)

    OR s.Level1 IS NOT NULL AND NOT EXISTS (SELECT 1 FROM #Sample WHERE Level1 = s.level1 AND Level2 IS NOT NULL)

    OR s.Level2 IS NOT NULL AND NOT EXISTS (SELECT 1 FROM #Sample WHERE Level2 = s.level2 AND Level3 IS NOT NULL)

    OR s.Level3 IS NOT NULL AND NOT EXISTS (SELECT 1 FROM #Sample WHERE Level3 = s.level3 AND Level4 IS NOT NULL)

    OR s.Level4 IS NOT NULL

    ORDER BY s.[Line#]

    Hang around a while, someone will come up with something far more elegant.

    “Write the query the simplest way. If through testing it becomes clear that the performance is inadequate, consider alternative query forms.” - Gail Shaw

    For fast, accurate and documented assistance in answering your questions, please read this article.
    Understanding and using APPLY, (I) and (II) Paul White
    Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden

  • Question, are level 1 values unique to level 1, and are level 2 values unique to level 2, etc?

  • Hello Lynn,

    Yes, level1 item numbers are unique to level1 and so on

    Regards

    Marc

  • Chris,

    Thank you for the post.

    Marc

  • What I'm even more interested in is that the data appears to be a "hierarchical path" structure. It also appears to be incorrect. Lines 7 and 14 should not exist because those identical entries already exist on lines 4 and 10.

    Was this information created from an "Adjaceny List"?

    Also, it appears that you're trying to "trap" only "leaf" nodes in a tree. If you ARE creating this list from an "Adjacency List", I can show you a lot better ways to represent his data, quickly identify "leaf" nodes, and much, much more.

    Lemme know...

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

Viewing 6 posts - 1 through 6 (of 6 total)

You must be logged in to reply to this topic. Login to reply