TSQL Select specific records and child records from same table

  • Hi guys. I'm fairly new to SQL, but learning quickly. I've got a situation where I need to query a list of jobs, select some that meet a specific criteria (easy) but I need to also select some other's that would be related to these (don't know how).

    Here's some sample data to illustrate:

    Source table:

    create table projects

    (

    ProjectNo nvarchar (20),

    ProjectType nvarchar(10),

    Location nvarchar (10),

    )

    insert into projects (ProjectNo, ProjectType, Location)

    values ('abc123', 'new', '111');

    insert into projects (ProjectNo, ProjectType, Location)

    values ('abc124', 'std', '222');

    insert into projects (ProjectNo, ProjectType, Location)

    values ('abc125', 'new', '333');

    insert into projects (ProjectNo, ProjectType, Location)

    values ('abc126', 'std', '111');

    insert into projects (ProjectNo, ProjectType, Location)

    values ('abc127', 'std', '111');

    insert into projects (ProjectNo, ProjectType, Location)

    values ('abc128', 'std', '666');

    insert into projects (ProjectNo, ProjectType, Location)

    values ('abc129', 'std', '777');

    insert into projects (ProjectNo, ProjectType, Location)

    values ('abc134', 'std', '1332');

    insert into projects (ProjectNo, ProjectType, Location)

    values ('abc135', 'new', '1443');

    insert into projects (ProjectNo, ProjectType, Location)

    values ('abc136', 'new', '1554');

    insert into projects (ProjectNo, ProjectType, Location)

    values ('abc137', 'new', '1665');

    insert into projects (ProjectNo, ProjectType, Location)

    values ('abc138', 'std', '1776');

    insert into projects (ProjectNo, ProjectType, Location)

    values ('abc139', 'std', '1554');

    insert into projects (ProjectNo, ProjectType, Location)

    values ('abc140', 'std', '1998');

    insert into projects (ProjectNo, ProjectType, Location)

    values ('abc141', 'std', '1554');

    insert into projects (ProjectNo, ProjectType, Location)

    values ('abc142', 'std', '2220');

    I need to query the rows that have "new" as the project type. Easy. But for each one of the rows with ProjectType = "new" I also need to append any row that has a matching value in the "Location" field. Call these "children". So for project abc123, at location 111 with ProjectType="new", there are 2 other records tha are "children" of this record. abc126 and abc127.

    I'd like the result of my query to look like this. Each parent retrieved, along with any related "child" records. (Order is not important, I grouped by location and I added blank lines for readability)

    ProjectProjectTypeLocation

    abc123new111

    abc126std111

    abc127std111

    abc125new333

    abc135new1443

    abc136new1554

    abc139std1554

    abc141std1554

    abc137new1665

    Can someone help me with this query?

    Thanks very much.

    Scott

  • ;WITH NewProjects

    AS

    (

    SELECT Location

    FROM projects

    WHERE ProjectType = 'New'

    )

    SELECT p.ProjectNo

    ,p.ProjectType

    ,p.Location

    FROM NewProjects AS np

    JOIN Projects AS p ON p.Location = np.Location

    ORDER BY p.Location ASC

    First I identified the projects with type New in the CTE.

    Then use that location to return information for all projects/children from the Projects table.

  • Or this:

    SELECT *

    FROM dbo.projects

    WHERE Location IN

    (

    SELECT Location

    FROM dbo.projects

    WHERE ProjectType = 'new'

    )

    "I cant stress enough the importance of switching from a sequential files mindset to set-based thinking. After you make the switch, you can spend your time tuning and optimizing your queries instead of maintaining lengthy, poor-performing code."

    -- Itzik Ben-Gan 2001

  • Thank you! An alternative that may work double duty for some other things I do might be to have a 4th field that flags the children when I build the source table, call that field "ChildDef" for example.

    Can you show me how I could query the sample table and produce a resulting output where any row at a location where there is also an associated "new" ProjectType gets assigned the value of "newChild"?

    Output like this:

    ProjectProjectTypeLocationChildDef

    abc123new111

    abc124222

    abc125new333

    abc126111newChild

    abc127111newChild

    abc128666

    abc129777

    abc1341332

    abc135new1443

    abc136new1554

    abc137new1665

    abc1381776

    abc1391554newChild

    abc1401998

    abc1411554newChild

    abc1422220

  • SELECT

    p.ProjectNo,

    ProjectType = CASE WHEN p.ProjectType = 'new' AND x.ProjectType IS NOT NULL THEN 'new' ELSE '' END,

    p.Location,

    ChildDef = ISNULL(ChildDef,'')

    FROM dbo.projects p

    LEFT JOIN

    (

    SELECT *, ChildDef = CASE ProjectType WHEN 'new' THEN '' ELSE 'newChild' END

    FROM dbo.projects

    WHERE Location IN

    (

    SELECT Location

    FROM dbo.projects

    WHERE ProjectType = 'new'

    )

    ) x ON x.ProjectNo = p.ProjectNo;

    "I cant stress enough the importance of switching from a sequential files mindset to set-based thinking. After you make the switch, you can spend your time tuning and optimizing your queries instead of maintaining lengthy, poor-performing code."

    -- Itzik Ben-Gan 2001

  • Thank guys VERY MUCH. All work perfectly of course. I'm studying to dissect and comprehend the mechanics. I get it when I read it, now to get it to sink in so I can apply next time 🙂

  • I thought I had a handle on this and was able to make some mods for my real problem. But I ran into an issue I didn't anticipate originally. My "children" are only children if their finish date is close (within 2 months) of each other.

    Revised sample table:

    create table projects

    (

    ProjectNo nvarchar (20),

    ProjectType nvarchar(10),

    Location nvarchar (10),

    Finish date

    )

    insert into projects (ProjectNo, ProjectType, Location, Finish)

    values ('abc123', 'new', '111','12/1/15');

    insert into projects (ProjectNo, ProjectType, Location, Finish)

    values ('abc124', 'std', '222','12/1/15');

    insert into projects (ProjectNo, ProjectType, Location, Finish)

    values ('abc125', 'new', '333','12/1/15');

    insert into projects (ProjectNo, ProjectType, Location, Finish)

    values ('abc126', 'std', '111','12/1/16');

    insert into projects (ProjectNo, ProjectType, Location, Finish)

    values ('abc127', 'std', '111','11/1/15');

    insert into projects (ProjectNo, ProjectType, Location, Finish)

    values ('abc128', 'std', '666','12/1/15');

    insert into projects (ProjectNo, ProjectType, Location, Finish)

    values ('abc129', 'std', '777','12/1/15');

    insert into projects (ProjectNo, ProjectType, Location, Finish)

    values ('abc134', 'std', '1332','12/1/15');

    insert into projects (ProjectNo, ProjectType, Location, Finish)

    values ('abc135', 'new', '1443','12/1/15');

    insert into projects (ProjectNo, ProjectType, Location, Finish)

    values ('abc136', 'new', '1554','12/1/15');

    insert into projects (ProjectNo, ProjectType, Location, Finish)

    values ('abc137', 'new', '1665','12/1/15');

    insert into projects (ProjectNo, ProjectType, Location, Finish)

    values ('abc138', 'std', '1776','12/1/15');

    insert into projects (ProjectNo, ProjectType, Location, Finish)

    values ('abc139', 'std', '1554','12/1/16');

    insert into projects (ProjectNo, ProjectType, Location, Finish)

    values ('abc140', 'std', '1998','12/1/15');

    insert into projects (ProjectNo, ProjectType, Location, Finish)

    values ('abc141', 'std', '1554','11/1/15');

    insert into projects (ProjectNo, ProjectType, Location, Finish)

    values ('abc142', 'std', '2220','12/1/15');

    How can I restrict the sql above to only mark the children when their finish date is within 2 months?

    The result should be:

    ProjectNoProject TypeLocationChildDefFinish

    abc123new11112/1/2016

    abc12422212/1/2016

    abc125new33312/1/2016

    abc12611112/1/2017

    abc127111newChild11/1/2016

    abc12866612/1/2016

    abc12977712/1/2016

    abc134133212/1/2016

    abc135new144312/1/2016

    abc136new155412/1/2016

    abc137new166512/1/2016

    abc138177612/1/2016

    abc139155412/1/2017

    abc140199812/1/2016

    abc1411554newChild11/1/2016

    abc142222012/1/2016

    There are 2 projects at location 111, one is the "parent" (new) project. One of the other 2 is within 2 months finish date so it is a "child". But the other one is a year away so it is not a child.

    Hopefully that makes sense.

    I tried to figure out how to parse it but I'm stuck 🙁

  • ;WITH NewProjects

    AS

    (

    SELECT Location

    , Finish

    FROM Projects

    WHERE ProjectType = 'New'

    )

    SELECT p.ProjectNo

    , CASE p.ProjectType WHEN 'New' THEN p.ProjectType ELSE '' END AS ProjectType

    , p.Location

    , CASE WHEN (p.ProjectType <> 'New') AND (ABS(DATEDIFF(month, p.Finish, np.Finish)) <= 2) THEN 'newChild' ELSE '' END AS ChildDef

    , p.Finish

    FROM Projects AS p

    LEFT JOIN NewProjects AS np ON np.Location = p.Location

    ORDER BY p.Location ASC, p.ProjectType, p.Finish ASC

    Edit - I changed it to a LEFT JOIN so that all projects are returned in the select

  • Works, I'm digesting now. I think I was having some trouble understanding the virtual tables (my noob description) that get created in here and at what point they were "alive", I think I see now. Studying.

    Thanks very much for the help and lesson. Again.

  • I made one small change so I get "new" or "newChild" in the ChildDef field.

    How can I restrict this to just rows with one or the other of those 2 values? I gather I can't use WHERE with ChildDef, it errors with "invalid column" when I try it. I guess because it doesn't exist yet. I also tried limiting by projectType and had success with one case but not another. Which I didn't understand. See my commented out WHERE's at the end of this code:

    Is there a way to filter the output on Childef?

    WITH NewProjects

    AS (

    SELECT Location

    ,Finish

    FROM Projects

    WHERE ProjectType = 'New'

    )

    SELECT p.ProjectNo

    ,CASE p.ProjectType

    WHEN 'New'

    THEN p.ProjectType

    ELSE ''

    END AS ProjectType

    ,p.Location

    ,CASE

    WHEN (p.ProjectType <> 'New')

    AND (ABS(DATEDIFF(month, p.Finish, np.Finish)) <= 2)

    THEN 'newChild'

    WHEN p.ProjectType = 'New'

    THEN 'new'

    ELSE NULL

    END AS ChildDef

    ,p.Finish

    FROM Projects AS p

    LEFT JOIN NewProjects AS np

    ON np.Location = p.Location

    --where p.ChildDef in('new','newChild') --fails

    --where projecttype in('new') -- works

    --where projecttype = '' -- no error but no results

    ORDER BY p.Location ASC

    ,p.ProjectType

    ,p.Finish ASC

  • WHERE p.ProjectType <> 'New' AND (ABS(DATEDIFF(month, p.Finish, np.Finish)) <= 2

    or

    WHERE p.ProjectType = 'New'

    John

  • Got it. Can't use ChildDef but can just repeat the criteria that it used. I tried this to get both cases and it worked

    WHERE (

    p.ProjectType <> 'New' AND

    (ABS(DATEDIFF(month, p.Finish, np.Finish)) <= 2)

    )

    OR

    (p.ProjectType = 'New')

    Curios though, why doesn't this work?

    where p.projecttype = ''

  • What do you mean, why doesn't it work? Error message or wrong results? Do you have any projects where ProjectType is blank?

    By the way, you need to be careful with your WHERE clause. Since it includes a column from the outer table, you may find that your left join has turned back to an inner join.

    John

  • In your sample data ProjectType was either "new" or "std" there was no empty string ('').

    You are changing it to an empty string in your SELECT statement via CASE. So you cannot use it in the WHERE clause.

    Quote from this book by Itzik Ben-Gan

    The logical query processing order of the six main query clauses is

    FROM

    WHERE

    GROUP BY

    HAVING

    SELECT

    ORDER BY

    end quote.

    So the only clause that can see after SELECT is evaluated is the ORDER BY clause; even though we write the SELECT first before the WHERE.

  • In your sample data ProjectType was either "new" or "std" there was no empty string ('').

    You are changing it to an empty string in your SELECT statement via CASE. So you cannot use it in the WHERE clause.

    Got it. I understand now. Kinda the same issue I was having trying to use ChildDef, just didn't see it. Poor choice on my part for my example reusing the same field name. Thanks!

    where p.projecttype='std' gave the result I expected.

    Also, thanks for the order.

    By the way, you need to be careful with your WHERE clause. Since it includes a column from the outer table, you may find that your left join has turned back to an inner join.

    Now I'm worried. 🙂 Is the way I have the where clause correct for this example? I'm getting the right result in this dataset, but I'm about to adapt this to run on 100k records. So maybe some case I didn't duplicate in the sample data.

    WHERE (

    p.ProjectType <> 'New' AND

    (ABS(DATEDIFF(month, p.Finish, np.Finish)) <= 2)

    )

    OR

    (p.ProjectType = 'New')

Viewing 15 posts - 1 through 15 (of 18 total)

You must be logged in to reply to this topic. Login to reply