How do I JOIN these records?

  • Hi, possible to give me a tip on how do I join these 2 tables?

    • TABLE_A record must have the first nearest DateTime on TABLE_B
    • Datetime record on B must be greater than or equal to A (top 1)
    • Record on B can be matched with a record on A multiple times as long  as it is a top 1

    TIA

    Capture20210401

  • OK I did a LEFT JOIN into a temp table and used CTE with ROW_NUMBER() OVER PARTITION BY taking only ROW=1

    Maybe there's a better way?

    Capture_2

     

  • You could also do it with a cross applied subquery.     Whether or not it this a "better" way usually means which runs faster.     Performance will depend on the existence of appropriate supporting indexes for the query (TableB needs an index over OrderID and DateTime, not separate indexes for each.)   It will also depend on the relative volumes between TableA  and TableB.

    Select a.OrderID, a.OrderDateTime, a.OrderStatus, b.DateTime
    from TableA a
    cross apply (select top(1) DateTime
    from TableB b
    where b.OrderID = a.OrderID
    and b.DateTime >= a.OrderDateTime
    order by DateTime) b

    One word of advice for the future.   Questions phrased in conversational language don't get answered as quickly as questions supported by data and the code that you are using.     This is because the people answering the questions are just doing it out of a sense of community,  and would rather not spend a lot of time having to read between the lines to understand your intent, then create the data so they can code and test a solution.

    Next time, don't show pictures of the data.   Instead, set up a script to create and populate the sample tables and then show the expected results.

    Similarly with your code.    When you say you did a left join and used the row_number() function in a CTE, you aren't giving any significant detail.    Why not just post the code you used for your solution, along with the data creation scripts?     Help the people you are asking to help you, and you will get a wealth of information and tested solutions back.   I strongly urge you to read this article and follow its guidelines in the future.

    https://www.sqlservercentral.com/articles/forum-etiquette-how-to-post-datacode-on-a-forum-to-get-the-best-help

    Best of luck.

     

     

    __________________________________________________

    Against stupidity the gods themselves contend in vain. -- Friedrich Schiller
    Stop, children, what's that sound? Everybody look what's going down. -- Stephen Stills

  • Very similar, but using an OUTER APPLY because you said you used a LEFT JOIN.  OUTER APPLY is like a "LEFT APPLY" and CROSS APPLY is like an "INNER APPLY".

    Don't worry about using "*" in the inner/B query, it is NOT a performance issue because SQL will still only pull the column(s) from the table that are referenced in the outer query: you can verify this by looking at the query plan.  Thus, I almost always use * for APPLYs.

    SELECT  A.*, B.OrderId, B.DateTime
    FROM dbo.TABLE_A A
    OUTER APPLY (
    SELECT TOP (1) B.*
    FROM dbo.TABLE_B B
    WHERE B.OrderID = A.OrderID AND
    B.DateTime >= A.DateTime
    ORDER BY B.DateTime
    ) AS B

    SQL DBA,SQL Server MVP(07, 08, 09) A socialist is someone who will give you the shirt off *someone else's* back.

  • Good catch, Scott.   I wasn't thinking about his use of a LEFT Join.

    __________________________________________________

    Against stupidity the gods themselves contend in vain. -- Friedrich Schiller
    Stop, children, what's that sound? Everybody look what's going down. -- Stephen Stills

  • In the future, you might want to follow the netiquette that is been established for over 35 years in SQL forums. That means posting DDL. Because of your lack of even basic manners, we have to get keys constraints data types and do the work that you were supposed to do before posting. Right now you don't know the difference between a row and a record, the table needs to have a key, and that naming a table is important. Also, a data element should have one and only one name in the entire schema. Your two timestamps are different data elements, and therefore they need different names.

    Look at the currently available data types in SQL Server; there is no reason to use the old Sybase DATETIME data type. I also seriously doubt if you keep anything accurate to nanoseconds.

    CREATE TABLE Invoices

    (order_id CHAR(10) NOT NULL,

    order_timestamp DATETIME2(0) NOT NULL, -- learn the current data types in SQL Server

    PRIMARY KEY (order_id, order_timestamp), -- required, not an option

    order_status CHAR(12) NOT NULL

    CHECK (order_status IN ('in process', ???));

    CREATE TABLE Schedule

    (order_id CHAR(10) NOT NULL,

    schedule_timestamp DATETIME2(0) NOT NULL,

    PRIMARY KEY (order_id, schedule_timestamp));

    What you are doing is called a T-Join and versions of it were invented by Dr. Codd in his last book. This was many years ago, but everything lives on the Internet. Her approach of ordering the two sets and then finding matches should be quite workable for the problem you presented. You're going to have some problems when one set is bigger than the other, or you run out of pairings or have timeslots in the schedule that are identical according to your rules. Try the suggestions you got and see if they work.

    Please post DDL and follow ANSI/ISO standards when asking for help. 

  • Thanks all! The OUTER APPLY was very helpful. Will keep in mind proper posting etiquette in the future.

Viewing 7 posts - 1 through 6 (of 6 total)

You must be logged in to reply to this topic. Login to reply