How to get record based on nearest date in SQL

  • Guys! For the three customers  in Table A I want to get the amount of the nearest date from Table B, which is based on the date in Table A. Not really sure how to describe it but I hope the example and table "Expected output" will clarify what I want to do.

    Example: For the Customer 1 in Table A I want return the amount 10 000 because 2020-11-28 is closer to 2020-10-12 than what the date of 2020-01-05 is.

    Anyone who knows?

    Table A:

     

    Table B:

     

    Expected output Table C:

     

     

     

  • select a.CustNo,t.Amount
    from TableA a
    outer apply(select top 1 b.Amount
    from TableB b
    where b.CustNo = a.CustNo
    order by abs(datediff(day,a.[Date],b.[Date]))) t(Amount)
    order by a.CustNo;

    ____________________________________________________

    Deja View - The strange feeling that somewhere, sometime you've optimised this query before

    How to get the best help on a forum

    http://www.sqlservercentral.com/articles/Best+Practices/61537
  • Thanks, works fine!

    But.... I realized that I want to do a group by on departments below - is this possible? See additional column in Table A. I didn't manage to make it by using the group by-clause. I also tried to do a subquery of your code but then I received the message 'SQL The ORDER BY clause is invalid in subqueries.

    Any idea how to solve the same situation by ending up with a group by summary on departments?

    Table A:

    Table B:

    Expected Output Table C:

  • Try this

    select a.Dept,SUM(t.Amount) AS Amount
    from TableA a
    outer apply(select top 1 b.Amount
    from TableB b
    where b.CustNo = a.CustNo
    order by abs(datediff(day,a.[Date],b.[Date]))) t(Amount)
    group by a.Dept
    order by a.Dept;

    ____________________________________________________

    Deja View - The strange feeling that somewhere, sometime you've optimised this query before

    How to get the best help on a forum

    http://www.sqlservercentral.com/articles/Best+Practices/61537
  • Works perfectly, thanks! If I want to make this code work as a subquery - is it possible?

  • Just remove the 'order by'

    select t.* from (
    select a.Dept,SUM(t.Amount) AS Amount
    from TableA a
    outer apply(select top 1 b.Amount
    from TableB b
    where b.CustNo = a.CustNo
    order by abs(datediff(day,a.[Date],b.[Date]))) t(Amount)
    group by a.Dept) t;

    ____________________________________________________

    Deja View - The strange feeling that somewhere, sometime you've optimised this query before

    How to get the best help on a forum

    http://www.sqlservercentral.com/articles/Best+Practices/61537
  • Just to play the devil's advocate here and make you think: 🙂

    How do you want to handle if the date in Table A is right in the middle between two dates in table B, i.e. both rows in table B are equally close to the date in table A?

    The current solution just arbitrarily chooses one of them. Is this what you want to happen in that case?

    Or should the row with the lowest or the highest amount be chosen? Or both (sum of both amounts)? Or something else?

    Can table B have more than one row with the same date? If yes, how should that be handled?

  • kaj wrote:

    Just to play the devil's advocate here and make you think: 🙂

    How do you want to handle if the date in Table A is right in the middle between two dates in table B, i.e. both rows in table B are equally close to the date in table A?

    The current solution just arbitrarily chooses one of them. Is this what you want to happen in that case?

    Or should the row with the lowest or the highest amount be chosen? Or both (sum of both amounts)? Or something else?

    Can table B have more than one row with the same date? If yes, how should that be handled?

    +1000 😀

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Here is a different approach.  It's better on some metrics, but worse on others.  I'm not sure which will play out better in the long run.

    WITH TableBRanges AS
    (
    SELECT *
    , COALESCE(DATEADD(DAY, -DATEDIFF(DAY, LAG(tb.Dt) OVER(PARTITION BY tb.CustNo ORDER BY tb.Dt), tb.Dt)/2, tb.Dt), '19000101') AS RangeStart
    , COALESCE(DATEADD(DAY, -DATEDIFF(DAY, LEAD(tb.Dt) OVER(PARTITION BY tb.CustNo ORDER BY tb.Dt), tb.Dt)/2, tb.Dt), '99991231') AS RangeEnd
    FROM #TableB AS tb
    )
    SELECT tb.CustNo, tb.Amount
    FROM #TableA AS ta
    INNER JOIN TableBRanges AS tb
    ON ta.CustNo = tb.CustNo
    AND ta.Dt >= tb.RangeStart
    AND ta.Dt < tb.RangeEnd

    The main difference is that the midpoints are calculated before the join.

    Drew

    J. Drew Allen
    Business Intelligence Analyst
    Philadelphia, PA

  • I am on an old Chrome book wthout any SQL on it for testing, but could we use MIN( DATEDIFF(DAY, ..) OVER OVER(PARTITION BY tb.CustNo  ..) ? People forget the OVER() works with  all the aggregate functions.

    Please post DDL and follow ANSI/ISO standards when asking for help. 

  • In the beginning - I wanted to get a understanding of the basic principle, but the more I work with it I face a significant problem.

    Example: For the Customer 1 in Table A I want return the amount 10 000 because 2020-11-28 is closer to 2020-10-12 than what the date of 2020-01-05 is.

    For the customer nr 2 though (which basically has raised my concern), is that it has two hits in Table A. In this case, I want to start from the one which has the latest purchase date (column PurDate), which for customer nr 2 is 2018-09-18, and then return the amount 8 000 from Table B.

    Someone of the experts out there who could help me with this?

    Table A

    Table B

    Expected Output Table C

     

  • This was removed by the editor as SPAM

  • I'm still interested in the question the Kaj asked above.  Basically, what do you want to do if the there's a tie of dates at one end or the other and what if there's a tie across all 3 dates.  Which row to select from amongst the ties?

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • OK, I will try to make this clear. The dates in my tables have timestamps including milliseconds which makes it almost impossible that one date in Table A would get right in the middle of two dates in Table B. But if this was suppose to happen, I want to pick the amount from the latest date of the possible ones in Table B. If the latest date has exactly the same date, it won’t matter which I got returned because this will have the same Amount.

    Not really sure what you mean with “across all three dates”, hope I explained above.

    Worth to mention is that I basically want to use the latest purchase date from table A to start from, the possible other ones for the same customer is irrelevant.

    Let me know if you have some further questions that you want me to clarify. 

     

  • pelusch wrote:

    OK, I will try to make this clear. The dates in my tables have timestamps including milliseconds which makes it almost impossible that one date in Table A would get right in the middle of two dates in Table B. But if this was suppose to happen, I want to pick the amount from the latest date of the possible ones in Table B. If the latest date has exactly the same date, it won’t matter which I got returned because this will have the same Amount.

    Not really sure what you mean with “across all three dates”, hope I explained above.

    Worth to mention is that I basically want to use the latest purchase date from table A to start from, the possible other ones for the same customer is irrelevant.

    Let me know if you have some further questions that you want me to clarify.

    From what you say in the first paragraph above, it sounds like you're counting on a condition that's not enforced by a unique index/constraint.  It also doesn't match the data example that you presented, which uses only whole dates instead of dates and times.  That was the big reason for the question.  You can save folks some time in the future and get better answers more quickly by taking a little extra time to post "readily consumable data", which would also include the CREATE TABLE statements for your example tables.  Please see the first link in my signature line below for one of many ways to do that.

    What I meant by the 3 dates is, you have 1 date that you're using as a "base date" and you want to find the row with the nearest possibility date-wise.  That means there are two dates you need to compare to... the maximum date in the table that's less than the "base date" and the minimum date in the table that's greater than the "base date".

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

Viewing 15 posts - 1 through 15 (of 17 total)

You must be logged in to reply to this topic. Login to reply