join - how to get only the first row from the left table

  • Hello,

    I started learning T-SQL 4 years ago and this is the first time I come here so...nice to meet to you 🙂 !

    Here is a simple query excecuted in AdventureWorks2017 :

    SELECT 
    SOH.[SalesOrderID],
    SOH.[DueDate],
    SOH.[SalesOrderID]
    , SOD.[OrderQty]
    , SOD.[UnitPrice]
    FROM
    [Sales].[SalesOrderHeader] SOH
    LEFT JOIN [Sales].[SalesOrderDetail] SOD
    ON SOH.[SalesOrderID] = SOD.[SalesOrderID]

    Result :

    WhatIGet

    And here is the result I want instead :

    WhatIWant

    (blank = NULL)

    I want only the first row from the left table... So no replication (I guess this will lighten data tranfert) and better lisibility of the result.

    Is this possible ?

    I have never seen that before, anywhere.

    Thank you for any help 🙂

    • This topic was modified 4 years ago by  JackIsJack.
    • This topic was modified 4 years ago by  JackIsJack.
    • This topic was modified 4 years ago by  JackIsJack. Reason: Datasource
    • This topic was modified 4 years ago by  JackIsJack. Reason: better screenshot... I hope
  • I cannot see the source data ...

    If you haven't even tried to resolve your issue, please don't expect the hard-working volunteers here to waste their time providing links to answers which you could easily have found yourself.

  • Sorry, I accidently deleted the initial query. See my initial message to understand.

    Since then, I think I found a solution, but it doesn't look very elegant.

    SELECT 
    (CASE WHEN 1 = row_number() over (partition by SOH.[SalesOrderID] order by SOH.[SalesOrderID] )
    THEN SOH.[SalesOrderID]
    end) as [SalesOrderID],
    (CASE WHEN 1 = row_number() over (partition by SOH.[SalesOrderID] order by SOH.[SalesOrderID] )
    THEN SOH.[DueDate]
    end) as [DueDate],
    SOH.[SalesOrderID] as [SalesOrderID]
    ,[OrderQty]
    ,[UnitPrice]
    FROM
    [Sales].[SalesOrderHeader] SOH
    LEFT JOIN [Sales].[SalesOrderDetail] SOD
    ON SOH.[SalesOrderID] = SOD.[SalesOrderID]
    ORDER BY
    SOH.[SalesOrderID], SOD.[SalesOrderID]

     

    • This reply was modified 4 years ago by  JackIsJack.
  • .

  • When you say "I want only the first row from the left table," can you expand on exactly what you mean by that?

    First of all, for SalesOrderId = 43670, all of the dates are equal. The concept of 'first' is therefore unclear, can you clarify?

    Second, if the left table (SalesOrderHeader) contained three rows (43668, 45679 and 43670, for example), wouldn't the 'first row' be 43668 (assuming that this is an incrementing key of some sort). Therefore the only rows in your resultset would have this as the SalesOrderId and no other rows.

    I suspect that you mean something else, but this is how I am understanding your text.

    If you haven't even tried to resolve your issue, please don't expect the hard-working volunteers here to waste their time providing links to answers which you could easily have found yourself.

  • JackIsJack wrote:

    Hello,

    I started learning T-SQL 4 years ago and this is the first time I come here so...nice to meet to you 🙂 !

    Here is a simple case : header & details + inner join :

    SELECT 
    SOH.[SalesOrderID],
    SOH.[DueDate],
    SOH.[SalesOrderID]
    , SOD.[OrderQty]
    , SOD.[UnitPrice]
    FROM
    [Sales].[SalesOrderHeader] SOH
    LEFT JOIN [Sales].[SalesOrderDetail] SOD
    ON SOH.[SalesOrderID] = SOD.[SalesOrderID]

    WhatIGet

    And here is the result I want instead :

    WhatIWant

    (blank = NULL)

    I want only the first row from the left table... So no replication (I guess this will lighten data tranfert) and better lisibility of the result.

    Is this possible ?

    I have never seen that before, anywhere.

    Thank you for any help 🙂

    Welcome.  Glad to "meet" you, as well.

    1.  When you're using a common data source for your example, you should let folks know that (this one used AdventureWorks).  If you don't have a common data source, you should provide a readily consumable one.  See the first link in my signature line how to do that.  Knowing how to do both of these things to help us help you better and more quickly.
    2. No.  This isn't the way to do data transmission because you've provided absolutely no way to maintain the order of the rows as they need to be.  If you're going to combine two table likes this, then "replicate" the left table as you first did.  A far better idea (and this is one of the big reasons why I hate XML), is to transmit the data from the two tables as the separate entities that they are and let the people on the receiving end put things back together with a join after they import the two entities separately.
    3. I will, however, agree that, sometimes, a query must serve as both the Data Layer and the Presentation Layer.  With what I said in Item #2 above still fresh on your mind and the understanding that you should never transmit data in this poor form, here's how to achieve that form.  It also only uses a single hit on ROW_NUMBER().  I don't know if that qualifies as "more elegant" or not and I've not compared performance at this point.
        USE AdventureWorks
    ;
    WITH cteEnumerate AS
    (--==== Data Layer Code
    SELECT RowNum = ROW_NUMBER () OVER (PARTITION BY hdr.SalesOrderID ORDER BY hdr.OrderDate,SalesOrderDetailID)
    ,hdr.SalesOrderID
    ,hdr.OrderDate
    ,hdr.DueDate
    ,dtl.SalesOrderDetailID
    ,dtl.UnitPrice
    ,dtl.OrderQty
    FROM Sales.SalesOrderHeader hdr
    LEFT JOIN Sales.SalesOrderDetail dtl
    ON hdr.SalesOrderID = dtl.SalesOrderID
    )--==== Presentation Layer Code
    -- Note there is no way to preserve the order once displayed.
    -- You would need a "row number" to do so, in this case but would still be better if you
    -- transmitted the two different enties separately.
    SELECT SalesOrderID = CASE WHEN RowNum = 1 THEN CONVERT(VARCHAR(10),SalesOrderID) ELSE '' END
    ,OrderDate = CASE WHEN RowNum = 1 THEN CONVERT( CHAR(10),OrderDate,23) ELSE '' END
    ,DueDate = CASE WHEN RowNum = 1 THEN CONVERT( CHAR(10),DueDate ,23) ELSE '' END
    ,UnitPrice
    ,OrderQty
    FROM cteEnumerate cte
    ORDER BY cte.SalesOrderID,cte.RowNum
    ;

    4. As a friendly suggestion that will improve your career a bit (it did mine... a lot, actually), consider adopting a reasonable and easily readable consistent formatting standard and rigidly stick to it even for forum posts (it makes for good practice).  As short as your sample code is, it looks like it was written by someone that just doesn't care.  If you write your real code in such a haphazard fashion, it can hurt your career.  If you don't write your real code in such a haphazard fashion, then why are you punishing us? 😉

     

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • This was removed by the editor as SPAM

  • Hello Phil

    When you say "I want only the first row from the left table," can you expand on exactly what you mean by that? 

    More precisely, when a row from the left table matches with more than 1 row in the right table, I want to see all the row from the right table but only 1 time the row from the left table. I don't want the standard behavior of "repeating the left row as many time as there is a match".

    I don't know if it's clearer...

     

  • JackIsJack wrote:

    Please consider that I am not a native english speaker ; let me know if I am not understandable and I'll do my best to clarify.

    1. Good Sir, if that is true, then my hat is off to you because I thought you did a fine job, especially for a first post.  To me, it was crystal clear what you wanted thanks to the time you spent making and posting the graphics that you did.  Well done and... welcome aboard again!
    JackIsJack wrote:

    I added the datasource name into my initial message.

    It was AdventureWorks2017 indeed.

    2. It's not my job to judge but you've impressed me again.  Well done.  It will help others to easily experiment with solutions that may end up out striping both of us and will help all of us learn a more elegant method.  As a bit of a sidebar, that's what I love about  this forum... discussions, right or wrong, are encouraged because it "gets the creative juices flowing".

    JackIsJack wrote:

    I was expecting that kind of debate you've provided absolutely no way to maintain the order of the rows as they need to be.

    Can you explain why such a need ?

    Should I fear that the order could be randomly lost/altered ?

    Ah... sorry.  I wasn't clear.

    My response there was mostly based on your comment of ...

    JackIsJack wrote:

    So no replication (I guess this will lighten data tranfert) and better lisibility of the result

    3. If the "data transfer" you speak of is just to display the end result on screen, then you are correct (unless something goes wrong, to which I cannot speak because I'm not a web or frontend Developer).

    4. If, however, you are actually using this method to transfer data to someone who will ultimately be loading the data from the result into a database (whether normalized or not), then there is the possibility of parallelism and other things occurring that may not preserve the original order of the lines of the result you've created.  If that happens, then there is no method to associate the data from the populated "right table info" with the "left table info" because it is blank.

    JackIsJack wrote:

    The database engine is the best place, by far, to do Sort/Join actions : best performance (index mecanisms) and reliability.

    I don't understand why you suggest me to delegate the "hard work of joinning" outside the database perimeter, neither do I understand why sort order could be altered.

    However I am maybe missing something and I know that this is a not a standard way to transfer data from SQL to Application but I am still looking for the hard reasons to justify that duplication of rows which costs a lot of memory.

    IMHO, Yes and No.  Again, "It Depends".

    5a.  For SORT actions, it's frequently better to do such a thing outside of the database.  For example, if a thousand people are demanding report data from the database to support reports, returning unsorted data to the presentation app and letting it sort the way it needs to (which it might do anyway even if it receives sorted data) could be a much wiser decision because SORTing is a really expensive operation and you have only one (or even a couple) of database servers while you may have many more web servers or even end user boxes to power application side sorting.  On the other hand ("It Depends"), it could indeed be better to send presorted data.  In the case of the method in your original post, it would certainly cut down on the transmission of unnecessary duplicated data but you just don't know what could happen to the sort order.  With that, I'd strongly recommend hedging on your bet by including a line number (which won't cost an extra sort) in the transmitted result.  It won't be as "thick" as providing all the duplicated data and the app may not use it but at least the app could use it to guarantee successful correct sorting of the displayed output without it actually appearing in the output.  In this case, returning the partitioned-by-SalesOrderID RowNum column and then having the app ensure the data was sorted by SalesOrderID and that RowNum would remove all possibilities of an incorrect sort in the absence of the "left table data" for most of the rows.

    5b.  As for JOINing, again it depends.  If all you intend to do is provide result data to the app, then I'll usually agree that the database is the best place to do the joins.  I'll also state that, depending on the timeliness of the data required by the app, it may be better to do all of the aggregations and certainly all the join necessary to create and store the results in a common area for the thousand people all demanding the same information for their reports.

    5c. If, however, you are transmitting the data for the ultimate purpose of someone loading it into a database (which was my original incorrect premise based on the statement you made in your original post), then it is MUCH more effective all the way around to transmit the raw data for each entity so that the ultimate consumer doesn't have to do the equivalent of "screen scraping" to separate the data into its respective entities.  Again, this is why I hate EDI, XML, and JSON because, at the consumer end, they have to go through some really expensive parsing to separate the entities and both the tag and data "bloat" makes the transmitted data many sizes larger than it actually needs to be.  Of course, the data shouldn't need to be sorted in the transmission especially since it could suffer all of the problems previously cited in my paragraph 4.  If the data must be sortable in the same order as the appearance of lines in a transmission file (for example), the something like a "row number" must be included in the transmission to guarantee against all "silent" sorting mishaps on the receiver end.

    JackIsJack wrote:

    Yes, your query looks more elegant ! Thanks you.

    Below is the execution plan comparison

    6a. This is one of the reasons why it's frequently not  good idea to do things in the "Data Layer" that should be done in the "Application Layer".  My code actually "lost" the foot race for CPU time because of the CONVERTs used to return the blanks instead of NULLs for the "missing" "left table data" and an extra sort in the ROW_NUMBER() ORDER BY that guaranteed the output order to be the same at all times no matter what may happen.

    6b. I also state that, although I rely heavily on the execution plan during development, I never use it to determine the winner between two pieces of code because even the actual execution plan contains a lot of estimates.  I've seen it return the exact opposite as opposed to what really happens during run time.    Still, my code does come up slower CPU wise because of the CONVERTs used for the blank conversions and the extra column I used in the sort order of the ROW_NUMBER(), etc.   If we remove all of that but still keep the final final sort as it was, we can see a pretty good advantage in the reduction of logical reads (which must be in memory to start with) and, even though there are a very small number of rows in the tables, some decent reduction in CPU usage, as well.

    Here's the revised code I tested with (I use GO to strategically contain what comments will show up in SQL Profiler for ease of comparison).  The run results follow that.

    --===== Clear the guns so we can see what a "First Run" would do compared to subsequent cached runs ===================
    CHECKPOINT;
    DBCC DROPCLEANBUFFERS;
    DBCC FREEPROCCACHE;
    GO
    --===== Single ROW_NUMBER() Code ======================================================================================
    WITH cteEnumerate AS
    (--==== Data Layer Code
    SELECT RowNum = ROW_NUMBER () OVER (PARTITION BY hdr.SalesOrderID ORDER BY hdr.SalesOrderID)
    ,hdr.SalesOrderID
    ,hdr.OrderDate
    ,hdr.DueDate
    ,dtl.UnitPrice
    ,dtl.OrderQty
    FROM Sales.SalesOrderHeader hdr
    LEFT JOIN Sales.SalesOrderDetail dtl
    ON hdr.SalesOrderID = dtl.SalesOrderID
    )--==== Presentation Layer Code
    -- Note there is no way to preserve the order once displayed.
    -- You would need a "row number" to do so.
    SELECT SalesOrderID = CASE WHEN RowNum = 1 THEN SalesOrderID END
    ,OrderDate = CASE WHEN RowNum = 1 THEN OrderDate END
    ,DueDate = CASE WHEN RowNum = 1 THEN DueDate END
    ,UnitPrice
    ,OrderQty
    FROM cteEnumerate cte
    ORDER BY cte.SalesOrderID,cte.RowNum
    ;
    GO 5
    --@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
    GO
    --===== Clear the guns so we can see what a "First Run" would do compared to subsequent cached runs ===================
    CHECKPOINT;
    DBCC DROPCLEANBUFFERS;
    DBCC FREEPROCCACHE;
    GO
    --===== Two ROW_NUMBER() Code =========================================================================================
    SELECT
    (CASE WHEN 1 = row_number() over (partition by SOH.[SalesOrderID] order by SOH.[SalesOrderID] )
    THEN SOH.[SalesOrderID]
    end) as [SalesOrderID],
    (CASE WHEN 1 = row_number() over (partition by SOH.[SalesOrderID] order by SOH.[SalesOrderID] )
    THEN SOH.[DueDate]
    end) as [DueDate],
    SOH.[SalesOrderID] as [SalesOrderID]
    ,[OrderQty]
    ,[UnitPrice]
    FROM
    [Sales].[SalesOrderHeader] SOH
    LEFT JOIN [Sales].[SalesOrderDetail] SOD
    ON SOH.[SalesOrderID] = SOD.[SalesOrderID]
    ORDER BY
    SOH.[SalesOrderID], SOD.[SalesOrderID]
    GO 5

    JackIsJack wrote:

    Lol, you should not consider that my production code is crappy if your judgement is based on a first post in a help forum ; this is humiliating and I did my best to take screenshots, to use AdventureWorks, to be polite...but  I suppose it's a part for the welcoming/hazing And indeed, I accidently forgot to post the initial query, my bad ; sorry again.

    7. BWAAA-HAAAA!!!!  No... it certainly wasn't meant as a "welcome hazing".   There are some people that desperately deserve such a thing but I try not to get into such hazing.  If it or the person making the post are that bad, I'll usually just skip the post.  You should see some of the code people post for example code and then, when we ask them to post the code they're actually using in production, it looks just as bad or worse .  I mentioned it this time because I actually was impressed by the time you took to post what you wanted things to look like and the code seemed contrary to that effort and so I wanted to mention it to someone that might benefit from it.  Very happy to see I was wrong and thank you for taking it in a good natured manner.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • I can't say enough about the wisdom that Jeff can impart to you, he is in fact my mentor having met here on ssc back in 2005.  To what you are asking for in this post, however, I would say it should be done on your presentation layer, whether iis is Reporting Services, Excel spreadsheets, a custom application.  Doing the work of the presentation layer in the database can add unnecessary complexity to the code that simply needs to identify an transmit the data to the user making the request.

    Presentation layer software may provide the end user with additional options for presenting or sorting the data that could be problematic if you are attempting to handle the presentation of data from the database.

    Just my thoughts on the subject.

    I also want to thank you for the effort you went through to provide as much as you did for this post.  You may not realize how nice this was until you try answer someone elses question find yourself try to pull hens teeth to figure out what the really want.

     

  • 🙂 Thanks you VERY MUCH Jeff for all the knowledge you shared with me 🙂

    you have only one (or even a couple) of database servers while you may have many more web servers

    I'll that in mind for later.

    Currently, I am not dealing with such a configuration.

    We have ~300 user max (not concurrent users, a total of 300 users), we don't have to sort records at application level.

    I hope I'll never meet those constraints lol.

    you are transmitting the data for the ultimate purpose of someone loading it into a database [...]

    Yes, in that case I totally agree with your concerns about preserving the separation of entities 🙂

    we can see a pretty good advantage in the reduction of logical reads (which must be in memory to start with)

    Yes ! Indeed. I'll keep that result in mind... and above all : the way you used Profiler to get that results.

    Thanks you so much again

    Best regards

    • This reply was modified 4 years ago by  JackIsJack.
  • Thank you for the kind feedback.  Helluva first post, eh? 😀

     

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Lynn,

    Thank you for the very kind words.  If someone with your knowledge has claimed that I'm their mentor, then I'm truly humbled.  To be honest, you're as much of a mentor to me as you claim I am to you.  I've learned a huge amount from you and, like many mentors, you're not even aware of it.  You set the bar pretty high, Mr. Pettis.

    You were definitely on my bucket list of people important to me and I was tickled to death to have that come true at the Colorado Springs SQL Saturday.  It was indeed a great pleasure to finally get to meet you and spend some time talking face-to-face with you  after more than a decade of not being able to.

    Hopefully, that won't be the last time.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • An alternative is like below.  Which performs better depends on total data volume and index(es) present:

    Edit: I'm at work so I have only a limited time to review all posts, sorry if this was already posted and I missed it.

    SELECT 
    SOH.[SalesOrderID],
    SOH.[DueDate],
    SOH.[SalesOrderID]
    , SOD.[OrderQty]
    , SOD.[UnitPrice]
    FROM
    [Sales].[SalesOrderHeader] SOH
    OUTER APPLY (
    SELECT TOP (1) *
    FROM [Sales].[SalesOrderDetail] SOD
    WHERE SOH.[SalesOrderID] = SOD.[SalesOrderID]
    ORDER BY [whatever_column(s)_give_you_the_"first"_one_as_you_want_it]
    ) AS SOD

    SQL DBA,SQL Server MVP(07, 08, 09) A socialist is someone who will give you the shirt off *someone else's* back.

  • Scott... can you post the code that would return what the OP asked for?  Thanks.

     

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

Viewing 15 posts - 1 through 15 (of 19 total)

You must be logged in to reply to this topic. Login to reply