what is the difference between these two queries

  • Please I am struggling to understand the difference between these two queries. It behaves differently as output

    google bigquery query 1

    UPDATE dde-demo-d001.sap_crm.document_flow_root_bods
    SET case_guid = ICT.case_guid
    FROM dde-demo-d001.sap_crm.document_flow_root_bods DFR
    INNER JOIN dde-demo-d001.sap_crm.inferred_case_transactions ICT
    ON DFR.transaction_header_guid = ICT.transaction_header_guid
    WHERE DFR.case_guid IS NULL;

    query 2

    UPDATE dde-demo-d001.sap_crm.document_flow_root_bods DFR
    SET case_guid = ICT.case_guid
    FROM (SELECT transaction_header_guid,case_guid FROM dde-demo-d001.sap_crm.inferred_case_transactions) ICT
    WHERE (DFR.case_guid IS NULL) and (DFR.transaction_header_guid = ICT.transaction_header_guid);

    Context : First query uses inner join and second doesnt use any joins. I cant seem to find any difference between two queries as far as logic ic concerned.

    the query with inner join gives the below error

    UPDATE/MERGE must match at most one source row for each target row

     

    where as the query 2 updates success with 5 rows.

  • Thanks for posting your issue and hopefully someone will answer soon.

    This is an automated bump to increase visibility of your question.

  • The "JOIN" in the second query is in the WHERE clause as...

    and (DFR.transaction_header_guid = ICT.transaction_header_guid)

    It's also referred to as an "old style" or "Non-SANSI" "Equi-Join"

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • It looks like there is more than one row in ICT with the same transaction_header_guid.

    In SQL Server you could correctly write it as this:

    UPDATE DFR
    SET DFR.case_guid = ICT.case_guid
    FROM dde_demo_d001.sap_crm.document_flow_root_bods AS DFR
    CROSS APPLY (SELECT TOP(1) ICT.case_guid
    FROM dde_demo_d001.sap_crm.inferred_case_transactions AS ICT
    WHERE DFR.transaction_header_guid = ICT.transaction_header_guid
    ORDER BY ICT.case_guid -- Ensure deterministic results if necessary
    ) AS ICT
    WHERE DFR.case_guid IS NULL;

    In BigQuery like this:

    UPDATE dde_demo_d001.sap_crm.document_flow_root_bods AS DFR
    SET DFR.case_guid = sub.case_guid
    FROM (
    -- Get exactly one "best" row per transaction_header_guid
    SELECT transaction_header_guid, case_guid
    FROM (SELECT transaction_header_guid,
    case_guid,
    ROW_NUMBER() OVER (PARTITION BY transaction_header_guid ORDER BY case_guid) AS rn
    FROM dde_demo_d001.sap_crm.inferred_case_transactions
    )
    WHERE rn = 1
    ) AS sub
    WHERE DFR.case_guid IS NULL
    AND DFR.transaction_header_guid = sub.transaction_header_guid;

     

     

  • Hey, great question — this actually tripped me up before too. The main reason you're seeing different behavior is because of how BigQuery handles updates when there's more than one match per row.

    In your first query with the INNER JOIN, if there's even one record in inferred_case_transactions that matches the same transaction_header_guid more than once, BigQuery throws that error — it doesn’t know which case_guid to use for the update. That’s what the error means when it says “must match at most one source row for each target row.” Even if you think it should be one-to-one, if there are hidden duplicates, it’ll fail.

    But in your second query, the subquery behaves differently — BigQuery doesn’t treat it as a join in the same way, so as long as it finds a single match during execution, it just works. That’s why you're seeing it update 5 rows successfully.

    You might want to check if your inferred_case_transactions table has duplicate transaction_header_guid values. If so, and you want to use the join approach, you'd need to pick just one row per match — like with ROW_NUMBER() and filtering to the first one. Let me know if you want help with that!

Viewing 5 posts - 1 through 5 (of 5 total)

You must be logged in to reply to this topic. Login to reply