Too many rows returned

  • Hi,

    I'm trying to only return all customers that have had a payment in the last two years.  The problem is that my query is returning the customer multiple times for every payment received.  How do I only return the customer once if they have a payment received in the past two years:

    Customer Table: CustID, CustName
    Payment Table: DatePayment
     
    My query:
    select customer.custID, customer.CustName
    from customer join payment on
    customer.CustID = payment.CustID
    where payment.DatePayment > '2004-08-01'
    and (customer.CustID like '10%' or customer.CustID like '20%')

    My results are:

    100288 Bell Products 2004-08-03
    100288 Bell Products 2004-08-10
    100288 Bell Products 2004-08-15
     

    And I only want 1 row per customer to return.

    Any help would be greatly appreaciated!

    Isabelle

    Thanks!
    Bea Isabelle

  • it is because every row in your join meets the condition in the where clause, not just the most recent payment. You need to either use DISTINCT, or change your grouping to something like below, so you're only looking at the most recent payment from each customer.

    select c.custID, max(c.custname), max(p.DatePayment)

    from customer c

    join payment p on p.custID = c.custID

    group by c.custID

    having max(p.DatePayment) > '2004-08-01'

  • Or you could do this:

    select

        cust.custID,

        cust.CustName

    from

        dbo.customer cust

        inner join dbo.payment pay

            on (cust.CustID = pay.CustID)

    where

        pay.DatePayment > '2004-08-01'

        and (cust.CustID like '10%' or cust.CustID like '20%')

        and pay.DatePayment = (select max(p.DatePayment) from dbo.payment p where p.CustID = pay.CustID)

     

  • Or another option that should perform well if column payment.CustID is the 1st column in an index:

    SELECT c.custID, c.CustName

    FROM customer As c

    WHERE (c.CustID like '10%' or c.CustID like '20%')

    AND EXISTS (

      SELECT *

      FROM payment As p

      WHERE p.DatePayment > '2004-08-01'

      AND     p.CustID = c.CustID

    )

  • True.  But that also depends on the actual query and what is wanted.  We only see two columns in the query on-line, but what if they wanted to most recent payment.

     

  • Well, I tried it

    (and a.date_payment = (select max(a.date_payment) from dbo.appayrh

    where p.vendor_code = a.vendor_code)

    and got an error:

    Server: Msg 147, Level 16, State 2, Line 1

    An aggregate may not appear in the WHERE clause unless it is in a subquery contained in a HAVING clause or a select list, and the column being aggregated is an outer reference.

    I used a smaller version of my actual query because I have alot of other columns that are being selected.  I thought about using distinct, but because of all the other columns, I was getting weird results.

    Here is my full statement:

    select p.vendor_code, p.vendor_name, p.vendor_addr_l1, p.vendor_addr_l2, p.vendor_addr_l3, p.vendor_addr_l4, p.vendor_addr_l5,

    p.vendor_addr_l6, p.vendor_addr_l7, p.vendor_addr_l8, p.vendor_phone, p.vendor_remit_addr_l1, p.vendor_remit_addr_l2,

    p.vendor_remit_addr_l3, p.vendor_remit_addr_l4, p.vendor_remit_addr_l5,

    a.date_payment

    from dbo.pocvnms p inner join dbo.appayrh a on

    (p.vendor_code = a.vendor_code)

    where p.vendor_code not in

    (select p.vendor_code

    where p.vendor_addr_l1 is null

    and p.vendor_addr_l2 is null

    and p.vendor_addr_l3 is null

    and p.vendor_addr_l4 is null

    and p.vendor_addr_l5 is null

    and p.vendor_addr_l6 is null

    and p.vendor_addr_l7 is null

    and p.vendor_addr_l8 is null

    and p.vendor_phone is null

    and p.vendor_remit_addr_l1 is null

    and p.vendor_remit_addr_l2 is null

    and p.vendor_remit_addr_l3 is null

    and p.vendor_remit_addr_l4 is null

    and p.vendor_remit_addr_l5 is null)

    and (p.vendor_code like '10%' or p.vendor_code like '20%')

    and a.date_payment > '2004-08-01'

    and a.date_payment = (select max(a.date_payment) from dbo.appayrh

    where p.vendor_code = a.vendor_code)

     

    Isabelle

    Thanks!
    Bea Isabelle

  • You have included a.date_payment in the resultset.

    It wasn't there in your original question. Which means you wasted effort on providing you the wrong solution.

    When you put a.date_payment in the resultset, which date do you want if a customer has 2 or more payments ? The earliest date ? The most recent date ? A random date ?

     

  • Sorry, that was a typo.  I do not need the a.date_payment in the result set.  I used the max(a.payment_date) and did a group by for all the other columns and it seems to have returned the correct data.  I am in the process of checking it to see if it's valid.

    Thanks so much for your help!

    Isabelle

    Thanks!
    Bea Isabelle

  • Reformatted your code and found what I think is the problem inthe subquery at the end.  Check it out:

    select

        p.vendor_code,

        p.vendor_name,

        p.vendor_addr_l1,

        p.vendor_addr_l2,

        p.vendor_addr_l3,

        p.vendor_addr_l4,

        p.vendor_addr_l5,

        p.vendor_addr_l6,

        p.vendor_addr_l7,

        p.vendor_addr_l8,

        p.vendor_phone,

        p.vendor_remit_addr_l1,

        p.vendor_remit_addr_l2,

        p.vendor_remit_addr_l3,

        p.vendor_remit_addr_l4,

        p.vendor_remit_addr_l5,

        a.date_payment

    from

        dbo.pocvnms p inner

        inner join dbo.appayrh a

            on (p.vendor_code = a.vendor_code)

    where

        p.vendor_code not in

        (select p.vendor_code

         where p.vendor_addr_l1 is null

         and p.vendor_addr_l2 is null

         and p.vendor_addr_l3 is null

         and p.vendor_addr_l4 is null

         and p.vendor_addr_l5 is null

         and p.vendor_addr_l6 is null

         and p.vendor_addr_l7 is null

         and p.vendor_addr_l8 is null

         and p.vendor_phone is null

         and p.vendor_remit_addr_l1 is null

         and p.vendor_remit_addr_l2 is null

         and p.vendor_remit_addr_l3 is null

         and p.vendor_remit_addr_l4 is null

         and p.vendor_remit_addr_l5 is null)

         and (p.vendor_code like '10%' or p.vendor_code like '20%')

         and a.date_payment > '2004-08-01'

         and a.date_payment = ( select

                                    max(a1.date_payment)

                                from

                                    dbo.appayrh a1

                                where

                                    p.vendor_code = a1.vendor_code)

     

     

    hth

     

  • Careful with my code above, I just noticed the inner on the inner join typed twice!

  • Now I'm getting two different results

    When I run yours, I get 3,147 rows.  When I run mine:

    select p.vendor_code, p.vendor_name, p.vendor_addr_l1, p.vendor_addr_l2, p.vendor_addr_l3, p.vendor_addr_l4, p.vendor_addr_l5,

    p.vendor_addr_l6, p.vendor_addr_l7, p.vendor_addr_l8, p.vendor_phone, p.vendor_remit_addr_l1, p.vendor_remit_addr_l2,

    p.vendor_remit_addr_l3, p.vendor_remit_addr_l4, p.vendor_remit_addr_l5,

    max(a.date_payment)

    from dbo.pocvnms p inner join dbo.appayrh a on

    (p.vendor_code = a.vendor_code)

    where p.vendor_code NOT in

    (select p.vendor_code

    where p.vendor_addr_l1 is null

    and p.vendor_addr_l2 is null

    and p.vendor_addr_l3 is null

    and p.vendor_addr_l4 is null

    and p.vendor_addr_l5 is null

    and p.vendor_addr_l6 is null

    and p.vendor_addr_l7 is null

    and p.vendor_addr_l8 is null

    and p.vendor_phone is null

    and p.vendor_remit_addr_l1 is null

    and p.vendor_remit_addr_l2 is null

    and p.vendor_remit_addr_l3 is null

    and p.vendor_remit_addr_l4 is null

    and p.vendor_remit_addr_l5 is null)

    and (p.vendor_code like '10%' or p.vendor_code like '20%')

    and a.date_payment > '2004-08-01'

    group by p.vendor_code, p.vendor_name, p.vendor_addr_l1, p.vendor_addr_l2, p.vendor_addr_l3, p.vendor_addr_l4, p.vendor_addr_l5,

    p.vendor_addr_l6, p.vendor_addr_l7, p.vendor_addr_l8, p.vendor_phone, p.vendor_remit_addr_l1, p.vendor_remit_addr_l2,

    p.vendor_remit_addr_l3, p.vendor_remit_addr_l4, p.vendor_remit_addr_l5

    order by p.vendor_code

    I get 3,113 rows.  Mine is doing a max(a.date_payment) instead of another subquery the way you set it up.  Why the difference?  Sorry for these simple questions, but I'm a newbie in this area and I'm trying to get up to speed.

    Thanks!

    Isabelle

    Thanks!
    Bea Isabelle

  • I noticed at looking at the output when I ran your code and I'm seeing duplicates, which would explain the additional rows:

    100032      NID PTY LTD   18 STROKES AVENUE      2006-07-31 00:00:00

    100032      NID PTY LTD   18 STROKES AVENUE      2006-07-31 00:00:00

    100032      NID PTY LTD   18 STROKES AVENUE      2006-07-31 00:00:00

    100032      NID PTY LTD   18 STROKES AVENUE      2006-07-31 00:00:00

    Isabelle

    Thanks!
    Bea Isabelle

  • That's why the sub-query on Max() is not the correct solution. It doesn't handle customers who made 2 or more payments on the same date.

    What does this do with no FROM ?

    where p.vendor_code NOT in

    (select p.vendor_code

    where p.vendor_addr_l1 is null

  • I see.  The query:

    where p.vendor_code NOT in

    (select p.vendor_code

    where p.vendor_addr_l1 is null

    omits any customer who has all those fields NULL.  It works the same wether I put 'from dbo.pocvnms' or not, because I already specified it in the above lines.  Is that not correct?

    Isabelle

    Thanks!
    Bea Isabelle

  • >>Is that not correct?

    Nope, not correct. You've structured it as a NOT IN which creates an unnecessary sub-query.

    If you've already selected FROM the required table, and simply need to filter based on columns in that table, then just place the condition in the WHERE, without the IN.

     

Viewing 15 posts - 1 through 15 (of 17 total)

You must be logged in to reply to this topic. Login to reply