Query performance help needed

  • I have multiple pairs of "Offers" and each Offer is comprised of several lists of "Items".

    My pairs of Offers are defined in one table (OfferOverlaps) with about 200,000 rows, and I have another table (OfferItems) which contains the mapping of each Offer's lists of Items with about 30,000,000 rows.

    Using these two tables I need to find the rows from OfferOverlap where the two overlapping Offers defined in each row contain at least one common Item in OfferItems.

    I first tried this:

    select distinct OO.OfferID, OO.OverlappingOfferID

    from OfferOverlaps OO

    join OfferItems OI1 on OO.OfferID = OI1.OfferID

    join OfferItems OI2 on OO.OverlappingOfferID = OI2.OfferID

    where OI1.ItemID = OI2.ItemID

    But the performance was not satisfactory.

    I also tried to use exists in an attempt to try to get the comparison of matching ItemID's to stop immediately after the first match was found rather than evaluating all matches (potentially over a million) between each set of OfferItems:

    select distinct OO.OfferID, OO.OverlappingOfferID

    from OfferOverlaps OO

    join OfferItems OI1 on OO.OfferID = OI1.OfferID

    join OfferItems OI2 on OO.OverlappingOfferID = OI2.OfferID

    where exists (select OI1.ItemID intersect select OI2.ItemID)

    The performance was the same. In fact, the query plans are identical for both of these approaches.

    I would be very grateful for any ideas you might have about a different approach to this query problem that would yield better performance than these approaches that I've tried.

    Below is a simple example that sets up an illustration of the query problem.

    [font="Courier New"]

    --drop table OfferOverlaps

    --drop table OfferItems

    -- 200,000 rows in this table

    create table OfferOverlaps (

    OfferID int,

    OverlappingOfferID int,

    constraint cpk_OfferOverlaps primary key clustered (OfferID,OverlappingOfferID)

    )

    insert into OfferOverlaps values

    (1, 6),

    (3, 8),

    (5, 9);

    -- 30,000,000 rows in this table

    create table OfferItems (

    OfferID int,

    ListID int,

    ItemID int,

    constraint cpk_OfferItems primary key clustered (OfferID,ListID,ItemID)

    )

    insert into OfferItems values

    (1, 1, 111),

    (1, 1, 222),

    (2, 2, 222),

    (3, 1, 333),

    (3, 2, 888),

    (4, 1, 444),

    (4, 2, 555),

    (5, 1, 555),

    (6, 1, 444),

    (6, 2, 666),

    (7, 1, 777),

    (8, 1, 333),

    (8, 1, 888),

    (9, 1, 999);

    select distinct OO.OfferID, OO.OverlappingOfferID

    from OfferOverlaps OO

    join OfferItems OI1 on OO.OfferID = OI1.OfferID

    join OfferItems OI2 on OO.OverlappingOfferID = OI2.OfferID

    where OI1.ItemID = OI2.ItemID

    select distinct OO.OfferID, OO.OverlappingOfferID

    from OfferOverlaps OO

    join OfferItems OI1 on OO.OfferID = OI1.OfferID

    join OfferItems OI2 on OO.OverlappingOfferID = OI2.OfferID

    where exists (select OI1.ItemID intersect select OI2.ItemID)

    [/font]

  • For larger data sets, I'd work on getting rid of that DISTINCT operator. It's going to lead to issues because of the aggregation. But, leaving everything roughly the same, I found if you just added an index like this

    CREATE INDEX ix1 on dbo.OfferOverlaps(OverlappingID);

    Performance on my machine went from around 92ms for the original structure to about 14ms with this index. Because the clustered index is included in the key of the nonclustered index, it makes this nonclustered index covering. Having the data sorted by OverlappingID eliminates the clustered index scan.

    "The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood"
    - Theodore Roosevelt

    Author of:
    SQL Server Execution Plans
    SQL Server Query Performance Tuning

  • This should help a bit:

    Add to table OfferOverlaps:

    ,

    constraint uk_OfferOverlaps unique (OverlappingOfferID,OfferID)

    and this to table OrderItems:

    CREATE INDEX IX_ItemOrder ON OfferItems(ItemID, OfferID)

    And to avoid DISTINCT you may use this approach:

    select OO.OfferID, OO.OverlappingOfferID

    from OfferOverlaps OO

    WHERE EXISTS (

    SELECT *

    FROM OfferItems OI1

    INNER JOIN OfferItems OI2 on OI2.OfferID = OO.OverlappingOfferID AND OI1.ItemID = OI2.ItemID

    WHERE OI1.OfferID = OO.OfferID

    )

    _____________
    Code for TallyGenerator

Viewing 3 posts - 1 through 2 (of 2 total)

You must be logged in to reply to this topic. Login to reply