full joins with composite keys and subqueries

  • I have a slight problem with a full-join query which is based upon sub-queries.

    I have two tables called Activity and Store_activity which are defined:

    CREATE TABLE [dbo].[Activity] (

      [Activity_ID] [int] IDENTITY (1, 1) NOT NULL ,

      [Timestamp] [timestamp] NULL ,

      [Create_Date] [datetime] NULL ,

    ) ON [PRIMARY]

    GO

    CREATE TABLE [dbo].[Store_Activity] (

      [Store_ID] [nvarchar] (15) COLLATE Latin1_General_CI_AS NOT NULL ,

      [Activity_ID] [int] NOT NULL ,

      [Timestamp] [timestamp] NULL ,

      [Create_Date] [datetime] NULL ,

    ) ON [PRIMARY]

    GO

    ALTER TABLE [dbo].[Activity] WITH NOCHECK ADD

      CONSTRAINT [PK_Activity] PRIMARY KEY  CLUSTERED

      (

        [Activity_ID]

      ) WITH  FILLFACTOR = 90  ON [PRIMARY]

    GO

    ALTER TABLE [dbo].[Store_Activity] WITH NOCHECK ADD

      CONSTRAINT [PK_Store_Activity] PRIMARY KEY  CLUSTERED

      (

        [Store_ID],

        [Activity_ID]

      ) WITH  FILLFACTOR = 90  ON [PRIMARY]

    GO

    ALTER TABLE [dbo].[Activity] ADD

      CONSTRAINT [DF_Activity_Create_Date] DEFAULT (getdate()) FOR [Create_Date]

    GO

     CREATE  INDEX [idx_Activity_release] ON [dbo].[Activity]([Release]) WITH  FILLFACTOR = 90 ON [PRIMARY]

    GO

    ALTER TABLE [dbo].[Store_Activity] ADD

      CONSTRAINT [DF_Store_Activity_Create_Date] DEFAULT (getdate()) FOR [Create_Date]

    GO

     CREATE  INDEX [idx_Store_Activity_activity_id] ON [dbo].[Store_Activity]([Activity_ID]) WITH  FILLFACTOR = 90 ON [PRIMARY]

    GO

    ALTER TABLE [dbo].[Store_Activity] ADD

      CONSTRAINT [FK_Store_Activity_Activity] FOREIGN KEY

      (

        [Activity_ID]

      ) REFERENCES [dbo].[Activity] (

        [Activity_ID]

      ) ON DELETE CASCADE  ON UPDATE CASCADE  NOT FOR REPLICATION ,

      CONSTRAINT [FK_Store_Activity_Store] FOREIGN KEY

      (

        [Store_ID]

      ) REFERENCES [dbo].[Store] (

        [Store_ID]

      ) ON DELETE CASCADE  ON UPDATE CASCADE  NOT FOR REPLICATION

    GO

    Within the activity table I define all the activities based upon a release. So for a release called Release 5.0 I have 138 activities. I also have a table called Stores which I use to relate the activites to to form Store_activity. So, I have 542 Stores which if I do a full join, I should get 138 x 542 store_activities = 74796.

    When I try a evaluate the mismatches for all stores I get only 62634 with this query:

    select sa.store_id, sa.activity_id as store_activity_id,   a.activity_id

    from  (select * from activity where release = '5.0') as a

    full join (select store_id, activity_id from store_activity where activity_id in (select activity_id from activity where release = '5.0')) as sa

    on a.activity_id = sa.activity_id

    order by sa.store_id

    However, when I specify that I am only interested in one store (0002) SQL Server returns the mismatches aswell :

    and store_id = '0002'

    select sa.store_id, sa.activity_id as store_activity_id,   a.activity_id

    from  (select * from activity where release = '5.0') as a

    full join (select store_id, activity_id from store_activity where activity_id in (select activity_id from activity where release = '5.0')

    and store_id = '0002') as sa

    on a.activity_id = sa.activity_id

    order by sa.store_id

  • In the activity table in your post, there is no column named "release".  Please repost the correct DDL.  Lets assume that there is a column called release defined as a varchar.

    I may be misunderstanding the desired outcome.  I think what you are asking for is the set of activities for a specific store that have not yet been completed.  Is this correct?

    If so, then this query will give you the list of all activities for release 5.0, with any activities completed by store 0002.

    select sa.store_id,

           sa.activity_id as store_activity_id,  

           a.activity_id

      from TestActivity a

      left outer join TestStore_activity sa on a.activity_id = sa.activity_id and sa.store_id = '0002'

    where a.release = '5.0' 

    The list of activities for the same store that have been completed can be found by changing the left outer join to an inner join.

    Does this get you closer to what you are lookign for?

    Wayne

     

  • The problem there is that the Store_activity doesnt have a release column. This is a very confusing problem which I really do not know what to do about and I have spent a long time to debug.

    I have writed views for the sub-queries but I that didnt make much difference.

  • Sorry, I didnt include all the table for Activity. Yes there is a release column in the activity table.

    In that case what is the difference between a full outer-join and cross-join ?

    however, I see that I could have simplified the process and in that case this does work:

    SELECT SA.store_id, A.activity_id AS missing_act

      FROM Activity AS A

           left outer JOIN

           (select * from Store_Activity where store_id = '0002') AS SA

           ON A.activity_id = SA.activity_id

     WHERE A.release = '5.0'

    order by sa.store_id, missing_act

    but again I need this for all stores !??

  • SELECT SA.store_id, A.activity_id AS missing_act

      FROM Activity AS A

           left outer JOIN

           Store_Activity AS SA

           ON A.activity_id = SA.activity_id

     WHERE A.release = '5.0'

    order by sa.store_id, missing_act

    This might be closer to what you are looking for.

    Wayne

  • thats what Joe Celko had posted earlier, but doesnt work. it only works if I specify a single store which is obviously tedeous.

  • Sorry...  I did not actually take the code to Query Analyzer.  Reverse the join.  This will give you all activity for all stores, whether completed or not.  Then filter the result set dependign on what you need.

     SELECT SA.store_id, A.activity_id AS missing_act

      FROM Activity AS A

           right outer JOIN

           Store_Activity AS SA

           ON A.activity_id = SA.activity_id

     WHERE A.release = '5.0'

    order by sa.store_id, missing_act

  • no that didnt work either.

    What i mean by mismatches is that:

    I want to find all activities which are not associated to a store

    Given each release consists of a group of stores by appending to the store_activity table.

    Because this is a manual process, some people may have added stores to releases without associating all activities to a store.

    The full DDL behind the tables would be:

    CREATE TABLE [dbo].[Store_Activity] (

    [Store_ID] [nvarchar] (15) COLLATE Latin1_General_CI_AS NOT NULL ,

    [Activity_ID] [int] NOT NULL ,

    [Timestamp] [timestamp] NULL ,

    [Create_Date] [datetime] NULL ,

    [Not_Required] [bit] NOT NULL ,

    [Revised_Planned_Date] [datetime] NULL ,

    [Revised_AM_PM] [nvarchar] (10) COLLATE Latin1_General_CI_AS NULL ,

    [Revised_Start_Time] [nvarchar] (20) COLLATE Latin1_General_CI_AS NULL ,

    [Revised_Finish_Time] [nvarchar] (20) COLLATE Latin1_General_CI_AS NULL ,

    [Actual_Date] [datetime] NULL ,

    [Comments] [varchar] (255) COLLATE Latin1_General_CI_AS NULL ,

    [Fix] [int] NULL ,

    [Go_Status] [bit] NULL ,

    [Activity_Status_ID] [int] NULL

    ) ON [PRIMARY]

    GO

    CREATE TABLE [dbo].[Activity] (

    [Activity_ID] [int] IDENTITY (1, 1) NOT NULL ,

    [Timestamp] [timestamp] NULL ,

    [Create_Date] [datetime] NULL ,

    [Number] [int] NULL ,

    [Name] [nvarchar] (170) COLLATE Latin1_General_CI_AS NOT NULL ,

    [Description] [nvarchar] (800) COLLATE Latin1_General_CI_AS NULL ,

    [Release] [nvarchar] (50) COLLATE Latin1_General_CI_AS NOT NULL ,

    [Phase] [nvarchar] (20) COLLATE Latin1_General_CI_AS NULL ,

    [Default_Planned_Week] [numeric](18, 2) NOT NULL ,

    [Drop_Dead_Date] [nvarchar] (50) COLLATE Latin1_General_CI_AS NULL ,

    [Critical_Path] [bit] NULL ,

    [Team] [nvarchar] (20) COLLATE Latin1_General_CI_AS NULL ,

    [Area] [nvarchar] (20) COLLATE Latin1_General_CI_AS NULL ,

    [Applicable_Stores] [nvarchar] (50) COLLATE Latin1_General_CI_AS NULL ,

    [Central_Responsiblity] [nvarchar] (50) COLLATE Latin1_General_CI_AS NULL ,

    [Execution_Ownership] [nvarchar] (50) COLLATE Latin1_General_CI_AS NULL ,

    [Day] [nvarchar] (20) COLLATE Latin1_General_CI_AS NULL ,

    [Duration] [nvarchar] (50) COLLATE Latin1_General_CI_AS NULL ,

    [Comments] [nvarchar] (800) COLLATE Latin1_General_CI_AS NULL ,

    [Update_Ownership] [nvarchar] (50) COLLATE Latin1_General_CI_AS NULL ,

    [Online_User_ID] [int] NULL ,

    [order_id] [smallint] NULL ,

    [status_report] [char] (1) COLLATE Latin1_General_CI_AS NULL ,

    [exceptions_report] [char] (1) COLLATE Latin1_General_CI_AS NULL ,

    [visuailsation_name] [varchar] (50) COLLATE Latin1_General_CI_AS NULL ,

    [visuailsation_display] [bit] NULL ,

    [visualisation_activity_id] [int] NULL

    ) ON [PRIMARY]

    GO

    The tables would look like this:

    ------------------------

    Activity table

    ------------------------

    actvity_id  release

    ----------  -------

    1             1

    2             3

    3             3

    4             3

    5             2

    6             3

    ------------------------

    Store Activity table

    ------------------------

    store_id   activity_id

    --------   -----------

    0002          2

    0002          3

    0002          1

    0003          2

    0003          3

    0003          4

    Now if I want to find out which Store are not associated with all activities within a given release I should end up with results like this@

     

    store_id    store_activity_id   activity_id

    --------    -----------------   -----------

    0002          2                   2

    0002          3                   3

    0002          NULL                4

    0002          NULL                6

    0003          2                   2

    0003          3                   3

    0003          4                   4

    0003          NULL                6

    from that i can elliminate the NULLs to be left with

    store_id    store_activity_id   activity_id

    --------    -----------------   -----------

    0002          NULL                4

    0002          NULL                6

    0003          NULL                6

    Does that make more sense ?

  • It sure does.  Thanks for your patience.  The code to retrieve the first result set is:

    SELECT X.store_id, sa2.activity_id as store_activity_id, X.activity_id

    FROM TestStore_Activity sa2 RIGHT OUTER JOIN

    (

    SELECT DISTINCT SA.Store_id, A.activity_id

      FROM TestStore_Activity SA

     CROSS JOIN TestActivity A WHERE RELEASE = 3

    ) X ON X.Store_ID = SA2.Store_id AND X.activity_id = SA2.activity_id

    The last result set is:

    SELECT X.store_id, sa2.activity_id as store_activity_id, X.activity_id

    FROM TestStore_Activity sa2 RIGHT OUTER JOIN

    (

    SELECT DISTINCT SA.Store_id, A.activity_id

      FROM TestStore_Activity SA

     CROSS JOIN TestActivity A WHERE RELEASE = 3

    ) X ON X.Store_ID = SA2.Store_id AND X.activity_id = SA2.activity_id

    WHERE sa2.activity_id IS NULL

  • Wayne Lawton you are a star thats exactly what I wanted. I had posted the same problem on sqlteam.com & sql-server-performance.com and it remains unanswered.

    I can understand most of the SQL but i dont understand the right outer join and the cross join. Can you explain them for me please or point me in the right direction if its not too much trouble.

Viewing 10 posts - 1 through 9 (of 9 total)

You must be logged in to reply to this topic. Login to reply