Handling Aggregations on a Poorly Designed Database

  • Comments posted to this topic are about the item Handling Aggregations on a Poorly Designed Database

    Aveek has been working as an Analytics Engineer for the past couple of years now. His main areas of interests is SQL Server, SSIS, and SSAS. On his leisure, he likes to travel and is also an amateur photographer.
    https://twitter.com/avikoleum
    https://www.linkedin.com/in/aveekd22/

  • I think you could also just use a CTE such as:

     

    ;WITH CTE AS
    (
    SELECT
     us.Name ,us.TeamID
     ,SUM(wl.HoursWorked) HoursWorked,
     ROW_NUMBER() OVER(PARTITION BY NAME ORDER BY Name) RN
    FROM Users us
    INNER JOIN WorkLog wl ON wl.UserKey = us.UserKey
    GROUP BY us.Name ,us.TeamID
    )
    SELECT Name,HoursWorked
    from CTE
    WHERE RN = 1
  • How's about this instead?!:

    SELECT
    us.Name
    ,SUM(wl.HoursWorked) / COUNT(DISTINCT us.TeamID) AS HoursWorked
    FROM Users us
    INNER JOIN WorkLog wl ON wl.UserKey = us.UserKey
    GROUP BY us.Name

    SQL DBA,SQL Server MVP(07, 08, 09) A socialist is someone who will give you the shirt off *someone else's* back.

  • The original design should have imposed a unique constraint on WorkLog.UserKey. Removing such a unique constraint (if done, in order to accommodate new design requirements) can have far reaching and unexpected consequences (beyond this view) - such as SQL that has been embedded in client application code.

    The following assumes only 1 set of duplicate user hours will be inserted by the new design (i.e., WorkLog will remain designed as-is - there will be no splitting of a user's work hours between teams, and the same user will not work the same number of hours in different WorkLog entries that are being summed):

    SELECT
    us.Name
    ,SUM(distinct wl.HoursWorked) HoursWorked
    FROM Users us
    INNER JOIN WorkLog wl ON wl.UserKey = us.UserKey
    GROUP BY us.Name

    If the real tables and their uses are more complex, consider aggregating via the windowing function OVER(PARTITION BY ...), where "..." can be what I am imagining to be a column that is named (or serving the same purpose as a) [day] - using this can eliminate some of my assumptions.

    I did not compare execution plans or statistics io.

    With respect to the original design, there is also a possibility to sp_rename the original tables, create new base tables (with new names) that are normalized per the new design requirements, import the original tables' data into the new tables, and create update-able views that are named identically to the original tables (and reference the new tables). While that possibility is more work, it allows a new design to be more formally, publicly, and robustly declared in SQL, and yet still allow legacy applications to address what they believe to be the tables (but are now views). As a bonus, those who follow you (years later) will see your intent, formally declared :). There can be concerns with this approach (such as table or index maintenance activities), but usually "a rose by any other name smells just as sweet".

    • This reply was modified 4 years, 2 months ago by  SoHelpMeCodd.
    • This reply was modified 4 years, 2 months ago by  SoHelpMeCodd.
    • This reply was modified 4 years, 2 months ago by  SoHelpMeCodd.
    • This reply was modified 4 years, 2 months ago by  SoHelpMeCodd.
  • It seems likely that reporting by team would be a feature of the application.  Otherwise, why have teams?   The data structure cannot support this without modification.  I guess my question is, at what point do you just say 'Your application data structure requires modification in order to handle NEW requirements.'?  Instead of band-aiding unsustainable solutions into an already poor design.

Viewing 5 posts - 1 through 4 (of 4 total)

You must be logged in to reply to this topic. Login to reply