Computed column with subquery

  • Hi,

    My question is about calculating a computed column in SSIS. Unfortunately, the column is based on another column in the same query which, in turn, is a subquery.

    I'd prefer a generic answer, but I know some people can't answer unless they see the script, so here it is.

    Table X1 (scores of volunteers per week and class):

    CREATE TABLE [dbo].[tblPAScores](

    [ID] [int] IDENTITY(1,1) NOT NULL,

    [VolunteerID] [varchar](50) NOT NULL,

    [NoWeek] [int] NULL,

    [PA] [numeric](5, 2) NULL,

    [Class] [varchar](50) NULL,

    CONSTRAINT [PK_tblPAScores] PRIMARY KEY CLUSTERED

    (

    [ID] ASC

    )WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]

    ) ON [PRIMARY]

    The view on the previous table:

    SELECT TOP (100) PERCENT VolunteerID, Class, NoWeek, PA,

    (SELECT COUNT(*) AS Expr1

    FROM dbo.tblPAScores

    WHERE (PA > X1.PA) AND (Class = X1.Class) AND (NoWeek = X1.NoWeek)) + 1 AS PA1R,

    (SELECT COUNT(VolunteerID) AS Expr1

    FROM dbo.tblPAScores AS tblPAScores_1

    WHERE (Class = X1.Class) AND (NoWeek = X1.NoWeek)) AS N

    FROM dbo.tblPAScores AS X1

    ORDER BY NoWeek, VolunteerID

    • N: the number of people for a given week and class

    • PA1R: your rank in that week and class, i.e. the number of people with a higher PA score than you, plus 1

    In the next column, I need to divide PA1R by N; it's that simple. Why? The goal is to calculate, in terms of PA, what percentage of people did better than you (say, 30%), the same as you (40%), and worse (say, 30%) for a given week and class.

    To get this, I need expressions such as 1/N, (PA1R-1)/N, etc.

    Questions:

    1) Is it possible to have a computed column based on previous columns which are subqueries?

    2) If not, any alternative solution (pivot tables or the like)? Do I have to run a megasubquery (sort of the subquery1 / subquery 2)

    Thanks in advance, a.

  • a_ud (4/19/2013)


    Hi,

    Questions:

    1) Is it possible to have a computed column based on previous columns which are subqueries?

    Hi,

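    It is possible to have computed columns in a view. If you tried 1/N or PA1R/N directly, you probably got an error message, because a column alias cannot be referenced by other expressions in the same SELECT list. Instead, you can repeat the subqueries inside the new expressions: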

    create view vwPAScores

    as

    SELECT TOP (100) PERCENT VolunteerID, Class, NoWeek, PA,

    (SELECT COUNT(*) AS Expr1

    FROM dbo.tblPAScores

    WHERE (PA > X1.PA) AND (Class = X1.Class) AND (NoWeek = X1.NoWeek)) + 1 AS PA1R,

    (SELECT COUNT(VolunteerID) AS Expr1

    FROM dbo.tblPAScores AS tblPAScores_1

    WHERE (Class = X1.Class) AND (NoWeek = X1.NoWeek)) AS N,

    -- 1.0 forces decimal division; COUNT() returns int, so int/int would truncate to 0

    1.0/(SELECT COUNT(VolunteerID) AS Expr1 FROM dbo.tblPAScores AS tblPAScores_1 WHERE (Class = X1.Class) AND (NoWeek = X1.NoWeek)) as [1_divide_N],

    (((SELECT COUNT(*) AS Expr1 FROM dbo.tblPAScores WHERE (PA > X1.PA) AND (Class = X1.Class) AND (NoWeek = X1.NoWeek)) + 1)-1) * 1.0/(SELECT COUNT(VolunteerID) AS Expr1 FROM dbo.tblPAScores AS tblPAScores_1 WHERE (Class = X1.Class) AND (NoWeek = X1.NoWeek)) as [PA1R-1_divide_N]

    FROM dbo.tblPAScores AS X1

    ORDER BY NoWeek, VolunteerID
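
    The expressions above repeat each subquery because an alias such as N or PA1R is not visible to other expressions in the same SELECT list. One way to avoid the repetition, sketched here on the assumption that the table is exactly as posted, is to name the two counts once in a derived table and do the arithmetic in the outer query. The aliases S and T and the output names OneOverN and ShareBetter are made up for illustration, and the NULLIF guard is an addition: N comes out as 0 for rows where Class or NoWeek is NULL.

```sql
-- Sketch only: the inner query names the two counts once,
-- so the outer query can reuse PA1R and N as ordinary columns.
SELECT S.VolunteerID, S.Class, S.NoWeek, S.PA, S.PA1R, S.N,
       1.0 / NULLIF(S.N, 0)            AS OneOverN,    -- 1.0 forces decimal division
       (S.PA1R - 1.0) / NULLIF(S.N, 0) AS ShareBetter  -- fraction of the group scoring higher
FROM (
    SELECT X1.VolunteerID, X1.Class, X1.NoWeek, X1.PA,
           (SELECT COUNT(*)
            FROM dbo.tblPAScores AS T
            WHERE T.PA > X1.PA AND T.Class = X1.Class AND T.NoWeek = X1.NoWeek) + 1 AS PA1R,
           (SELECT COUNT(T.VolunteerID)
            FROM dbo.tblPAScores AS T
            WHERE T.Class = X1.Class AND T.NoWeek = X1.NoWeek) AS N
    FROM dbo.tblPAScores AS X1
) AS S;
```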

    Igor Micev, My blog: www.igormicev.com

  • a_ud (4/19/2013)


    Hi,

    Questions:

    1) Is it possible to have a computed column based on previous columns which are subqueries?

    2) If not, any alternative solution (pivot tables or the like)? Do I have to run a megasubquery (sort of the subquery1 / subquery 2)

    Thanks in advance, a.

    Hi again, @a_ud,

    By posting your code rather than asking for a generic answer, you've allowed me to give you what should be a better solution. Your view definition could be:

    SELECT VolunteerID, Class, NoWeek, PA,

    Y.PA1R,

    Z.N,

    ((Z.N-Y.PA1R)/(Z.N * 1.0)) * 100 AS pct

    FROM dbo.tblPAScores AS X1

    OUTER APPLY (SELECT COUNT(*) + 1 AS PA1R

    FROM dbo.tblPAScores X2

    WHERE (X2.PA > X1.PA) AND (X2.Class = X1.Class) AND (X2.NoWeek = X1.NoWeek)) Y

    OUTER APPLY (SELECT COUNT(X3.VolunteerID) AS N

    FROM dbo.tblPAScores X3

    WHERE (X3.Class = X1.Class) AND (X3.NoWeek = X1.NoWeek)) Z

    The correlated subqueries in the OUTER APPLYs generate columns for PA1R and N for each row, and you can use those columns as many different times and ways as you'd like in the SELECT columns.
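
    For comparison, the same per-row values can also be produced with window functions rather than correlated subqueries. This is a swapped-in technique, not the approach described above, and only a sketch: the CTE name Ranked and the columns SamePA, PctBetter, PctSame and PctWorse are hypothetical names, and rows where PA, Class or NoWeek is NULL will not come out the same as in the subquery version. PctSame counts the row itself, so the three percentages add up to 100.

```sql
-- Sketch only: window functions in place of the correlated subqueries.
WITH Ranked AS (
    SELECT VolunteerID, Class, NoWeek, PA,
           RANK()   OVER (PARTITION BY Class, NoWeek ORDER BY PA DESC) AS PA1R,   -- 1 + rows with a higher PA
           COUNT(*) OVER (PARTITION BY Class, NoWeek)                  AS N,      -- size of the week/class group
           COUNT(*) OVER (PARTITION BY Class, NoWeek, PA)              AS SamePA  -- rows sharing this PA, including this one
    FROM dbo.tblPAScores
)
SELECT VolunteerID, Class, NoWeek, PA, PA1R, N,
       100.0 * (PA1R - 1)              / N AS PctBetter,  -- scored higher than this row
       100.0 * SamePA                  / N AS PctSame,    -- same score (this row included)
       100.0 * (N - PA1R + 1 - SamePA) / N AS PctWorse    -- scored lower than this row
FROM Ranked;
```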

    One note: you'll see that I set up the percentage calculation with a seemingly superfluous "* 1.0". Since COUNT() returns an int, N and PA1R will both be ints. Therefore, (N-PA1R)/N would be evaluated with integer division, which will always return 0 here because N-PA1R is always smaller than N. Multiplying by 1.0 introduces a decimal datatype into the mix so that the int values are implicitly converted to decimal before the division is performed, allowing you to get values in the range 0.0 - 1.0 (0 - 100 after multiplying by 100), as expected. Note also that (N-PA1R)/N is the share of the group made up of other volunteers who scored the same as or lower than you; the share with better scores would be (PA1R-1)/N.
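
    The effect of integer division is easy to see in isolation. The literals below are arbitrary illustration values, not data from the table, and the column aliases are made up.

```sql
-- int / int truncates the fraction; bringing in a decimal keeps it.
SELECT 3 / 10                        AS IntDivision,      -- 0
       3 / (10 * 1.0)                AS DecimalDivision,  -- 0.3, returned as a decimal
       CAST(3 AS decimal(9, 2)) / 10 AS CastDivision;     -- 0.3, returned as a decimal
```

    An explicit CAST does the same job as "* 1.0" and makes the intent a little more visible to the next reader.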

    Also, you'll see that I removed the TOP(100) PERCENT . . . ORDER BY from my code. It's generally better practice to order the results in the SELECT statement that references your view. You can run into some unexpected performance problems when you write views to return ordered result sets. When the query optimizer expands the view definition into the query, the ORDER BY in the view may confound the optimizer and lead to an undesirable execution plan.
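
    As a sketch of what that looks like in practice, assuming the definition above was saved as a view named vwPAScores (the name used earlier in this thread) in the dbo schema, the ordering moves to the query that reads the view:

```sql
-- Assumption: dbo.vwPAScores exists and was created without TOP / ORDER BY.
SELECT VolunteerID, Class, NoWeek, PA, PA1R, N, pct
FROM dbo.vwPAScores
ORDER BY NoWeek, VolunteerID;  -- ordering belongs to the caller, not the view
```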

    Jason Wolfkill
