Click here to monitor SSC
SQLServerCentral is supported by Red Gate Software Ltd.
 
Log in  ::  Register  ::  Not logged in
 
 
 

Using Views for Faster SQL Query Development

A discussion of how views give a more denormalized means for querying against normalized tables.

Ad-hoc reports are frequently requested at the Boston Public Schools. Generally each request is different but similar to ones previously made. To the degree that such requests are unique, there’ll always be work involved in pulling these reports together, spot checking the results, double checking counts, etc. However, when the wheel is constantly being re-invented with a particular set of SQL logic, it’s better to make life easier in the future by encapsulating and generalizing logic when possible. The mindset to have is to always ask: Is there a better way?

An example at Boston Public Schools is our student school assignment process. We do upcoming year school assignments in 4 batches during the spring [students who register earlier have a priority over students who wait until late in the spring]. The input and output assignment tables are rather normalized, which is certainly preferred over having wide tables with duplicate data in many places. However, I often need to run ad-hoc reports based on these tables, and the normalized form simply increases the number of keystrokes I need to make to get useful results.

For the first few such requests I simply bit the bullet and added the needed yet annoying extra joins to get the data I wanted. But then I decided enough was enough and created views to help with my reporting queries. This gives me a real-time more denormalized way to make query development quicker.

Here’s a somewhat realistic scenario: For each student, get their assigned program and their highest-choice waitlist assignment. The requests are generally more complex than this, but this will illustrate my point.

First, let’s create the tables and insert some dummy data:

-- Some fields excluded from how the tables actually are at BPS
CREATE TABLE student (studentNumber INT) 
CREATE TABLE program (programid INT, schoolcode INT, grade INT, program VARCHAR(20))
CREATE TABLE schoolAssignment (studentNumber INT, programid INT)
CREATE TABLE waitlistAssignment (studentNumber INT, programid INT, waitlistPlacement INT, waitlistChoice INT)
 
INSERT INTO student VALUES(422333)
INSERT INTO student VALUES(422334)
INSERT INTO student VALUES(422335)
INSERT INTO student VALUES(422338)
INSERT INTO student VALUES(422339)
 
INSERT INTO program VALUES (1, '1111', 2, 'REGULAR')
INSERT INTO program VALUES (2, '1111', 2, 'MULTILINGUAL')
INSERT INTO program VALUES (3, '1111', 2, 'SPED_INTEGRATED')
INSERT INTO program VALUES (4, '1111', 2, 'MONTESSORI')
INSERT INTO program VALUES (5, '1234', 2, 'REGULAR')
INSERT INTO program VALUES (6, '1234', 2, 'MULTILINGUAL')
INSERT INTO program VALUES (7, '1234', 2, 'SPED_INTEGRATED')
INSERT INTO program VALUES (8, '1234', 2, 'MONTESSORI')
 
INSERT INTO schoolAssignment VALUES (422333, 1)
INSERT INTO schoolAssignment VALUES (422334, 3)
INSERT INTO schoolAssignment VALUES (422335, 4)
INSERT INTO schoolAssignment VALUES (422339, 3)
 
INSERT INTO waitlistAssignment VALUES (422333, 3, 40, 1)
INSERT INTO waitlistAssignment VALUES (422333, 5, 55, 2)
INSERT INTO waitlistAssignment VALUES (422334, 4, 27, 1)
INSERT INTO waitlistAssignment VALUES (422338, 2, 3, 1)
INSERT INTO waitlistAssignment VALUES (422338, 6, 16, 2)
INSERT INTO waitlistAssignment VALUES (422338, 8, 22, 3)

Here’s the query needed to satisfy the ad-hoc report request:

SELECT
       s.studentNumber, 
       pA.schoolcode AS 'Assigned School', 
       pA.grade AS 'Assigned Grade', 
       pA.program AS 'Assigned Program', 
       pW.schoolcode AS 'Highest Choice Waitlist School',
       pW.grade AS 'Highest Choice Waitlist Grade',
       pW.program AS 'Highest Choice Waitlist Program'
FROM student s
LEFT JOIN schoolAssignment sA ON sA.studentNumber = s.studentNumber
-- Join being made just because the data is normalized.
LEFT JOIN program pA ON pA.programid = sA.programid   
LEFT JOIN waitlistAssignment wA ON wA.studentNumber = s.studentNumber  AND wA.waitListChoice = 1
 
-- Join being made just because the data is normalized.
LEFT JOIN program pW ON pW.programid = wA.programid

I consider two of the left joins to be busy-work – certainly needed but unwanted. Let’s encapsulate these joins by creating a couple of views:

CREATE VIEW schoolAssignmentInfo AS
SELECT sA.studentNumber, sA.programid, p.schoolcode, p.grade, p.program 
FROM schoolAssignment sA
INNER JOIN program p ON sA.programid = p.programid
 
GO
 
CREATE VIEW waitlistAssignmentInfo AS
SELECT wA.studentNumber, wA.programid, p.schoolcode, 
       p.grade, p.program, waitlistPlacement, waitlistChoice 
FROM waitlistAssignment wA
INNER JOIN program p ON wA.programid = p.programid

With these views, I never need to worry about adding logic to join on program – the view is doing it for me. I just reference the view and don’t worry about it anymore. I’ve saved some keystrokes:

SELECT 
       s.studentNumber, 
       sA.schoolcode AS 'Assigned School', 
       sA.grade AS 'Assigned Grade', 
       sA.program AS 'Assigned Program', 
       wA.schoolcode AS 'Highest Choice Waitlist School',  
       wA.grade AS 'Highest Choice Waitlist Grade', 
       wA.program AS 'Highest Choice Waitlist Program'
FROM student s
LEFT JOIN schoolAssignmentInfo sA ON sA.studentNumber = s.studentNumber
LEFT JOIN waitlistAssignmentInfo wA ON wA.studentNumber = s.studentNumber AND wA.waitListChoice = 1

Is the updated query significantly smaller? Only marginally in this case. But for more complex queries, having these and other views I’ve put in place certainly saves me time and keystrokes.

Comments

Posted by manuel-248210 on 15 August 2011

As far as I know, Views are very useful for this kind of de-normalization of data in the meanings of keeping data structures clean and clear.

But how does this kind of de-normalized access works according to performance? What I mean is: what happens in an OLTP scenario where I have some very big tables (millions of rows) which I need to query quite often (with a human user involved)? Are views good for that? I know views (just like indexes) needs to be mantained and that is resources expensive, but is that a reasonable trade-off? (I see a "it depends" answer on the horizon, hehe).

Leave a Comment

Please register or log in to leave a comment.