Blog Post

Using Views for Faster SQL Query Development

,

A discussion of how views give a more denormalized means for querying against normalized tables.

Ad-hoc reports are frequently requested at the Boston Public Schools. Generally each request is different but similar to ones previously made. To the degree that such requests are unique, there’ll always be work involved in pulling these reports together, spot checking the results, double checking counts, etc. However, when the wheel is constantly being re-invented with a particular set of SQL logic, it’s better to make life easier in the future by encapsulating and generalizing logic when possible. The mindset to have is to always ask: Is there a better way?

An example at Boston Public Schools is our student school assignment process. We do upcoming year school assignments in 4 batches during the spring [students who register earlier have a priority over students who wait until late in the spring]. The input and output assignment tables are rather normalized, which is certainly preferred over having wide tables with duplicate data in many places. However, I often need to run ad-hoc reports based on these tables, and the normalized form simply increases the number of keystrokes I need to make to get useful results.

For the first few such requests I simply bit the bullet and added the needed yet annoying extra joins to get the data I wanted. But then I decided enough was enough and created views to help with my reporting queries. This gives me a real-time more denormalized way to make query development quicker.

Here’s a somewhat realistic scenario: For each student, get their assigned program and their highest-choice waitlist assignment. The requests are generally more complex than this, but this will illustrate my point.

First, let’s create the tables and insert some dummy data:

-- Some fields excluded from how the tables actually are at BPS
CREATE TABLE student (studentNumber INT) 
CREATE TABLE program (programid INT, schoolcode INT, grade INT, program VARCHAR(20))
CREATE TABLE schoolAssignment (studentNumber INT, programid INT)
CREATE TABLE waitlistAssignment (studentNumber INT, programid INT, waitlistPlacement INT, waitlistChoice INT)
 
INSERT INTO student VALUES(422333)
INSERT INTO student VALUES(422334)
INSERT INTO student VALUES(422335)
INSERT INTO student VALUES(422338)
INSERT INTO student VALUES(422339)
 
INSERT INTO program VALUES (1, '1111', 2, 'REGULAR')
INSERT INTO program VALUES (2, '1111', 2, 'MULTILINGUAL')
INSERT INTO program VALUES (3, '1111', 2, 'SPED_INTEGRATED')
INSERT INTO program VALUES (4, '1111', 2, 'MONTESSORI')
INSERT INTO program VALUES (5, '1234', 2, 'REGULAR')
INSERT INTO program VALUES (6, '1234', 2, 'MULTILINGUAL')
INSERT INTO program VALUES (7, '1234', 2, 'SPED_INTEGRATED')
INSERT INTO program VALUES (8, '1234', 2, 'MONTESSORI')
 
INSERT INTO schoolAssignment VALUES (422333, 1)
INSERT INTO schoolAssignment VALUES (422334, 3)
INSERT INTO schoolAssignment VALUES (422335, 4)
INSERT INTO schoolAssignment VALUES (422339, 3)
 
INSERT INTO waitlistAssignment VALUES (422333, 3, 40, 1)
INSERT INTO waitlistAssignment VALUES (422333, 5, 55, 2)
INSERT INTO waitlistAssignment VALUES (422334, 4, 27, 1)
INSERT INTO waitlistAssignment VALUES (422338, 2, 3, 1)
INSERT INTO waitlistAssignment VALUES (422338, 6, 16, 2)
INSERT INTO waitlistAssignment VALUES (422338, 8, 22, 3)

Here’s the query needed to satisfy the ad-hoc report request:

SELECT
       s.studentNumber, 
       pA.schoolcode AS 'Assigned School', 
       pA.grade AS 'Assigned Grade', 
       pA.program AS 'Assigned Program', 
       pW.schoolcode AS 'Highest Choice Waitlist School',
       pW.grade AS 'Highest Choice Waitlist Grade',
       pW.program AS 'Highest Choice Waitlist Program'
FROM student s
LEFT JOIN schoolAssignment sA ON sA.studentNumber = s.studentNumber
-- Join being made just because the data is normalized.
LEFT JOIN program pA ON pA.programid = sA.programid   
LEFT JOIN waitlistAssignment wA ON wA.studentNumber = s.studentNumber  AND wA.waitListChoice = 1
 
-- Join being made just because the data is normalized.
LEFT JOIN program pW ON pW.programid = wA.programid

I consider two of the left joins to be busy-work – certainly needed but unwanted. Let’s encapsulate these joins by creating a couple of views:

CREATE VIEW schoolAssignmentInfo AS
SELECT sA.studentNumber, sA.programid, p.schoolcode, p.grade, p.program 
FROM schoolAssignment sA
INNER JOIN program p ON sA.programid = p.programid
 
GO
 
CREATE VIEW waitlistAssignmentInfo AS
SELECT wA.studentNumber, wA.programid, p.schoolcode, 
       p.grade, p.program, waitlistPlacement, waitlistChoice 
FROM waitlistAssignment wA
INNER JOIN program p ON wA.programid = p.programid

With these views, I never need to worry about adding logic to join on program – the view is doing it for me. I just reference the view and don’t worry about it anymore. I’ve saved some keystrokes:

SELECT 
       s.studentNumber, 
       sA.schoolcode AS 'Assigned School', 
       sA.grade AS 'Assigned Grade', 
       sA.program AS 'Assigned Program', 
       wA.schoolcode AS 'Highest Choice Waitlist School',  
       wA.grade AS 'Highest Choice Waitlist Grade', 
       wA.program AS 'Highest Choice Waitlist Program'
FROM student s
LEFT JOIN schoolAssignmentInfo sA ON sA.studentNumber = s.studentNumber
LEFT JOIN waitlistAssignmentInfo wA ON wA.studentNumber = s.studentNumber AND wA.waitListChoice = 1

Is the updated query significantly smaller? Only marginally in this case. But for more complex queries, having these and other views I’ve put in place certainly saves me time and keystrokes.

Rate

You rated this post out of 5. Change rating

Share

Share

Rate

You rated this post out of 5. Change rating