SQLServerCentral Article

Using Exotic Joins in SQL Part 1

,

When most developers think of joins, they think of “a.SomethingID = b.SomethingID”. This type of join, the equijoin, is vitally important to SQL programming; however, it only scratches the surface of the power of the SQL join.

This is the first in a series of articles that will look at several different types of “exotic” joins in SQL. This article will focus on using the BETWEEN operator in joins when dealing with range-based data.

Introducing the BETWEEN Join

When dealing with things like calendars, grading scales, and other range-based data, the BETWEEN operator comes in very handy in the WHERE clause. It is often forgotten that the BETWEEN operator can also be used in join criteria.

In the WHERE clause, the BETWEEN operator is usually used to test whether some field is between two constants. However, the BETWEEN operator can take any valid SQL expression for any or all of its three arguments. This includes columns of tables.

One use of a BETWEEN join is to determine in which range a particular value falls. Joins of this nature tend to have the following pattern:

<FactData> BETWEEN <RangeMinimum> AND <RangeMaximum>

In this pattern, the “fact data” is contained in a table with instances of data such as payments, test scores, login attempts, or clock in/out events. The other table, the “range lookup table”, is usually a smaller table which provides a range minimum and maximum and other data for the various ranges.

For example, consider a scenario in which a student is enrolled in a class. A student receives a numeric grade for a class on a scale of 0 to 100. This numeric grade corresponds to a letter grade of A, B, C, D, or E. However, the school does not use the traditional grading scale in which 90 to 100 corresponds to an A, 80-89 corresponds to a B, and so forth. Instead, the school uses the following grading scale:

Letter Grade

Numeric Grade

A

92 - 100

B

84 - 91

C

76 - 83

D

68 - 75

E

0 - 68

To accommodate the school's custom grading scale, their records database has the following table defined:

CREATE TABLE tb_GradeScale(
      LetterGrade       char(1)     NOT NULL,
MinNumeric        int         NOT NULL,
      MaxNumeric        int         NOT NULL,
      IsFailing         smallint    NOT NULL,
      CONSTRAINT PK_GradeScale PRIMARY KEY(LetterGrade),
      CONSTRAINT CK_MinMax CHECK(MinNumeric <= MaxNumeric)
)

The students' numeric scores are stored in the following table:

CREATE TABLE tb_StudentGrade(
      StudentID         int         NOT NULL,
      ClassID           varchar(5)  NOT NULL,
      NumericGrade      int         NOT NULL,
CONSTRAINT PK_StudentGrade PRIMARY KEY(StudentID, ClassID),
CONSTRAINT CK_StudentGrade_NumericGrade
      CHECK(NumericGrade BETWEEN 0 AND 100)
)

In this scenario, the tb_StudentGrade table is the “fact table” and the tb_GradeScale table is the “range lookup table”. The NumericGrade field serves as “fact data” while the MinNumeric and MaxNumeric fields serve as the “range minimum” and “range maximum”. Thus, following the fact-min-max pattern, we can construct the following join criteria:

    NumericGrade BETWEEN MinNumeric AND MaxNumeric

If we put these join criteria into the context of a query which generates a report containing all the students' letter grades for English 101, we end up with the following:

SELECT
      s.StudentID,
      g.LetterGrade
FROM
      tb_StudentGrade s
INNER JOIN
      tb_GradeScale g
      ON(
            s.NumericGrade BETWEEN g.MinNumeric AND g.MaxNumeric
      )
WHERE
      ClassID = 'EH101'

In this query, we join the student grade table with the grading scale table in order to translate a numeric grade to a letter grade. In order to accomplish this, we use the BETWEEN operator to specify the relationship between the two tables being joined.

Using BETWEEN With Temporal Data

Some of the trickiest queries to write are those that deal with temporal data like calendars, appointment times, and class schedules. For example, many businesses have a fiscal calendar that they use for accounting. Accounting periods may start on the 26th of the month and end on the 25th of the following month. The company may vary the starting and ending dates of each accounting period to even out the number of days in each accounting period. In order to generate reports by accounting period, you need to define a table that lays out the fiscal calendar being used. Such a table may look like this:

CREATE TABLE tb_FiscalCalendar(
      FiscalYear        int         NOT NULL,
      AcctPeriod        int         NOT NULL,
      StartDatetime     datetime    NOT NULL,
      EndDatetime       datetime    NOT NULL,
CONSTRAINT PK_FiscalCalendar PRIMARY KEY(FiscalYear, AcctPeriod), CONSTRAINT CK_FiscalCalendar_DateCheck
CHECK(StartDatetime < EndDatetime)
)

In this table, the FiscalYear column indicates the fiscal year to which the accounting period belongs. The AcctPeriod column identifies the accounting period within the fiscal year. The StartDatetime and EndDatetime columns specify the actual starting and ending date and time of the accounting period.

Suppose you are trying to write a report as part of a customer payment processing system. This report summarizes the total number and amount of payments by accounting period. The records of the customer payments are stored in the following table:

CREATE TABLE tb_Payment(
      PaymentID         int         NOT NULL    IDENTITY(1, 1),
      AccountID         int         NOT NULL,
      PostedDatetime    datetime    NOT NULL    DEFAULT(GETDATE()),
      PaymentAmt        money       NOT NULL,
CONSTRAINT PK_Payment PRIMARY KEY(PaymentID)
)

In order to construct the query needed for the report, you must first determine the fiscal year and accounting period in which each payment occurred. You must then group by the fiscal year and accounting period, summing the PaymentAmt field and counting the number of records in each group.

To determine each payment's accounting period, you can use a BETWEEN join to the tb_FiscalCalendar table:

FROM
      tb_Payment p
INNER JOIN
      tb_FiscalCalendar c
      ON(
            p.PostedDatetime BETWEEN c.StartDatetime AND c.EndDatetime
      )

As do many other joins using the BETWEEN operator, this join follows the fact-min-max pattern seen in the grading scale example. Each payment record (of which there are many) provides a “fact” stating that a certain payment occurred at a particular date and time. The fiscal calendar table acts more as a configuration table that specifies a range of datetime values and provides configuration data about this range.

To finish off the payment reporting query, we add the grouping, aggregate functions, and an ORDER BY clause to make the output more readable:

SELECT
      c.FiscalYear,
      c.AcctPeriod,
      COUNT(*) AS PaymentCount,
      SUM(PaymentAmt) AS TotalPaymentAmt
FROM
      tb_Payment p
INNER JOIN
      tb_FiscalCalendar c
      ON(
            p.PostedDatetime BETWEEN c.StartDatetime AND c.EndDatetime
      )
GROUP BY
      c.FiscalYear,
      c.AcctPeriod
ORDER BY
      c.FiscalYear,
      c.AcctPeriod

The output yields the needed report easily and efficiently. With proper indexing, this query should run quite well even against large sets of data.

Other Uses of BETWEEN Joins

The BETWEEN join can be put to use in a number of other scenarios. Coupling the BETWEEN operator with a self-join can be a useful technique for concurrency-checking queries such as validating calendars and appointment schedules. BETWEEN joins can be used to produce histograms by aggregating ranges of data. In a situation where you must join precise data to rounded data, the BETWEEN operator can be used to perform a “fuzzy” join. Once you put the BETWEEN join in your query toolbox, you'll find even more uses for it and wonder how you ever did without it.

About the Author

Chris Cubley is an MCSD with over four years of experience designing and implementing SQL Server-based solutions in the education, healthcare, and telecommunications industries. He can be reached at ccubley@queryplan.com.

Rate

5 (2)

You rated this post out of 5. Change rating

Share

Share

Rate

5 (2)

You rated this post out of 5. Change rating