What can we use in SQL instead of E. F. Codd’s T theta operators for best-fit? Joe Celko returns with another puzzle that isn’t new, in fact it already features “Swedish”, “Croatian” and “Colombian” solutions in chapter 17 of Joe’s ‘SQL for Smarties’ book. These were all written before CTEs or the new WINDOW functions. Is there now a better solution? Was there one even then? We leave it to the readers to provide the answer!

In the book “The Relational Model, Version Two” (1990, ISBN13: 978-0201141924), Dr. E. F. Codd introduced a set of new theta operators, called T-operators, which were based on the idea of a best-fit or approximate equality. The name never caught on and nobody invented special syntax for them.

One problem is that they were defined by a procedural algorithm and not a set-oriented definition. It is easier to understand with an example modified from Dr. Codd.

The problem is to assign the classes to the available classrooms. We want (class_size < room_size) to be true after the assignments are made. This will allow us a few empty seats in each room for late students. We can do this in one of two ways. The first way is to sort the tables in ascending order by classroom size and the number of students in a class. We start with the following simplified tables:

CREATE TABLE Rooms

(room_nbr CHAR(2) NOT NULL PRIMARY KEY,

room_size INTEGER NOT NULL

CHECK (room_size > 0));

CREATE TABLE Classes

(class_nbr CHAR(2) NOT NULL PRIMARY KEY,

class_size INTEGER NOT NULL

CHECK (class_size > 0));

These tables have the following rows in them:

INSERT INTO Classes (class_nbr, class_size)

VALUES (‘c1’, 80),

(‘c2’, 70),

(‘c3’, 65),

(‘c4’, 55),

(‘c5’, 50),

(‘c6’, 40);

INSERT INTO Rooms (room_nbr, room_size)

VALUES (‘r1’, 70),

(‘r2’, 40),

(‘r3’, 50),

(‘r4’, 85),

(‘r5’, 30),

(‘r6’, 65),

(‘r7’, 55);

The goal of the T-JOIN problem is to assign a class which is smaller than the classroom given it (class_size < room_size). Dr. Codd gives two approaches to the problem.

### 1) Ascending Order Algorithm

Sort both tables in ascending order. Reading from the top of the Rooms table, match each class with the first room that will fit.

Classes Rooms

class_nbr class_size room_nbr room_size

==================== ===================

‘c6’ 40 ‘r5’ 30

‘c5’ 50 ‘r2’ 40

‘c4’ 55 ‘r3’ 50

‘c3’ 65 ‘r7’ 55

‘c2’ 70 ‘r6’ 65

‘c1’ 80 ‘r1’ 70

This gives us …

Results

class_nbr class_size room_nbr room_size

=======================================

‘c2’ 70 ‘r4’ 85

‘c3’ 65 ‘r1’ 70

‘c4’ 55 ‘r6’ 65

‘c5’ 50 ‘r7’ 55

‘c6’ 40 ‘r3’ 50

### 2) Descending Order Algorithm

Sort both tables in descending order. Reading from the top of the Classes table, match each class with the first room that will fit.

Classes Rooms

class_nbr class_size room_nbr room_size

=======================================

‘c1’ 80 ‘r4’ 85

‘c2’ 70 ‘r1’ 70

‘c3’ 65 ‘r6’ 65

‘c4’ 55 ‘r7’ 55

‘c5’ 50 ‘r3’ 50

‘c6’ 40 ‘r2’ 40

and …

Results

class_nbr class_size room_nbr room_size

=======================================

‘c1’ 80 ‘r4’ 85

‘c3’ 65 ‘r1’ 70

‘c4’ 55 ‘r6’ 65

‘c5’ 50 ‘r7’ 55

‘c6’ 40 ‘r3’ 50

Notice that the answers are different! Dr. Codd has never given a definition in relational algebra of the T-Join, so I proposed that we need one. Informally, for each class, we want the smallest room that will hold it, while maintaining

the T-JOIN condition. Or for each room, we want the largest class that will almost fill it, while maintaining the T-JOIN condition. These can be two different things, because these are not “best fit” solutions “first fit” approach.

Other theta conditions can be used in place of the “less than” shown here. If “less than or equal” is used, all the classes are assigned to a room in this case, but not in all cases. Nor is the solution always unique. Here is a (class_size <= room_size) answer for this data. Room ‘r5’ is too small to match any class.

Results

class_nbr class_size room_nbr room_size

========================================

‘c1’ 80 ‘r4’ 85

‘c2’ 70 ‘r1’ 70

‘c3’ 65 ‘r6’ 65

‘c4’ 55 ‘r7’ 55

‘c5’ 50 ‘r3’ 50

‘c6’ 40 ‘r2’ 40

In the third edition of SQL FOR SMARTIES, I gave “Swedish”, “Croatian” and “Colombian” solutions, named for the countries of the submitters. They were all written without CTEs or the new WINDOW functions (that is the technical name for the OVER() clause on aggregate functions). Obviously, if two rooms or two classes are the same size, they can be swapped with each other. Let’s make life easier by adding a UNIQUE constraint to **room_size** and **class_size**.

The problem is write an SQL statement that gives a T-Join.

**WARNING: **some of you already know this, but finding an optimal solution is an NP-complete problem. The bigger the data set gets, the effort it takes to solve the problem increase beyond any reasonable limit. You can Google for “Knapsack Problem” or “Bin Packing Problem” if you want the details.

The best answer to each stumper will be given a prize of a $100 Amazon voucher. The stumper will be run simultaneously on SQL Server Central and Simple-Talk. To see all the comments so far, you will need to visit both sites.

We will take entries for a week after the first Monday of publication, posted as comments to the articles.

Two weeks after the challenge is sent out, the judge’s decision and comments will be sent out in the newsletter, and published on the site. Joe Celko and Phil Factor will judge the answers to this puzzle. Your answer should :Your answer should :

- Solve the problem — Duh!
- Avoid proprietary features in SQL Server that will not port or be good across all releases, present and future.
- Use Standard features in SQL Server that will be good across all releases, present and future. Extra points for porting code.
- Be clever but not obscure.
- Explain how you got your answer.