Bitwise Project
I was recently asked to look at a project to record and
profile answers from a multiple-choice questionnaire. The idea was that the questionnaire creator would be able to
specify their ideal respondent profile and that the system would be able to
provide a list of best match respondents. There could potentially be many thousands of respondents. A cursory glance suggested that the project was simple but as
the details of the requirements were revealed it was quickly realised that the
project was going to be more involved than first thought.
Questionnaire Rules
Although the questions were multiple choice a typical question
would expect multiple answers.For
example, the question could be "Which of the following languages do you speak
(tick all that apply)"?
The business rules were that the questionnaire creator must be
able to specify which answers the respondent
- Must select.
- Must not select.
- Should select.
In addition the questionnaire creator must be able to attach
scores to the "Could possibly select" answers to tune the responses. The analyser of the questionnaire also wanted to be
able to downgrade respondents that had provided too many answers, i.e. were
indicating that they were over-qualified. No question would have over 16 answers although it was unusual
for a question to have over10 possible answers.
First design thoughts
The initial thought was to create a completely normalised
structure with a record per answer in the database.
Choosing the "Must Selects" was easy because it was a straight
INNER JOIN.
Choosing the "Must Not Selects" was easy because it was
effectively a LEFT JOIN.
The problem came with the "Should Selects" because of the need
to identify which ones had been answered where it was desirable and which ones
had been answered where they weren’t.
Obviously these also required outer join queries.
Add into the mix the fact that each questionnaire (of which there would be many) could have
potentially 150 questions, probably 1,000 answers and potentially 60,000
respondents and clearly the volumes of data were going to pose a challenge.
Then add the requirement to send the data to remote locations
via internet connections.
Bitwise Operators
At one time programmers were expected to be fully conversant
with bitwise operators. In this day and age it isn’t nearly as important as it was, but it
is still a useful skill to have.
| Operator |
SQL Operator |
Description |
| OR |
| |
Where either or both bits equals 1 then answer is 1. |
| NOT |
~ |
The inverse of the bit i.e. 1 becomes 0 and 0 become 1. |
| AND |
& |
Where both bits are 1 the answer will be 1. |
| XOR |
^ |
When either but not both arguments are 1 the answer will be 1, otherwise zero. |
One old trick with bitwise operators was to use the XOR
statement to swap two integers without using a 3rd memory location.
An example of this is shown below
DECLARE @Arg1 Int, @Arg2 Int
SET @Arg1 = 5
SET @Arg2 = 7
SET @Arg1 = @Arg1 ^ @Arg2
SET @Arg2 = @Arg2 ^ @Arg1
SET @Arg1 = @Arg1 ^ @Arg2
SELECT @Arg1,@Arg2 |
Applying bits to the problem
Suppose that we stored a response to a question as a single
number and used the bits to indicate a yes/no answer for the possible answers. Let us go back to our "What languages do you speak?" question. By using bits we could score them as follows
| Language |
Bit |
Value |
| English |
1 |
1 |
| French |
2 |
2 |
| German |
3 |
4 |
| Italian |
4 |
8 |
| Bengali |
5 |
16 |
| Farsi |
6 |
32 |
| Chinese |
7 |
64 |
| Welsh |
8 |
128 |
If someone responded saying that they spoke English and Welsh
then their answer would be recorded as 1+ 128 = 129.
So we have a single integer for our answer and 3 integers to
store our ideal profiles. "Must have",
"Must not have" and "Should have".
Against the response an additional integer could hold the
eventual score for the response.
This reduces the amount of data that is stored dramatically
thereby making data transmission much more efficient.
Downside
OK, we’ve cut out about 80% of our storage requirements; the
problem is that the normalised version allowed a more or less complete SQL solution.
Whether or not this IS a problem is open to
interpretation. Just because you can do
the entire thing in SQL doesn’t mean that you should do the entire thing in SQL.
However, storing our responses in bits means that now need SQL
and a front end application to score the results.
Bitwise logic
Selecting the initial records
If someone must have responded in a certain way or must not
have answered in a certain way then this is a black and white response. If
either condition provides an unsatisfactory answer then the whole respondent
can be rejected.
The comparison we would use would be
If ("Profile" AND "Must Have") ="Profile" Then accept it.
If ("Profile" AND "Must Not Have") = 0 Then accept it.
For accepted records we also need to remove the "must have" score.
Or in SQL
SELECT B.*,Score = B.Answer – A.MustHave
FROM dbo.Tbl_Profile AS A
INNER JOIN dbo.Response AS B
ON A.QuestionnaireId = B.QuestionnaireId
AND A.QuestionID= B.QuestionId
WHERE (A.MustNotHave & B.Answer) = 0
AND (A.MustHave & B.Answer) = A.MustHave |
Extracting the "Should Select" options
Having brought back the records that satisfy our compulsory
requirements we now have to separate out the answers that match our "Should
have" profile, and those answers that are in excess of our "Should have"
profile.
Step 1 is to "OR" our results together as in the table below.
| |
Welsh |
Chinese |
Farsi |
Bengali |
Italian |
German |
French |
English |
Value |
| Profile |
1 |
1 |
0 |
0 |
1 |
1 |
0 |
1 |
205 |
| Answer |
0 |
1 |
0 |
1 |
0 |
1 |
0 |
1 |
85 |
| OR |
1 |
1 |
0 |
1 |
1 |
1 |
0 |
1 |
221 |
We wanted someone who could speak Welsh, Chinese, Italian,
German and English
We are evaluating someone who speaks Chinese, Bengali, German
and English.
To find out which elements of the profile were satisfied we take our profile AND the Answer as shown below.
| |
Welsh |
Chinese |
Farsi |
Bengali |
Italian |
German |
French |
English |
Value |
| Profile |
1 |
1 |
0 |
0 |
1 |
1 |
0 |
1 |
205 |
| Answer |
0 |
1 |
0 |
1 |
0 |
1 |
0 |
1 |
85 |
| AND |
0 |
1 |
0 |
0 |
0 |
1 |
0 |
1 |
69 |
To find out which elements of the profile were not satisfied
we must take our OR result and XOR it with the actual answer as shown below.
| |
Welsh |
Chinese |
Farsi |
Bengali |
Italian |
German |
French |
English |
Value |
| OR |
1 |
1 |
0 |
1 |
1 |
1 |
0 |
1 |
221 |
| Answer |
0 |
1 |
0 |
1 |
0 |
1 |
0 |
1 |
85 |
| XOR |
1 |
0 |
0 |
0 |
1 |
0 |
0 |
0 |
136 |
Et voila we can see that our candidate doesn’t speak Welsh or
Italian.
Conversely, to find out which answers our candidate has in
excess of requirements we would take our OR result and XOR it with our profile
as shown below
| |
Welsh |
Chinese |
Farsi |
Bengali |
Italian |
German |
French |
English |
Value |
| OR |
1 |
1 |
0 |
1 |
1 |
1 |
0 |
1 |
221 |
| Profile |
1 |
1 |
0 |
0 |
1 |
1 |
0 |
1 |
205 |
| XOR |
0 |
0 |
0 |
1 |
0 |
0 |
0 |
0 |
16 |
And here we can see that our candidate speaks Bengali, which
we did not require.
Translating this into SQL
In the real application I did the bitwise operations in the
application rather than within SQL mainly because there were other bitwise
operations that I needed to perform that have no direct SQL
equivalents. I also wanted to keep all major operations compartmentalised.
It should be noted that bitwise operations are usually VERY fast.
As far as I am aware all CPU's have these functions in their ALU (Arithmentic and Logic Unit) therefore
any compiled language should implement bitwise operations efficiently.
I could have requested my two values in the SELECT statement
of my original query as follows
SELECT A.*,
Score = B.Answer – A.MustHave
ExtraProfile= (B.Answer |A.MustHave ) ^ B.Answer ,
ExtraAnswer= (B.Answer | A.MustHave) ^ A.Answer
FROM dbo.Tbl_Profile AS A
INNER JOIN dbo.Response AS B
ON A.QuestionnaireId = B.QuestionnaireId
AND A.QuestionID= B.QuestionId
WHERE (A.MustNotHave & B.Answer) = 0
AND (A.MustHave & B.Answer) = A.MustHave |
The final bitwise piece
The final piece of the puzzle was to translate the two scores
back into individual bit positions corresponding to our answers.
This was done in the front end application because the action
to do this involves a bitwise shift. These numbers were then posted into a stored procedure as parameters to
calculate and store the weighted score for the particular answer.
The weighting was done in two stages.
Firstly, when the questionnaire creator specified the questions and answers the answers table
they filled in a weighting column to allow specific answers to be up-weighted or downgraded. As shown in the table below
| Language |
Weight |
| English |
2 |
| French |
0 |
| German |
1 |
| Italian |
1 |
| Bengali |
-1 |
| Farsi |
-1 |
| Chinese |
3 |
| Welsh |
1 |
Here we can see that English and Chinese were the most highly prized skills,
where as the ability to speak the indic languages actually detracted from the respondents score.
Secondly the questionnaire creator could set a threshold for the number of answers allowed for a given question.
This means that although there are 5 languages we prize, if someone speaks all 5 this may detract from their
score because they could be considered over qualified.
On the whole the respondents score would be calculated by: -
- Adding the score produced by their full response.
- Subtracting the score for the profile elements that do not appear in the response.
- Subtracting a point for every answer over a threshold amount.
End game
Once the responses were stored it became a simple matter to
list the respondent details highest score first.
A refinement that was made at a later stage was the ability to
score a respondents answers against a new questionnaire to use as a predictive
tool.
For example, the "what languages do you speak?" question could
be incorporated into another questionnaire, and the original response used to predict respondents
behaviour in hypothetical scenarios.
Overall the project provided an interesting challenge with an opportunity to
resurrect an old skill.