SQLServerCentral Article

Using Bits to Store Data

,

I was recently asked to look at a project to record and profile answers from a multiple-choice questionnaire. The idea was that the questionnaire creator would be able to specify their ideal respondent profile and that the system would be able to provide a list of best match respondents. There could potentially be many thousands of respondents. A cursory glance suggested that the project was simple but as the details of the requirements were revealed it was quickly realised that the project was going to be more involved than first thought.

Questionnaire Rules

Although the questions were multiple choice a typical question would expect multiple answers.For example, the question could be "Which of the following languages do you speak (tick all that apply)"?

The business rules were that the questionnaire creator must be able to specify which answers the respondent

  • Must select.
  • Must not select.
  • Should select.

In addition the questionnaire creator must be able to attach scores to the "Could possibly select" answers to tune the responses. The analyser of the questionnaire also wanted to be able to downgrade respondents that had provided too many answers, i.e. were indicating that they were over-qualified. No question would have over 16 answers although it was unusual for a question to have over10 possible answers.

First design thoughts

The initial thought was to create a completely normalised structure with a record per answer in the database.

Choosing the "Must Selects" was easy because it was a straight INNER JOIN.

Choosing the "Must Not Selects" was easy because it was effectively a LEFT JOIN.

The problem came with the "Should Selects" because of the need to identify which ones had been answered where it was desirable and which ones had been answered where they weren’t.

Obviously these also required outer join queries.

Add into the mix the fact that each questionnaire (of which there would be many) could have potentially 150 questions, probably 1,000 answers and potentially 60,000 respondents and clearly the volumes of data were going to pose a challenge.

Then add the requirement to send the data to remote locations via internet connections.

Bitwise Operators

At one time programmers were expected to be fully conversant with bitwise operators. In this day and age it isn’t nearly as important as it was, but it is still a useful skill to have.

Operator SQL Operator Description
OR | Where either or both bits equals 1 then answer is 1.
NOT ~ The inverse of the bit i.e. 1 becomes 0 and 0 become 1.
AND & Where both bits are 1 the answer will be 1.
XOR ^ When either but not both arguments are 1 the answer will be 1, otherwise zero.

One old trick with bitwise operators was to use the XOR statement to swap two integers without using a 3rd memory location.

An example of this is shown below

DECLARE @Arg1 Int, @Arg2 Int
SET @Arg1 = 5
SET @Arg2 = 7
SET @Arg1 = @Arg1 ^ @Arg2
SET @Arg2 = @Arg2 ^ @Arg1
SET @Arg1 = @Arg1 ^ @Arg2
SELECT @Arg1,@Arg2

Applying bits to the problem

Suppose that we stored a response to a question as a single number and used the bits to indicate a yes/no answer for the possible answers. Let us go back to our "What languages do you speak?" question. By using bits we could score them as follows

Language Bit Value
English 1 1
French 2 2
German 3 4
Italian 4 8
Bengali 5 16
Farsi 6 32
Chinese 7 64
Welsh 8 128

If someone responded saying that they spoke English and Welsh then their answer would be recorded as 1+ 128 = 129.

So we have a single integer for our answer and 3 integers to store our ideal profiles. "Must have", "Must not have" and "Should have".

Against the response an additional integer could hold the eventual score for the response.

This reduces the amount of data that is stored dramatically thereby making data transmission much more efficient.

Downside

OK, we’ve cut out about 80% of our storage requirements; the problem is that the normalised version allowed a more or less complete SQL solution.

Whether or not this IS a problem is open to interpretation. Just because you can do the entire thing in SQL doesn’t mean that you should do the entire thing in SQL.

However, storing our responses in bits means that now need SQL and a front end application to score the results.

Bitwise logic

Selecting the initial records

If someone must have responded in a certain way or must not have answered in a certain way then this is a black and white response. If either condition provides an unsatisfactory answer then the whole respondent can be rejected.

The comparison we would use would be

If ("Profile" AND "Must Have") ="Profile" Then accept it.

If ("Profile" AND "Must Not Have") = 0 Then accept it.

For accepted records we also need to remove the "must have" score.

Or in SQL

SELECT B.*,Score = B.Answer – A.MustHave
FROM dbo.Tbl_Profile AS A
INNER JOIN dbo.Response AS B
ON A.QuestionnaireId = B.QuestionnaireId
AND A.QuestionID= B.QuestionId
WHERE (A.MustNotHave & B.Answer) = 0
AND (A.MustHave & B.Answer) = A.MustHave

Extracting the "Should Select" options

Having brought back the records that satisfy our compulsory requirements we now have to separate out the answers that match our "Should have" profile, and those answers that are in excess of our "Should have" profile.

Step 1 is to "OR" our results together as in the table below.

Welsh Chinese Farsi Bengali Italian German French English Value
Profile 1 1 0 0 1 1 0 1 205
Answer 0 1 0 1 0 1 0 1 85
OR 1 1 0 1 1 1 0 1 221

We wanted someone who could speak Welsh, Chinese, Italian, German and English

We are evaluating someone who speaks Chinese, Bengali, German and English.

To find out which elements of the profile were satisfied we take our profile AND the Answer as shown below.

Welsh Chinese Farsi Bengali Italian German French English Value
Profile 1 1 0 0 1 1 0 1 205
Answer 0 1 0 1 0 1 0 1 85
AND 0 1 0 0 0 1 0 1 69

To find out which elements of the profile were not satisfied we must take our OR result and XOR it with the actual answer as shown below.

Welsh Chinese Farsi Bengali Italian German French English Value
OR 1 1 0 1 1 1 0 1 221
Answer 0 1 0 1 0 1 0 1 85
XOR 1 0 0 0 1 0 0 0 136

Et voila we can see that our candidate doesn’t speak Welsh or Italian.

Conversely, to find out which answers our candidate has in excess of requirements we would take our OR result and XOR it with our profile as shown below

Welsh Chinese Farsi Bengali Italian German French English Value
OR 1 1 0 1 1 1 0 1 221
Profile 1 1 0 0 1 1 0 1 205
XOR 0 0 0 1 0 0 0 0 16

And here we can see that our candidate speaks Bengali, which we did not require.

Translating this into SQL

In the real application I did the bitwise operations in the application rather than within SQL mainly because there were other bitwise operations that I needed to perform that have no direct SQL equivalents. I also wanted to keep all major operations compartmentalised.

It should be noted that bitwise operations are usually VERY fast. As far as I am aware all CPU's have these functions in their ALU (Arithmentic and Logic Unit) therefore any compiled language should implement bitwise operations efficiently.

I could have requested my two values in the SELECT statement of my original query as follows

SELECT A.*,
Score = B.Answer – A.MustHave
ExtraProfile= (B.Answer |A.MustHave ) ^ B.Answer ,
ExtraAnswer= (B.Answer | A.MustHave) ^ A.Answer
FROM dbo.Tbl_Profile AS A
INNER JOIN dbo.Response AS B
ON A.QuestionnaireId = B.QuestionnaireId
AND A.QuestionID= B.QuestionId
WHERE (A.MustNotHave & B.Answer) = 0
AND (A.MustHave & B.Answer) = A.MustHave

The final bitwise piece

The final piece of the puzzle was to translate the two scores back into individual bit positions corresponding to our answers.

This was done in the front end application because the action to do this involves a bitwise shift. These numbers were then posted into a stored procedure as parameters to calculate and store the weighted score for the particular answer.

The weighting was done in two stages.

Firstly, when the questionnaire creator specified the questions and answers the answers table they filled in a weighting column to allow specific answers to be up-weighted or downgraded. As shown in the table below

Language Weight
English 2
French 0
German 1
Italian 1
Bengali -1
Farsi -1
Chinese 3
Welsh 1

Here we can see that English and Chinese were the most highly prized skills, where as the ability to speak the indic languages actually detracted from the respondents score.

Secondly the questionnaire creator could set a threshold for the number of answers allowed for a given question. This means that although there are 5 languages we prize, if someone speaks all 5 this may detract from their score because they could be considered over qualified.

On the whole the respondents score would be calculated by: -

  • Adding the score produced by their full response.
  • Subtracting the score for the profile elements that do not appear in the response.
  • Subtracting a point for every answer over a threshold amount.

End game

Once the responses were stored it became a simple matter to list the respondent details highest score first.

A refinement that was made at a later stage was the ability to score a respondents answers against a new questionnaire to use as a predictive tool.

For example, the "what languages do you speak?" question could be incorporated into another questionnaire, and the original response used to predict respondents behaviour in hypothetical scenarios.

Overall the project provided an interesting challenge with an opportunity to resurrect an old skill.

Rate

5 (1)

You rated this post out of 5. Change rating

Share

Share

Rate

5 (1)

You rated this post out of 5. Change rating