Click here to monitor SSC
SQLServerCentral is supported by Red Gate Software Ltd.
 
Log in  ::  Register  ::  Not logged in
 
 
 
        
Home       Members    Calendar    Who's On


Add to briefcase «««12345»»»

Why is data integrity important? Expand / Collapse
Author
Message
Posted Monday, August 30, 2010 12:56 PM


Valued Member

Valued MemberValued MemberValued MemberValued MemberValued MemberValued MemberValued MemberValued Member

Group: General Forum Members
Last Login: Friday, November 11, 2011 1:21 PM
Points: 50, Visits: 162
Now that we appear to be including DW in the mix, I have a question for y'all.

Suppose I have a database in which I am storing the answers that are given during the various exams that a students might take for a course. The design of the exam is such that it is anticipated that no student will complete the exam (thus, there are NULLs expected for some number of answers to questions for each student for each exam). The scoring on the exam is such that 1 point is added for correct answers, 3 points are subtracted for wrong answers and any _un_answered_ (ie. skipped) questions result in a 1 point subtraction. (Thus, the test is designed to try to suppress pure guesses but to also encourage students to answer all questions before skipping them.)

How do you differentuiate between the NULL of a skipped question (i.e. "No Answer Given") vs the NULL of the question that has not yet been reached (i.e. the "Not Applicable")?

This has occassionally come up in my experience and I have developed sevral diffent qork-arounds. I am just seeking advise here.


Ralph D. Wilson II
Development DBA

"Give me 6 hours to chop down a tree and I will spend the first 4 sharpening the ax."
A. Lincoln
Post #977529
Posted Monday, August 30, 2010 12:58 PM


SSCommitted

SSCommittedSSCommittedSSCommittedSSCommittedSSCommittedSSCommittedSSCommittedSSCommitted

Group: General Forum Members
Last Login: Yesterday @ 2:48 PM
Points: 1,754, Visits: 4,966
alen teplitsky (8/30/2010)
i've seen this no NULL's allowed OCD thing being taken to extremes

say you sign up a customer today and promise some service that will be done in the near future but you're not sure of the date. since the column is not null you have to put in a crazy date like 1/1/2049. or if something was done in the past then 1/1/1901

so if a customer calls in a few days the CSR will happily tell them they can expect to see someone come to their site in 2049

In a scenario like this, when there can be multiple reasons for a date to be NULL, I typically allow the column to be NULL, but I also have a column indicating the status of the record. For example, a sales order shipment can have a status of 'Pending' or 'Cancelled', which would explain the non-existing schedule date. I will also have a check constraint, enforcing not-NULL conditionally for specific status values. That way the application or report builders don't have to make assumptions based on the absence of information.
Post #977530
Posted Monday, August 30, 2010 12:58 PM


SSC-Dedicated

SSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-Dedicated

Group: Administrators
Last Login: Today @ 6:07 PM
Points: 31,284, Visits: 15,744
Not sure I understand? Wouldn't a skipped question have a -1?

For the answer, is a blank because it was skipped different than a blank because they didn't know the answer?







Follow me on Twitter: @way0utwest

Forum Etiquette: How to post data/code on a forum to get the best help
Post #977531
Posted Monday, August 30, 2010 1:00 PM
SSC Eights!

SSC Eights!SSC Eights!SSC Eights!SSC Eights!SSC Eights!SSC Eights!SSC Eights!SSC Eights!

Group: General Forum Members
Last Login: Thursday, November 20, 2014 11:54 AM
Points: 819, Visits: 2,116
I don't understand either. How do you differentiate between a skipped an unanswered? Is it all questions above the last answered?


Post #977534
Posted Monday, August 30, 2010 1:09 PM


Valued Member

Valued MemberValued MemberValued MemberValued MemberValued MemberValued MemberValued MemberValued Member

Group: General Forum Members
Last Login: Friday, November 11, 2011 1:21 PM
Points: 50, Visits: 162
A "skipped" question is one that has not been answered but has a Question Number that is lower than the highest Question number that has been answered.

An "Unanswered Question" is a question with a higher Question Number than the highest Question Number that has an answwer.

Of course, you may have already thought of the question that I came up with when presented with this problem, "How do I know whether Question N+1, where N is the highest Question Number with an answer, is an unanswered vs a skipped question?"

The Skipped Questions will have a NULL because there is no answer marked (and, therefore, no value to record). They will not have a value of -1 because a) they have no answer and b) there is nothing to preclude their being True/False, Multiple Choice (with alpha choices) or even essay questions.


Ralph D. Wilson II
Development DBA

"Give me 6 hours to chop down a tree and I will spend the first 4 sharpening the ax."
A. Lincoln
Post #977540
Posted Monday, August 30, 2010 1:14 PM


UDP Broadcaster

UDP BroadcasterUDP BroadcasterUDP BroadcasterUDP BroadcasterUDP BroadcasterUDP BroadcasterUDP BroadcasterUDP Broadcaster

Group: General Forum Members
Last Login: Friday, November 21, 2014 3:49 PM
Points: 1,469, Visits: 8,351
Steve Jones - Editor (8/30/2010)
Not sure I understand? Wouldn't a skipped question have a -1?

For the answer, is a blank because it was skipped different than a blank because they didn't know the answer?


What if it was skipped because they did not know the answer?

As Shakespeare once said: "That which we call a rose by any other name would smell as sweet."




Alvin Ramard
Memphis PASS Chapter

All my SSC forum answers come with a money back guarantee. If you didn't like the answer then I'll gladly refund what you paid for it.
Post #977542
Posted Monday, August 30, 2010 2:27 PM
SSC-Enthusiastic

SSC-EnthusiasticSSC-EnthusiasticSSC-EnthusiasticSSC-EnthusiasticSSC-EnthusiasticSSC-EnthusiasticSSC-EnthusiasticSSC-Enthusiastic

Group: General Forum Members
Last Login: Thursday, September 19, 2013 2:00 PM
Points: 183, Visits: 479
I don't remember who wrote this comment, but it's the most profound one I've ever come across regarding data quality. I'll paraphrase because I don't remember the exact quote:

"The best quality information in a company is who owes them money. Everything else goes downhill very rapidly."

Post #977578
Posted Monday, August 30, 2010 2:36 PM


Valued Member

Valued MemberValued MemberValued MemberValued MemberValued MemberValued MemberValued MemberValued Member

Group: General Forum Members
Last Login: Friday, November 11, 2011 1:21 PM
Points: 50, Visits: 162
One argument that can hardly be touched : no matter how you deal with it in the db, you'll have to do work in the application to handle cases... or waste time training people so that 2049 is not the date we are planning to visit to do the work.


And, if you choose a date in the past, you have to preclude it coming up as a "Late Service Call while, if you choose one in the future you risk it becoming a "reasonable" date (e.g. if you choose '2049-01-01' and it's some time in December of 2048, that date might look reasonable).

It just seems like nothing is as easy as it should be, right? After all, in about 1976, some dang fool professor was chastising me and about 243 other students in an SMU Programming Languages course because we were wasting our time because "In another 10 or 15 years, there will be no need for programmers."

In fact, as I remember it, he claimed that computers would be able to interpet verbalized needs in order to create their own tables, indexes, etc.


Ralph D. Wilson II
Development DBA

"Give me 6 hours to chop down a tree and I will spend the first 4 sharpening the ax."
A. Lincoln
Post #977583
Posted Tuesday, August 31, 2010 7:33 AM


Hall of Fame

Hall of FameHall of FameHall of FameHall of FameHall of FameHall of FameHall of FameHall of FameHall of Fame

Group: General Forum Members
Last Login: Tuesday, September 30, 2014 7:42 AM
Points: 3,688, Visits: 72,435
Eric Russell 13013 (8/30/2010)
Data integrity and constraint begins with choosing the appropriate data type. Obviously (to most of us) integers should be contained in an Integer data type and monetary values in a decimal with 2 decimal places. However, what I see happen all the time, even in major ISV applications, are date/time values contained in a varchar instead of a proper date data type. Not only does the reporting process take a performance hit with the data conversion, but inevitably there will be the occasional Feb 30. Even worse is when different users or applications use a different coding scheme and you have to sort out (in a where clause at runtime) the context of '3/11/09 6:22'.


Amen,

I've seen so many of these in older databases where I am... we work to squish them... but in many cases the business won't prioritize developer time to do so.

Even better is when you find tables that violate 1NF. And have something like a column called groups with values like 'ADMIN;USER;MANAGER' in it.

Come on... :'(




--Mark Tassin
MCITP - SQL Server DBA
Proud member of the Anti-RBAR alliance.
For help with Performance click this link
For tips on how to post your problems
Post #977970
Posted Tuesday, August 31, 2010 7:42 AM


Hall of Fame

Hall of FameHall of FameHall of FameHall of FameHall of FameHall of FameHall of FameHall of FameHall of Fame

Group: General Forum Members
Last Login: Tuesday, September 30, 2014 7:42 AM
Points: 3,688, Visits: 72,435
RalphWilson (8/30/2010)
A "skipped" question is one that has not been answered but has a Question Number that is lower than the highest Question number that has been answered.



Why not just store the results as the questions are asked?

Table_Answers(
Answerid int identity(1,1) PRIMARY KEY,
Testid int NOT NULL,
questionid int NOT NULL,
TestAnswer int,
TestScore int NOT NULL)

As the questions are asked and answered/skipped just store either
1 in testscore if it's correct
-3 in testscore if it's wrong
-1 in testscore if it's skipped

Then left join Table_Answers to Table_questions if you care about the unanswered questions.

If Testanswer can be more than multiple choices throw in a varchar(max) or nvarchar(max) column for the free form answers... but those will be harder to grade via SQL anyway.




--Mark Tassin
MCITP - SQL Server DBA
Proud member of the Anti-RBAR alliance.
For help with Performance click this link
For tips on how to post your problems
Post #977986
« Prev Topic | Next Topic »

Add to briefcase «««12345»»»

Permissions Expand / Collapse