December 1, 2009 at 1:57 pm
Hello,
I am trying to learn a bit more about sql and best practices and I would really look to hear your opinions on this.
Let's say I have a school database containing the following tables:
classes
students
subjects
Each class contains multiple students, each student has multiple subjects.
I want to retrieve all data for a certain class (including student/subjects).
Would it be considered better to query multiple time (which is easier to handle in my code by returning datasets per table).
Or should I use one big joined query on all 3 tables and make my code handle the returned data?
Hope that makes any sense?
Maybe I should clarify this a bit:
Up until now I would query the classes table and return the class record.
Then I would query the students table for all student records with that classID.
After that I would loop through the students and query the subjects table for each subject record.
Coding-wise this pretty easy, but this can result in a lot of queries (which I assume is bad practice).
Now I could also use one big joined query on all 3 tables, but this would result in a lot of redundant data returned.
Wouldn't this impact performance as well (especially with big tables)?
Also I would need more code to "handle" this data.
Any advice is appreciated.
December 1, 2009 at 2:08 pm
Ordinarily, you'd accomplish this with joins between the tables. If you want to provide table structure / sample data (see the first article in my signature for how) and the end result you were looking to get, we'd be happy to demonstrate how to accomplish it with joins.
December 1, 2009 at 2:28 pm
Hi Seth,
Thank you for quick response.
I don't have an actualy example as this is just "research" right now.
I do know how to create the join in order to get all data.
However this would return a dataset with a lot of redundant data (if this even IS redundant data), for instance the class data (ID, name, etc.), which is identical for all records.
I was just curious how/if this affects performance.
Thanks again.
December 1, 2009 at 2:31 pm
It only returns the redundant data if you select it. If you just use SELECT *, then yes, you'll get a ton of stuff you don't need, but doing something like:
SELECT ClassName, StudentName, SubjectName
FROM ...
You only get exactly what you need. Although using SELECT * does have an impact on performance, it is still going to be massively faster than the other method you described.
The most optimal way is to select just the columns you need with the tables joined together.
December 1, 2009 at 2:42 pm
Yes I understand that.
I guess I'm explaining it the wrong way.
If you would join the students table on the class table.
The result would be several records of students each with the classID, Classname, etc. in it.
However even though you need this data (once), the classID and classname is the same for every record.
So in this case would it be better to just query the class table once for the class related data.
Then using only the ID query the students table for all the student related data.
So you would get:
Class 1 Data - Student 1 Data
....................Student 1 Data
....................Student 1 Data
Instead of:
Class 1 Data - Student 1 Data
Class 1 Data - Student 2 Data
Class 1 Data - Student 3 Data
Only then if you would want to loop through all classes and get all students.
That would take a lot of extra queries.
Hope I'm making sense here...
December 1, 2009 at 2:56 pm
Not really. You need sample data. The relationships of these 3 tables is very unclear. Are there tables linking them together like ClassStudent and ClassSubject?
The simple answer is that in almost every case, joins are the more efficient way to go, by far. The more complicated answer is that if your database design is seriously hosed or there's something else here that I'm just not seeing, then the other way you're proposing could potentially be better... but I still highly doubt it.
December 1, 2009 at 3:00 pm
Ok fair enough, thanks a lot for your time.
As soon as I start my new project, which should be pretty soon, I will post some sample data here.
I understand that makes it a lot easier.
Thanks again for your patience though.
December 1, 2009 at 3:05 pm
My pleasure. I wish I could give you a better answer, but without knowing the structure of the data/tables, any answer is potentially wrong.
December 1, 2009 at 5:53 pm
mvertommen (12/1/2009)
Hi Seth,Thank you for quick response.
I don't have an actualy example as this is just "research" right now.
I do know how to create the join in order to get all data.
However this would return a dataset with a lot of redundant data (if this even IS redundant data), for instance the class data (ID, name, etc.), which is identical for all records.
I was just curious how/if this affects performance.
Thanks again.
"It Depends" on what you're actually trying to do especially if you're trying to do it from a GUI. Returning 3 result sets may or may not be appropriate. Hard to tell from your descriptions which are also probably pretty hard for you to describe at this point in your research.
--Jeff Moden
Change is inevitable... Change for the better is not.
December 1, 2009 at 6:29 pm
Getting one set of results does involve repeating some field values, but the overhead of this is almost always going to be much less than the overhead of multiple roundtrips to the database to get the data in a lot of little pieces. The only counter example I can think of where it would be worthwhile to avoid duplicate data is if the class record contained huge amounts of data, such as a complete copy of the textbook. This could be addressed by returning heirarchical results in XML, although now you have the overhead of the XML conversion.
Viewing 10 posts - 1 through 10 (of 10 total)
You must be logged in to reply to this topic. Login to reply