March 7, 2008 at 2:19 am
Hi,
Firstly thanks for reading the post and any responses you may take the time to write.
I am not a developer or a student looking for help with assigments etc, I am just doing my own project on a DB and want some advice on what tools to use and what I need to learn to get to the end point (no doubt with a lot of pain inbetween), so I apologise in advance for my ignorance. I ahve spent a long time calidating data to put into a db and it is fairly obvious Access wont cut it so I need to use something bigger and better, hence looking at SQL, DB2 or Postgresql.
I am aware that anything I want to achieve is possible however there are so many options it feels like it would take a lifetime to examine all the possibilities and the best solutions. So hence this request to point me in the right direction and cut out a lot of needless research and confusion.
The DB by most peoples standards on here will be small beer it will have 20 - 30 tables the biggest having something in the region of 30 columns a lot of the tables will only have 10 or so columns. The size is only about 2Gb and is likely to grow by 500 or so records a day. Given the nature of the beast I am not restricted to the normal commercial rules of normalization etc and I am looking to make the data as visible as possible so I am looking to add results into the rows (or find a tool that will do it for me), having read a little I know this will have some shaking their heads.
My issues are how I get the results and view the results and with what tools do I use. Do I need to learn a programming language such as C#, or use the BI component, SQl, Transact SQL or a combination.
The results I want to see will appear very simple to achieve and no doubt for the majority but for someone just starting its a little daunting.
Most results will be taking a present value from a previous value grouped by a name and show the result eg.
Smith 1/1/2008 25
Smith 7/1/2008 20
Smith 15/1/2008 35
to show
Smith 1/1/2008 25 null
Smith 7/1/2008 20 5
Smith 15/1/2008 35 -15
I would also like to see the previous number on the following row i.e to see 1/1/2008 25 on the 2nd row of Smith 7/1/2008
There are quite a few results I would want to update in addition I would like to profile each name number of instances by various all values relationships etc, Hence why I thought results in Db would help. Is this something BI would achieve automatically for me with a bit of work?
I have kept it simple although my apologies for being long winded, there are obviously a lot more calculations involved but most run along the above lines, date differences etc. I appreciate I need to learn, but what I need to learn is the question.
Any help and advice is greatly appreciated.
Sam
March 7, 2008 at 3:08 am
Hi,
I've written an article 'Linking to the Previous Row' which illustrates exactly this sort of thing using CTEs (Common Table Expressions).
Unfortunately it's not published yet. (It'll be published on Thurs Mar 13 2008 on this site.)
If it's urgent, I can send you a preview copy. If so can you send me a private message.
Regards,
David McKinney.
March 7, 2008 at 8:35 am
David McKinney (3/7/2008)
Hi,I've written an article 'Linking to the Previous Row' which illustrates exactly this sort of thing using CTEs (Common Table Expressions).
Unfortunately it's not published yet. (It'll be published on Thurs Mar 13 2008 on this site.)
If it's urgent, I can send you a preview copy. If so can you send me a private message.
Regards,
David McKinney.
David,
Thanks for the reply, its not urgent as I am just finishing data validation which has taken a long time, and will probably finish completely next week. As I am a little thick I could do with a fairly simple front end which is why I thought something like C# for the GUI, but that leaves the question SQL or LINQ.
The example is simplistic of what I want to achieve but it is that in its basic form all the rest would if you like be just more complex variations on that theme. I was wondering if BI reports offered that kind of capability, to look for patterns in data, provide calculations on data etc.
I suppose its being lazy but do I have to learn SQL, A prgramming Language such as C#, Transact - SQL and BI, or can it be achieved with just 1 of those. Then where do I start learning really, DB or programming.
Once the data is in the changes are likely to be on the reporting side, looking for patterns, running different queries rather than any major changes to the DB structure.
Regards
Sam
March 7, 2008 at 9:20 am
Sam,
As I see it, whatever you do, you're going to have to learn a little SQL. Your data's going to be in a relational database, and you're going to have to get it out and put it on a screen. Even if you're using LINQ, you'll still have to populate your objects from the database.
My advice would be - especially if you're learning - to start with your data model in a relational database (why not SQL Server!!), and use c# for a front end. (What you put between the two - if anything - is your affair.)
As for LINQ, while not my area of expertise, this pretty much assumes you've got a business layer and o-o entities well defined - which you can then query with LINQ. (But you're still going to need sql!)
So LINQ you can live without.
BI is probably unnecessary for what you're doing i.e. standard SQL should suffice.
Just my opinion,
Regards,
David.
March 7, 2008 at 11:20 am
David,
Thanks again, I did realise I needed SQL, I did in fact buy a book, yet unopened as I discovered the data was in a real mess, so as the old saying goes C*** in C*** out. I had thought of using Access as a front end but alas there is an issue with Access 2003 and 2005/2008.
The only reason I thought about BI was reporting as it will be a lot of stats and profilling the ability to drill down would be useful. I am unsure whether it has the capability within it to perform the above calculations and a few more complex ones on "raw" data, a lot date related week, month, year datediff etc.
Having spent a couple of hours looking around I have come to the conclusion that I will build the database first and take a look at reporting to see what that brings, having spent so much time on the data another couple of weeks is neither here nor there.
Can you recommend good tutorial on SQL and BI for real beginners.
Thanks for taking the time to respond, its greatly appreciated.
Regards
Sam
March 10, 2008 at 10:01 am
A couple of hints that will save tears later.
2gb is not small.
If I had 2 gig of data I wouldn't give it to someone who doesn't know SQL!
Normalisation is not just for big dbs.
Normalisation has nothing to do with DB size. Normalise from day 1, even for the smallest db.
Don't think that your situation is different, that the usual rules don't apply. Every situation is different, which is why we have rules.
Good luck.
March 10, 2008 at 10:26 pm
Most results will be taking a present value from a previous value grouped by a name and show the result eg.
Smith 1/1/2008 25
Smith 7/1/2008 20
Smith 15/1/2008 35
to show
Smith 1/1/2008 25 null
Smith 7/1/2008 20 5
Smith 15/1/2008 35 -15
I would also like to see the previous number on the following row i.e to see 1/1/2008 25 on the 2nd row of Smith 7/1/2008
Those are "Running Total" or "Running Balance" calculations... pretty simple and awfully fast when you know how... please refer to the following...
http://www.sqlservercentral.com/articles/Advanced+Querying/61716/
...and, no, you don't need BO or any BI tool to do this simple thing...
--Jeff Moden
Change is inevitable... Change for the better is not.
March 11, 2008 at 1:00 am
Thanks lads its appreciated.
Strangely enough I do need to do running totals so that is really helpful.
I was thinking about "cheating" by putting columns from other tables into the biggest one, just really for ease of queries, rather than having to join tables all the time, it would make it bigger but from a novice perspective easier. Other than size is there any other drawbacks to this approach, I suppose if you took it to the extreme you would have 1 table with everything in. Nows theres an idea;)
March 11, 2008 at 4:42 am
It's the same mistake that many make... GOOGLE "3rd Normal Form" to find out about database "normalization" and why it's important despite the fact the you might have to join a table or two. 😉
--Jeff Moden
Change is inevitable... Change for the better is not.
March 11, 2008 at 7:58 am
Jeff,
Thanks for that I did look it up and probably a dumb question, but joins make it more complex and can take longer, columns make bigger take longer.
I understand the need to keep information updated, but the data once in and the calculations made is a historical view, I will not be altering the data again. I will be running queries reports etc, hence my interest in BI and performing calculations in that. The issue being that until I have the data to browse with sufficient detail, I am unsure what I want to query/calculate. So 75% of it will be what I expect 25% will be investigation if that makes sense.
There will be 3 main tables Name Table, Events Table and Participants, the other tables will be very small and hold data references for the other tables. I had planned to just have Name and a NameId as PK in Name Table, Event with EventID PK and the Participants would have a PK made up of the NameID and EventID to give every individual a PK to that event.
There are a couple of elements in the calculations that will change depending of the results of the calculations as it is a little trial and error to find the values, I hadn't planned on storing these within the main table for that reason and these will be kept in a seperate table. There is another element taht is liable to change which would be stored in one of the main tables but in this instance I would not want to have the original value updated, so it would become messy to hold those values seperately.
I have picked up on a lot of mistakes I have made as I have been validating the data in Access and I am sure I will make more as I move it to SQL. I have bought a book on SQL by O'Reilly "Learning SQL" but due to the state of the data never got around to looking at it. Obviously that will change as soon as start the project. I had planned to just look at some tutorials etc on SQL before I move the data and play around with the example databases.
I am surprised from one of the above posts that SQL doesn't appear to have a Ranking Function as it is something I will be using quite a lot.
Thanks again for your response, I know "newbies" are a PITA
Regards
Sam
March 11, 2008 at 8:10 am
Event's table should have it's own PK and a unique index on the other two columns you spoke of.
SQL Server 2005 does have a ranking function. 2000 does not. Here's a way to make one... dunno if I mentioned it above or not... kind of in a hurry and wanted to make sure you had it...
http://www.sqlservercentral.com/articles/Advanced+Querying/61716/
--Jeff Moden
Change is inevitable... Change for the better is not.
March 11, 2008 at 11:25 am
Yes you did thanks Jeff
I was just going to put it in 2008 unless there are any major bugs.
As mentioned I am going to do a couple of Tutorials to gain some idea about indexes etc, part of the reson I have struggled so much with the data was poorly defined / inaccurate Pk's (not mine I hasten to add) so it been a bit of a pain and I am trying not to repeat the same mistakes.
It will be a couple of weeks before I have any meaningful questions of my own DB
Thanks again for taking the time to respond.
March 11, 2008 at 4:38 pm
One of the cool things in the article on Running Totals... take a look at how the test table is built... I use something like it all the time... real easy to to performance testing on a million rows when you know what the million rows has in it. 😉
--Jeff Moden
Change is inevitable... Change for the better is not.
March 12, 2008 at 2:03 am
I will do that, strangely I need that on my main table, which has just north of 1.1 Million rows at the minute, I did get a program written for Access but it took forever and I never had a great deal of confidence in it.
March 12, 2008 at 5:26 am
I would not be that quick to discard MS Access as a potential tool.
1. It's much more accessible to a not-yet-technical user, and makes the data much more accessible to the end user. (You, in this case, with filter/sort capabilities on tables and queries.)
2. It's ~2GB limit is something of a myth; simply move your larger tables into two or more Access DBs (back ends) with links from your primary DB. (I have never seen a performance problem with this approach.)
None of which means you will not still have to understand relational tables, joins, SQL, normalization, even perhaps some VBA, but you can get started much quicker.
Viewing 15 posts - 1 through 15 (of 18 total)
You must be logged in to reply to this topic. Login to reply