March 29, 2006 at 10:45 am
So I was asked to create a seperate thread by RandyRandall to discuss my overall issue.
I'm not exactly sure where to start since I was asked to start this.  ... Here's an overview
... Here's an overview
1) I work at a market research company. Our new phone survey software (custom-built, outsourced) is SQL Based, and yours truly had to program a Stats Software for our clients.
2) Because the main SQL Server is going to be busy with our phone surveys and web surveys, we wanted to have a replicated server for our clients to have zippy data access.
3) Each survey has a new set of tables that gets created. So I had to create an automated transactional replication creation process. (First Nightmare). I can't change that. And the replication automation IS working now anyways
4) After replication was automated, I pointed my software to the replicated server.
5) Since the 'answer' table has 3+ million rows, Even simple queries (on the replicated server) took forever, and in turn, would crash the software than ran on the main sql server. This table has, and uses clustered indexes.
6) My solution is an INSERT TRIGGER on the 'answer' table. Wen the automated replication is started, I will determine all the distinct questions in our survey, in this case, 400. For each of those, I will create a table that will hole the ID of the person, and the answer they gave. So when a new row comes in to the table with 3 million rows, it will determine where to send a copy of the answer
This way, when a client logs in to view his real-time survey data, he will only be polling the tables (questions) he requests, and saving a lot of cpu, and not crashing our software.
RandyRandall suggested maybe there's a bigger design flaw... Not sure where I can improve on this solution. Any help would be great....
Also not sure what other info to give you guys... hehe
March 29, 2006 at 12:19 pm
I'm not the best admin in the world, but can't you just port over commands from the log? If a new table is created, it'll be in the log. If there isn't a direct way through SQL sever to automate this task, I beleive there are 3rd-party tools to acheive this.
March 29, 2006 at 12:44 pm
These would be entirely new tables....
March 29, 2006 at 2:59 pm
Dan,
To re-iterate what I said in my response to your other thread: don't do it. The headaches you will have maintaining all those tables and the trigger just aren't worth it. "Each survey has a new set of tables that gets created" is bad enough, but don't compound your vendor's poor design with some of your own. You should be able to get adequate performance through proper design (e.g. indexing and efficient SQL). 3 million rows might seem like a lot, but many of us here regularly work with tables that are orders of magnitude bigger than that without resorting to complex partitioning schemes.
What do you mean when you say "I had to create an automated transactional replication creation process"? Are you using SQL Server's replication or something home grown? I've not heard of a query running on a replica database crashing the source before. If it's home grown and "I can't change that," then the help we can provide may be limited.
Since your replicated database is yours alone, don't feel constrained by your source database's indexing scheme. Perhaps indexing to optimize insert/update makes sense on the source, but on your reporting database you're looking for query and aggregation performance. (I'm giving your vendor the benefit of the doubt that they did something that deliberately "makes sense" for their app; however, if their model creates new tables for each survey, who knows.)
Getting back to your indexing. Your clustered index should be built around your most frequently used search criteria for queries returning many rows or aggregations, in your case probably the Question column. The lead column(s) in the index should be the columns in your WHERE clause (Question). In the DDL you posted elsewhere, your index led with respondent; using that index to search for questions is like using the phonebook to find all the Freds - you still need to read the whole thing. Even if I gave you a seperate book that told you what page each Fred was on in the phonebook, you might choose not to use it since you still had to read most of the pages anyway.
If you need to create a non-clustered index to support other queries, consider a covering index. For instance, if you are frequently searching for Question to get Response, include both in the index so that SQL can get all the data it needs directly from the index, kinda like a book of first names with phone numbers.
I suggest that you post the DDL for your tables so we can see your structure, and also some representative queries. I have no doubt that we can help you improve your design to the point where your trigger won't be needed.
March 29, 2006 at 6:05 pm
I had created a long reply, and after submitting it disapeared, so I won't go back into details...
k, I just dropped the existing Indexes, and created a clustered index on 'RpsQuestion', which is the Question Code.
Then I executed "SELECT DISTINCT(RPSQUESTION) FROM RESPONSE_1000", and the cost in my execution plan went from 15.9 to 20.6. Still as an index scan, not seek
Not only that, but then I dropped ALL indexes, and then re-ran the query. The cost was the same, but it too WAY less time to come back with the results.
Maybe DISTINCT works differently with indexes?
I will play around a bit more with indexs and different queries and keep you posted
March 29, 2006 at 6:17 pm
ok, I think new indexes were in order.
i ran the same query on both servers, and the new index on the replicated table is now much faster
select * from response_1000 where rpsquestion = 'D2A'
i will take my queries from my software, and see what indexes will benefit it better. Or re-do the queries.....
Anyhow, more info to come. Thanks again!
March 30, 2006 at 1:34 am
If it's a reporting database then index it heavily. It's not uncommon for a reporting database to have more indexes per table than it has columns.
However, don't create indexes for the sake of creating indexes. First evaluate what would be best for the clustered index. Best options (my opinion) are narrow, unchanging, increasing, unique columns (importance in that order). Unchanging meaning it never gets an update, increasing means that, like an identity or insert date, new records are added at the 'end'
Once you've got a good cluster, evaluate your queries for candidates for non-clustered indexes. don't be afraid of creating multi-column non-clusters.
My prefered order for optimising is first to make sure the queries are well written, then make sure that the tables are properly indexed. No amount of indexes will make a bad query run fast.
Good luck.
Gail Shaw
Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability
March 30, 2006 at 7:55 am
If you still feel you need 2 databases, I found MSDN's Log Shipping articles...
Viewing 8 posts - 1 through 8 (of 8 total)
You must be logged in to reply to this topic. Login to reply