Getting back to the original question, it's an extremely rare thing for me to load external data directly into a final table. Except for those rare cases, I load data into staging tables in a separate database, usually on the same machine. There are many reasons for this. Here are two of the most important to me...
1. The data never actually needs to be backed up because, not only is the data transitory, but it already exists in the original files, which I compress and archive after I've had my way with them. If I make no changes to the code stored in the database, there's no need for a backup, and any backups I do take are relatively tiny because I simply truncate or drop the "import" tables, which are frequently built at runtime because it's convenient to do so. They are, in all actuality, a form of temporary table that's totally expendable.
2. There are huge advantages to having ETL processes in a database set to the SIMPLE Recovery Model, not the least of which is minimal logging, which usually doubles the performance of the code because it's doing only about half the writes to disk. Having the processes in a separate database allows for that without interfering at all with the point-in-time recovery of the other databases. (There's a rough sketch of both points just after this list.)
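Just to put a rough shape on both points above, here's a minimal sketch. The database, table, and file names are all made up for illustration, and the minimal logging assumes one of the qualifying patterns (here, a bulk load into an empty heap with a TABLOCK hint):

    -- One-time setup: the ETL database lives in the SIMPLE Recovery Model.
    ALTER DATABASE ETL SET RECOVERY SIMPLE;
    GO
    USE ETL;
    GO
    -- Build a throw-away "import" table at runtime...
    IF OBJECT_ID('dbo.Import_Sales','U') IS NOT NULL
        DROP TABLE dbo.Import_Sales;

    CREATE TABLE dbo.Import_Sales
    (
        SaleDate   VARCHAR(50), -- load "dirty" as character data first,
        CustomerID VARCHAR(50), -- validate and convert later
        Amount     VARCHAR(50)
    );

    -- ...bulk load the raw file (empty heap + TABLOCK + SIMPLE recovery
    -- is one of the patterns that qualifies for minimal logging)...
    BULK INSERT dbo.Import_Sales
    FROM 'C:\Staging\Sales_20240101.csv'
    WITH (TABLOCK, FIELDTERMINATOR = ',', ROWTERMINATOR = '\n', FIRSTROW = 2);

    -- ...and, once the data has been validated and moved, throw it away.
    DROP TABLE dbo.Import_Sales;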
Having the "ETL" database on the same instance as the final resting place of the data is a huge advantage in many ways, as well.
1. Using synonyms, it's trivial to read from those final databases when data is needed from them to validate the new data.
2. Using synonyms, it's also trivial to insert, update, or delete rows in those final databases based on the prevalidated data that has been imported, verified, and cleaned in the ETL database. (See the sketch just below this list.)
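Here's a quick sketch of both points. The synonym, database, and table names are hypothetical:

    -- One-time setup in the ETL database: a synonym that points at a table
    -- in the final resting place of the data.
    CREATE SYNONYM dbo.Customer FOR ProdDB.dbo.Customer;
    GO
    -- Point #1: read from the final database to validate the new data.
    SELECT  imp.*
      FROM  dbo.Import_Sales imp
     WHERE  NOT EXISTS (SELECT 1
                          FROM dbo.Customer c
                         WHERE c.CustomerID = imp.CustomerID);

    -- Point #2: insert rows into the final database from data that has
    -- already been verified and cleaned in the ETL database.
    INSERT INTO dbo.Customer (CustomerID, CustomerName)
    SELECT  CustomerID, CustomerName
      FROM  dbo.Cleaned_NewCustomers; -- hypothetical cleaned staging table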
Some will say, "Well! You can't do DRI across databases unless you use a trigger!" True enough about not being able to do declarative DRI, but you don't use triggers for this type of stuff; you use stored procedures as part of the validation process. Also, you'd have similar but more complex problems if the ETL data lived on a separate box, including but not limited to buying another set of licenses for Windows and SQL Server (to name just two), doing system-level backups, having an extra box to manage security for, and establishing a trust between the two systems that's still secure.
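For example, the "DRI" check can be done explicitly in a validation procedure that quarantines the bad rows instead of relying on a trigger. This is just a sketch with made-up names, reusing the synonym from the previous example:

    -- Errata table with the same shape as the (hypothetical) import table.
    SELECT  *
      INTO  dbo.Import_Sales_Errata
      FROM  dbo.Import_Sales
     WHERE  1 = 0;
    GO
    CREATE PROCEDURE dbo.ValidateImportSales AS
    BEGIN
        -- Quarantine imported rows whose CustomerID doesn't exist in the
        -- final table, which is reached through the synonym created above.
        DELETE  imp
        OUTPUT  DELETED.* INTO dbo.Import_Sales_Errata
          FROM  dbo.Import_Sales imp
         WHERE  NOT EXISTS (SELECT 1
                              FROM dbo.Customer c
                             WHERE c.CustomerID = imp.CustomerID);
    END;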
I'll also add that if the box appears not to be able to handle both ETL and the normal traffic of the other databases, the problem is usually in the code, not in the fact that you're running both ETL and OLTP on the same box. And if the box IS underpowered for such a task, my humble opinion is that it would be better to beef up the one box with future scalability in mind than to ignore the weakness of the current box and stand up another one.
And, no... I have no documentation on any of that... just years of experience where people have stood up another box instead of dealing with the real problem, which is in the code. I've been involved in a whole lot of projects where people thought otherwise, and standing up an additional box did nothing for performance (the code still sucked), plus they had the extra overhead of going across boxes, where their code sucked as well.
RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a row... think, instead, of what you want to do to a column.
"If you think its expensive to hire a professional to do the job, wait until you hire an amateur."--Red Adair
"Change is inevitable... change for the better is not."
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)