Thanks for all the replies 🙂
@scottpletcher: D'oh! Sometimes you just don't see what's obvious.
The reason is that SQL Server does not short-circuit the OR; it has to evaluate "@ProjectID IS NULL" against every record in the index/table it has chosen.
Most of the columns will be NOT NULL; would that improve performance?
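For what it's worth, the usual workaround for this catch-all pattern in SQL Server is OPTION (RECOMPILE), which lets the optimizer prune the NULL branches at execution time instead of caching one plan for every parameter combination. A minimal sketch, with hypothetical table/column names:

```sql
-- Catch-all search: without OPTION (RECOMPILE), one cached plan must
-- handle every parameter combination, so the optimizer can't use a
-- seek on ProjectID even when @ProjectID is supplied.
SELECT ProjectID, ServiceID, ProjectName   -- hypothetical names
FROM   dbo.Project
WHERE  (@ProjectID IS NULL OR ProjectID = @ProjectID)
  AND  (@ServiceID IS NULL OR ServiceID = @ServiceID)
OPTION (RECOMPILE);  -- plan rebuilt per call; the NULL checks become constants
```

The per-call compile cost is trivial at this row count, and you get a plan tailored to the parameters actually supplied.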
When posting to a forum, it's always a tradeoff between brevity vs. not enough context.
So here's more context:
Right now we have a really clunky (IMO) flattened operational table. A record gets written each time an end user runs a particular program; they are run on an ad hoc, manual basis. Over the course of, say, 4 years the table has accumulated about 1,200 rows.
Right now the business process is they create variables (text) in the programming interface (SAS Enterprise Guide) and submit the program. Thus there is a lot of junk in that table due to bad variable text.
The business model is:
Source: The source of (health) data. The sources are Admitted Patients, Emergency Department, Deaths, and Cause of Death. So only 4 rows.
Project: A project authorized for a given source. A project is generally a request for data from researchers that has been approved by an ethics committee. So, a project's parent is source.
Service: A particular service against a project. That could be repeated requests for data (say a data refresh), or a new service (completely different request but still pertaining to that project). So, a service's parent is project.
Batch: This is a particular run of the service. Multiple batches (rows) usually mean a repeated run of the code, say due to programming errors, etc. This table also captures metrics, such as run dates and row counts. So, a batch's parent is service.
The structure is a bit like a file system:

Source
  Project
    Service
      Batch
In this example, the Batch data is analogous to a fact table. IOW the number of rows in the Batch table should be the maximum number of rows in the operational table (view), and the upstream tables, once remodeled, will have unique rows per service/project/source.
I'm working on 1) normalizing the tables into 4 separate tables, and 2) creating a crude interface that will make it easier for the end users to keep from shooting themselves in the foot with bad data settings, and easier for us to clean up the data when they do. Identity columns and FK constraints will enforce cross-table data integrity, and a view will flatten the data back out.
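To make the plan concrete, here's a rough sketch of the 4 tables; every name is a guess at your schema, not a prescription:

```sql
-- Hypothetical names throughout; sketch of the 4-level hierarchy.
CREATE TABLE dbo.Source (
    SourceSK   int IDENTITY(1,1) PRIMARY KEY,
    SourceName varchar(50) NOT NULL UNIQUE      -- only 4 rows expected
);

CREATE TABLE dbo.Project (
    ProjectSK   int IDENTITY(1,1) PRIMARY KEY,
    SourceSK    int NOT NULL REFERENCES dbo.Source (SourceSK),
    ProjectName varchar(100) NOT NULL,
    ProjectDesc varchar(500) NULL,
    CONSTRAINT UQ_Project UNIQUE (SourceSK, ProjectName)   -- blocks double loads
);

CREATE TABLE dbo.Service (
    ServiceSK   int IDENTITY(1,1) PRIMARY KEY,
    ProjectSK   int NOT NULL REFERENCES dbo.Project (ProjectSK),
    ServiceCode varchar(20) NOT NULL,
    CONSTRAINT UQ_Service UNIQUE (ProjectSK, ServiceCode)
);

CREATE TABLE dbo.Batch (
    BatchSK   int IDENTITY(1,1) PRIMARY KEY,
    ServiceSK int NOT NULL REFERENCES dbo.Service (ServiceSK),
    RunDate   datetime NOT NULL,
    RowCnt    int NULL                           -- run metrics live here
);
```

The UNIQUE constraints are what give you the "can't double-load the same data" guarantee, and the FKs mean a bogus upstream row can't be deleted until its children are repointed.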
Say the SP is named spGetProject. These are scenarios that I envision:
exec spGetProject (no parameters):
Returns all rows in the table. The user then scans the output for an obvious match on an existing project.
exec spGetProject @ProjectID=300:
Returns all rows matching that ProjectID
exec spGetProject @ProjectID=300, @ServiceID=100:
Returns all rows matching that ProjectID and ServiceID
exec spGetProject @ProjectName='Some Project', @ServiceCode='ABC123':
You get the idea
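All of those scenarios can be covered by one proc with defaulted parameters; a sketch, assuming the column and parameter names are as guessed above:

```sql
CREATE PROCEDURE dbo.spGetProject
    @ProjectID   int          = NULL,
    @ServiceID   int          = NULL,
    @ProjectName varchar(100) = NULL,
    @ServiceCode varchar(20)  = NULL
AS
BEGIN
    SET NOCOUNT ON;

    SELECT p.ProjectSK, p.ProjectName, s.ServiceSK, s.ServiceCode
           -- ,<any other required columns>
    FROM   dbo.Project AS p
           JOIN dbo.Service AS s ON s.ProjectSK = p.ProjectSK
    WHERE  (@ProjectID   IS NULL OR p.ProjectSK   = @ProjectID)
      AND  (@ServiceID   IS NULL OR s.ServiceSK   = @ServiceID)
      AND  (@ProjectName IS NULL OR p.ProjectName = @ProjectName)
      AND  (@ServiceCode IS NULL OR s.ServiceCode = @ServiceCode)
    OPTION (RECOMPILE);  -- sidesteps the catch-all plan problem; cheap at ~1200 rows
END;
```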
After scanning for an existing project and/or service, the end user determines that he/she needs to create a new Project+Service.
There will be separate (cloned) SPs in separate databases that will set the correct SourceID. In fact, the SourceID could just be a computed column in the appropriate database, or coded as a constant in the view.
The business logic is you *always* search for a matching Project+Service first. If required, you create the new Project+Service via an SP that implements some error checking, and the newly created row (ProjectSK + ServiceSK) is your setting for this new run.
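That create proc could look something like the sketch below (again, every name is hypothetical): reuse an existing project if the name matches, refuse a duplicate Project+Service, and hand the surrogate keys back to the caller:

```sql
CREATE PROCEDURE dbo.spCreateProjectService
    @ProjectName varchar(100),
    @ServiceCode varchar(20),
    @ProjectSK   int OUTPUT,
    @ServiceSK   int OUTPUT
AS
BEGIN
    SET NOCOUNT ON;

    -- SourceSK is omitted here; per the per-database setup it would be
    -- hard-coded as a constant in each database's clone of this proc.
    SELECT @ProjectSK = p.ProjectSK
    FROM   dbo.Project AS p
    WHERE  p.ProjectName = @ProjectName;

    IF @ProjectSK IS NULL
    BEGIN
        INSERT dbo.Project (ProjectName) VALUES (@ProjectName);
        SET @ProjectSK = SCOPE_IDENTITY();
    END;

    -- Enforce the "always search first" rule: no duplicate Project+Service.
    IF EXISTS (SELECT 1 FROM dbo.Service
               WHERE ProjectSK = @ProjectSK AND ServiceCode = @ServiceCode)
    BEGIN
        RAISERROR('Project+Service already exists; use the existing keys.', 16, 1);
        RETURN;
    END;

    INSERT dbo.Service (ProjectSK, ServiceCode) VALUES (@ProjectSK, @ServiceCode);
    SET @ServiceSK = SCOPE_IDENTITY();
END;
```

The unique constraint on the Service table is still the real safety net; the EXISTS check just gives the user a friendlier error than a constraint violation.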
The interface is crude - I'd prefer to create a .Net form that would provide a picklist interface, but my boss looked at me like I had three heads. Sigh...
By breaking up the tables, they will be easier to keep clean, since the data isn't repeated across rows (e.g. ProjectName and ProjectDesc repeated multiple times). If an error occurs, I can go in and fix the FKs in the downstream table, then remove the bogus upstream row. Plus, unique indexes will help prevent double-loading of the same data.
I do believe the table will stay very small, and a scan will give sufficient performance. I doubt my end users would know how to do an injection attack, let alone try one. Plus they can delete the rows in this table anyway (I don't control the security - IMO they should be allowed to insert rows but not delete them, but that would completely confuse my DBA).
Hope the context helps - I was probably too brief in my first post and too wordy in this one ;-).