David Ziffer (11/4/2010)
Several problems. 1, what if @AsOf is a future date? Your function returns nothing from either the live or the archive table. 2, the select max audit date subquery does not specify "and AuditStatus <> 'D'", so it can return a date that will then be excluded by the outer select. The first one is hypothetical, but the second one is a failure-type error.
3, why select max(AuditDate) in the subquery instead of selecting the primary key of the archive table? That seems like a better choice, since then the top-level select is selecting where PK = known value.
You could also find the archive row with just a left outer self join instead of a correlated subquery. Something like this (haven't tested it) ... I've seen left outer self joins perform much better than correlated subqueries that use aggregate functions. Though they're a bit harder to read at first, once you're familiar with the pattern they're fine.
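To make the comparison concrete, here's a minimal runnable sketch of both forms side by side. Table, column names, and data are hypothetical, and SQLite stands in for SQL Server just so the snippet executes; the idea is only to show that the left outer self join ("no later version still on or before @AsOf") selects the same row as the correlated MAX() subquery.

```python
import sqlite3

# Hypothetical audit table: one row per version of each logical record,
# keyed by user_id, with one version per audit_date.
con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE user_hist (user_id INT, audit_date TEXT, name TEXT);
INSERT INTO user_hist VALUES
  (1, '2010-01-01', 'Bob v1'),
  (1, '2010-06-01', 'Bob v2'),
  (1, '2010-09-01', 'Bob v3');
""")
as_of = '2010-07-15'

# Correlated-subquery form: the version with the max audit_date <= @AsOf.
sub = con.execute("""
    SELECT A.name FROM user_hist A
    WHERE A.user_id = 1
      AND A.audit_date = (SELECT MAX(audit_date) FROM user_hist
                          WHERE user_id = A.user_id AND audit_date <= ?)
""", (as_of,)).fetchall()

# Left-outer-self-join form: the row for which no later version exists
# that is still on or before @AsOf (so the B side comes back NULL).
lj = con.execute("""
    SELECT A.name FROM user_hist A
    LEFT JOIN user_hist B
      ON B.user_id = A.user_id
     AND B.audit_date > A.audit_date
     AND B.audit_date <= ?
    WHERE A.user_id = 1
      AND A.audit_date <= ?
      AND B.user_id IS NULL
""", (as_of, as_of)).fetchall()

print(sub, lj)  # both: [('Bob v2',)]
```

Note the upper bound on B.audit_date inside the join condition: without it, the join would only ever keep the single latest version overall, not the latest version as of the requested date.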
Brian: Several answers:
1. It doesn't make much sense to specify a future audit date, so I would not sacrifice any performance to generate code to specially handle that circumstance. But in any event this code can't possibly be sensitive to whether the date you specify is in the future, because the data in the database has no idea what time it is when you run the query, and furthermore the query is not sensitive to the current time. If you think carefully about the algorithm (or just try it out in the RAP example) you will find that putting in a future date will guarantee that you get the latest version of everything, keeping in mind that the latest version of a deleted record is "no record". But you'll get the latest version expensively, because you'll be going through the more calculation-laden archive table query rather than the ultra-simple current-table query.
2. Either you're expecting a different result than I am or your analysis is flawed. The subquery correctly returns whatever record is the most recent one (meeting the search criterion, in this case matching the correct key) that precedes or equals the AuditDate. If that record happens to be a "D" record then the main query correctly returns no record (which is what we want, because the record has been deleted). If that record is not a "D" record then it returns the most recent version of the record preceding or equalling the date. It is not our intention to return some record no matter what; if a record has been deleted prior to or at the AsOfDate, then we don't want to see it at all.
3. The left-join logic for extracting the maximum date looks interesting and I must say that I never considered such a thing. However I think you'd need to reverse the sign so that we're looking for the "A" record where there is no GREATER "B" record (I believe your logic as published would find the minimum "A", not the maximum that we want). But I would have to test this carefully to see how the "where" condition interacts with the join. And we'd have to change the test for "D" because your query here would neglect to properly return "no record" if the AsOfDate follows the deletion time of the record. And I would be very suspicious of the claimed performance improvement ... it seems to me that the number of records this join would produce would rise exponentially with the number of archived records matching the search criterion.
Thanks for your inputs. I am always grateful when people take the time to analyze my stuff. It's just that in this case I am unable to agree with your assessments (except possibly that a left-join could be used in place of the subquery). I would be interested in your responses to my responses and/or your clarifications, in case there's something here that I'm just not getting.
"because the data in the database has no idea what time it is when you run the query,"
GetDate()? You could select from your main table where @asOf is null or >= getDate(). To me, showing the main record would seem more likely to match users' expectations. If this is a general-purpose function, I would expect it would end up being used in places you aren't anticipating. In our shop we use a history table, where the top row in the history table is the current data (a trigger populates all data into the history table, even current data), so we can join against it for any time period, including the future. It avoids the issue of splitting queries between main and archive as well. What if someone wants to input that user Bob Smith will move from the R&D department to the Implementation department in June of 2011? How could they join against your user data retrieval function when doing so? It seems like it would return 'no such user bob smith in june of 2011'. For us we have the same query to show user department changes for any time in the past or future. So that is definitely shaping my expectations here.
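The history-table approach described above can be sketched in a few lines. This is only an illustration of the idea, not the shop's actual schema: all names are hypothetical, and SQLite is used just to make it runnable. The point is that one query answers "what department as of date X" for past, present, and future dates alike.

```python
import sqlite3

# Hypothetical department-history table: every change is a row with an
# effective date, including future-dated changes entered ahead of time.
con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE user_dept_hist (user_id INT, effective TEXT, dept TEXT);
INSERT INTO user_dept_hist VALUES
  (1, '2009-01-01', 'R&D'),
  (1, '2011-06-01', 'Implementation');  -- future-dated move
""")

def dept_as_of(when):
    # The same query serves any as-of date: take the latest row whose
    # effective date is on or before the requested date.
    row = con.execute("""
        SELECT dept FROM user_dept_hist
        WHERE user_id = 1 AND effective <= ?
        ORDER BY effective DESC LIMIT 1
    """, (when,)).fetchone()
    return row[0] if row else None

print(dept_as_of('2010-11-04'))  # R&D
print(dept_as_of('2011-07-01'))  # Implementation
```

Because current data lives in the same table as historical and future-dated rows, there is no main-versus-archive split to route around in the query.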
"If that record happens to be a "D" record then the main query correctly returns no record (which is what we want, because the record has been deleted). "
What if it's been un-deleted since then? The point of using soft-deletes in a status column is that they can be undone later, while physical deletes can't. If you stop reporting on deleted rows you might as well physically delete them and save the disk space and extra query overhead. You could also include a @IncludeDeleted parameter to allow them to be ignored/included as desired. At our shop someone could 'delete' a group. Then realize it was a mistake. Get the group list passing in @IncludeDeleted, and then undelete it.
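The @IncludeDeleted idea can be shown with a tiny sketch. Everything here is hypothetical (table, column, and parameter names), with SQLite standing in just so it runs: soft-deleted rows are normally filtered out, but a flag lets a caller see them in order to undo the delete.

```python
import sqlite3

# Hypothetical soft-delete: status 'D' marks a group deleted but keeps the
# row, so the delete can be undone later; a physical DELETE could not be.
con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE grp (group_id INT, name TEXT, status TEXT);
INSERT INTO grp VALUES (1, 'Admins', 'A'), (2, 'Interns', 'D');
""")

def list_groups(include_deleted=False):
    # include_deleted plays the role of an @IncludeDeleted parameter:
    # when false, soft-deleted rows are filtered out as usual.
    sql = "SELECT name FROM grp"
    if not include_deleted:
        sql += " WHERE status <> 'D'"
    return [r[0] for r in con.execute(sql)]

print(list_groups())                      # ['Admins']
print(list_groups(include_deleted=True))  # ['Admins', 'Interns']

# Un-delete: flip the status back, and the group reappears in normal lists.
con.execute("UPDATE grp SET status = 'A' WHERE group_id = 2")
print(list_groups())                      # ['Admins', 'Interns']
```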
"And I would be very suspicious of the claimed performance improvement ... it seems to me that the number of records this join would produce would rise exponentially with the number of archived records matching the search criterion."
Run some tests. I have. The only time I like to use correlated subqueries is when doing an "and exists" correlated subquery, because then SQL Server can exit as soon as it finds a single row. Regular correlated subqueries are expensive. SQL Server is built to handle set-based logic efficiently, and the left outer join / "where B is null" algorithm is exactly that kind of set-based operation. You're right on the condition I put in, though: it should be "and B.AuditDate > A.AuditDate" ... "and B.UserId is null". That will get the A row where there is no later row in that table that is still on or before the input date. I use it on large data sets all the time with good results. You should just be able to do
where UserId = @userid
and AuditStatus <> 'D'
and A.AuditDate = (select max(AuditDate) from TBadmUser#
where UserId = A.UserId
and AuditDate <= @AsOf)
and @AsOf is not null
select case when A.AuditStatus = 'D' then null else A.UserId end [UserId],
case when A.AuditStatus = 'D' then null else A.LoginName end [LoginName],
case when A.AuditStatus = 'D' then null else A.Notes end [Notes],
case when A.AuditStatus = 'D' then null else A.AuditDate end [AuditDate],
case when A.AuditStatus = 'D' then null else A.AuditUserId end [AuditUserId],
case when A.AuditStatus = 'D' then null else A.AuditStatus end [AuditStatus]
from TBadmUser# A
left join TBadmUser# B
on B.UserId = A.UserId
and B.AuditDate > A.AuditDate
and B.AuditDate <= @AsOf
where A.UserId = @userid
and A.AuditDate <= @AsOf
and B.UserId is null
And then include the actual execution plan, and see which select uses more than 50% of the query time. I'd be interested in your results.