May 4, 2009 at 5:56 pm
Hello everyone.
I have been tasked with getting some summary data out of a system with a lot more records than I normally work with, and I am having trouble figuring out the best way to approach the problem.
Basically the database is storing nomination data and I have been tasked with creating something (a view ideally, but a stored procedure returning a result table would also work I think) to allow people to pull details for active nominations (called "Carparks" by my client) for any given month.
The tricky part is that the active status is hard to determine due to how the data is stored and some of the information the client requires is stored in a different database.
Also each nomination can have multiple versions as the nomination record gets renewed.
I am not sure that I would be able to explain it very well, so I will just include the statements that I have created so far and see if anyone can spot any obvious problems.
The views I am creating are in a sperate database (although on the same server) as the database that stores most of the base tables (NDHSOnlineFull) is dropped and re-created every night and I am not able to create my views in that database, I believe this means that schema binding my views to be able to create indexes is not going to be an option (I have never done this anyway so it may or may not have helped).
First off I create a view to return the base nomination data (including things like the Vehicle State and Model that are stored in different databases!!) and the nomination version (is there a better way to figure this out? It is not stored in the table.):
CREATE VIEW vw_NominationVersion AS
SELECT
NH.RecordNumber,
NH.NominationNo,
NH.AccountCode,
NH.CompanyBatchNo,
NH.NominationDate,
NH.ExpirationDate,
NH.TransactionType,
V.VehicleStateCode,
BH.Ref05 AS ModelCode,
(
SELECT
COUNT(*)
FROM
NDHSOnlineFull.dbo.NominationHistory NH1
WHERE
NH1.NominationNo = NH.NominationNo
AND NH1.RecordNumber <= NH.RecordNumber) AS NomVerNo
FROM
NDHSOnlineFull.dbo.NominationHistory NH
INNER JOIN NDHSOnlineFull.dbo.Vehicle V ON V.VehicleNo = NH.VehicleNo
LEFT OUTER JOIN [Cognos assist].dbo.BatchHistory BH ON BH.AccountCode = NH.AccountCode AND BH.AccountPrimaryTag = NH.AccountPrimaryTag
[/code]
Next I create another view to figure out the start and end date for each version of the nomination record so that I will then be able to use these dates to determine if a nomination was active in a particular month (this data is NOT stored against the nomination records and has to be calculated from finding a transaction record in the transfer table that is entered when the nomination starts!):
[code]
CREATE VIEW vw_NominationVersionDates AS
SELECT
NV.RecordNumber,
NV.NominationNo,
NV.NomVerNo,
T.TransferDate AS NominationVersionStartDate,
CAST(DATEPART(YEAR, T.TransferDate) AS VARCHAR) + CAST(DATEPART(MONTH, T.TransferDate) AS VARCHAR) AS StartYearMonth,
COALESCE(DATEADD (DAY, -1, T2.TransferDate), NV.ExpirationDate) AS NominationVersionEndDate,
CAST(DATEPART(YEAR, COALESCE(DATEADD (DAY, -1, T2.TransferDate), NV.ExpirationDate)) AS VARCHAR)
+ CAST(DATEPART(MONTH, COALESCE(DATEADD (DAY, -1, T2.TransferDate), NV.ExpirationDate)) AS VARCHAR) AS EndYearMonth
FROM
vw_NominationVersion NV
INNER JOIN NDHSOnlineFull.dbo.Transfer T ON T.OrganizationCode = NV.AccountCode AND T.CompanyBatch = NV.CompanyBatchNo
LEFT OUTER JOIN NDHSOnlineFull.dbo.Transfer T2 ON T2.OrganizationCode = NV.AccountCode AND T2.CompanyBatch = (
SELECT
NV2.CompanyBatchNo
FROM
vw_NominationVersion NV2
WHERE
NV2.NominationNo = NV.NominationNo
AND NV2.NomVerNo = NV.NomVerNo + 1)
[/code]
The Transfer table has 18,015 records.
And finally I have a query to actually return the data that the client is interested in:
[code]
DECLARE
@CarparkDate AS INT
BEGIN
SET @CarParkDate = 200801
SELECT
NVD.NominationNo,
NVD.NomVerNo,
NV.AccountCode,
NV.VehicleStateCode,
DATEDIFF(YEAR, NV.NominationDate, @CarParkDate) AS NominationAge,
NV.ModelCode
FROM
vw_NominationVersion NV
INNER JOIN vw_NominationVersionDates NVD ON NVD.RecordNumber = NV.RecordNumber
WHERE
NV.TransactionType != 61 --Don't include cancelled transactions
AND NVD.StartYearMonth <= @CarparkDate
AND NVD.EndYearMonth >= @CarparkDate
END
Now I am not sure if there is a better way for me to be handling the dates?
All of the conversions in the second view are probably slowing things down.
Is there any way for me to get away with having a single view instead of a second view to figure out the start and end dates?
I have tried changing where some of the data is being pulled from, I initially had the joins to get the vehicle state and model in my final query instead of the first view. I don't know if it is better in the view or not but with them in my final query it ran for over 5 hours yesterday and had only returned 196,000 records when I finally decided to stop it from processing (the expected result is somewhere around 1.3 million for the month I was testing).
Would I be better off writing a stored procedure and dumping the data into tempory tables instead of using views?
As I said at the start, I have never had to work with tables this big (or across multiple databases like this) before so I am not sure if I am shooting myself in the foot with the way I am trying to do this.
Any pointers would be very much appreciated.
Thanks.
May 5, 2009 at 7:10 am
Just a guess, but I think if you found a way to eliminate this correlated sub-query
(SELECT
COUNT(*)
FROM
NDHSOnlineFull.dbo.NominationHistory NH1
WHERE
NH1.NominationNo = NH.NominationNo AND
NH1.RecordNumber <= NH.RecordNumber)
in vw_NominationVersion I think you would see a major performance boost.
Then I'd work on eliminating the subquery in the JOIN in vw_NominationVersionDates.
If you read the first article in linked in my signature and post based on those suggestions, I would bet that someone, possibly me, would be willing to take the time to offer more comprehensive solution.
Jack Corbett
Consultant - Straight Path Solutions
Check out these links on how to get faster and more accurate answers:
Forum Etiquette: How to post data/code on a forum to get the best help
Need an Answer? Actually, No ... You Need a Question
May 5, 2009 at 3:14 pm
Thanks for the reply Jack, and also thank you for pointing out that link, to be honest I was so busy worrying that I had explained the situation in a way that made sense I had not noticed how ugly my code looked in my final post.
I've added some code blocks to help make the code easier to read.
If needed I can post the SQL to create the base tables in case that would help, but due to the problem being about performance with so much data involved I can't see an easy way to provide enough test data for anyone to be able to use (I can't give out any copies of the real data sorry).
For now I am going to have a look at the suggestions you have made and see if I get some improvements there.
Thanks!
Viewing 3 posts - 1 through 3 (of 3 total)
You must be logged in to reply to this topic. Login to reply