Need help with query performance

  • Hello everyone.

    I have been tasked with getting some summary data out of a system with a lot more records than I normally work with, and I am having trouble figuring out the best way to approach the problem.

    Basically the database is storing nomination data and I have been tasked with creating something (a view ideally, but a stored procedure returning a result table would also work I think) to allow people to pull details for active nominations (called "Carparks" by my client) for any given month.

    The tricky part is that the active status is hard to determine due to how the data is stored and some of the information the client requires is stored in a different database.

    Also each nomination can have multiple versions as the nomination record gets renewed.

    I am not sure that I would be able to explain it very well, so I will just include the statements that I have created so far and see if anyone can spot any obvious problems.

    The views I am creating are in a sperate database (although on the same server) as the database that stores most of the base tables (NDHSOnlineFull) is dropped and re-created every night and I am not able to create my views in that database, I believe this means that schema binding my views to be able to create indexes is not going to be an option (I have never done this anyway so it may or may not have helped).

    First off I create a view to return the base nomination data (including things like the Vehicle State and Model that are stored in different databases!!) and the nomination version (is there a better way to figure this out? It is not stored in the table.):

    CREATE VIEW vw_NominationVersion AS

    SELECT

    NH.RecordNumber,

    NH.NominationNo,

    NH.AccountCode,

    NH.CompanyBatchNo,

    NH.NominationDate,

    NH.ExpirationDate,

    NH.TransactionType,

    V.VehicleStateCode,

    BH.Ref05 AS ModelCode,

    (

    SELECT

    COUNT(*)

    FROM

    NDHSOnlineFull.dbo.NominationHistory NH1

    WHERE

    NH1.NominationNo = NH.NominationNo

    AND NH1.RecordNumber <= NH.RecordNumber) AS NomVerNo
    FROM
    NDHSOnlineFull.dbo.NominationHistory NH
    INNER JOIN NDHSOnlineFull.dbo.Vehicle V ON V.VehicleNo = NH.VehicleNo
    LEFT OUTER JOIN [Cognos assist].dbo.BatchHistory BH ON BH.AccountCode = NH.AccountCode AND BH.AccountPrimaryTag = NH.AccountPrimaryTag
    [/code]

    Next I create another view to figure out the start and end date for each version of the nomination record so that I will then be able to use these dates to determine if a nomination was active in a particular month (this data is NOT stored against the nomination records and has to be calculated from finding a transaction record in the transfer table that is entered when the nomination starts!):

    [code]
    CREATE VIEW vw_NominationVersionDates AS
    SELECT
    NV.RecordNumber,
    NV.NominationNo,
    NV.NomVerNo,
    T.TransferDate AS NominationVersionStartDate,
    CAST(DATEPART(YEAR, T.TransferDate) AS VARCHAR) + CAST(DATEPART(MONTH, T.TransferDate) AS VARCHAR) AS StartYearMonth,
    COALESCE(DATEADD (DAY, -1, T2.TransferDate), NV.ExpirationDate) AS NominationVersionEndDate,
    CAST(DATEPART(YEAR, COALESCE(DATEADD (DAY, -1, T2.TransferDate), NV.ExpirationDate)) AS VARCHAR)
    + CAST(DATEPART(MONTH, COALESCE(DATEADD (DAY, -1, T2.TransferDate), NV.ExpirationDate)) AS VARCHAR) AS EndYearMonth
    FROM
    vw_NominationVersion NV
    INNER JOIN NDHSOnlineFull.dbo.Transfer T ON T.OrganizationCode = NV.AccountCode AND T.CompanyBatch = NV.CompanyBatchNo
    LEFT OUTER JOIN NDHSOnlineFull.dbo.Transfer T2 ON T2.OrganizationCode = NV.AccountCode AND T2.CompanyBatch = (
    SELECT
    NV2.CompanyBatchNo
    FROM
    vw_NominationVersion NV2
    WHERE
    NV2.NominationNo = NV.NominationNo
    AND NV2.NomVerNo = NV.NomVerNo + 1)
    [/code]

    The Transfer table has 18,015 records.

    And finally I have a query to actually return the data that the client is interested in:

    [code]
    DECLARE
    @CarparkDate AS INT

    BEGIN
    SET @CarParkDate = 200801

    SELECT
    NVD.NominationNo,
    NVD.NomVerNo,
    NV.AccountCode,
    NV.VehicleStateCode,
    DATEDIFF(YEAR, NV.NominationDate, @CarParkDate) AS NominationAge,
    NV.ModelCode
    FROM
    vw_NominationVersion NV
    INNER JOIN vw_NominationVersionDates NVD ON NVD.RecordNumber = NV.RecordNumber
    WHERE
    NV.TransactionType != 61 --Don't include cancelled transactions
    AND NVD.StartYearMonth <= @CarparkDate
    AND NVD.EndYearMonth >= @CarparkDate

    END

    Now I am not sure if there is a better way for me to be handling the dates?

    All of the conversions in the second view are probably slowing things down.

    Is there any way for me to get away with having a single view instead of a second view to figure out the start and end dates?

    I have tried changing where some of the data is being pulled from, I initially had the joins to get the vehicle state and model in my final query instead of the first view. I don't know if it is better in the view or not but with them in my final query it ran for over 5 hours yesterday and had only returned 196,000 records when I finally decided to stop it from processing (the expected result is somewhere around 1.3 million for the month I was testing).

    Would I be better off writing a stored procedure and dumping the data into tempory tables instead of using views?

    As I said at the start, I have never had to work with tables this big (or across multiple databases like this) before so I am not sure if I am shooting myself in the foot with the way I am trying to do this.

    Any pointers would be very much appreciated.

    Thanks.

  • Just a guess, but I think if you found a way to eliminate this correlated sub-query

    (SELECT

    COUNT(*)

    FROM

    NDHSOnlineFull.dbo.NominationHistory NH1

    WHERE

    NH1.NominationNo = NH.NominationNo AND

    NH1.RecordNumber <= NH.RecordNumber)

    in vw_NominationVersion I think you would see a major performance boost.

    Then I'd work on eliminating the subquery in the JOIN in vw_NominationVersionDates.

    If you read the first article in linked in my signature and post based on those suggestions, I would bet that someone, possibly me, would be willing to take the time to offer more comprehensive solution.

  • Thanks for the reply Jack, and also thank you for pointing out that link, to be honest I was so busy worrying that I had explained the situation in a way that made sense I had not noticed how ugly my code looked in my final post.

    I've added some code blocks to help make the code easier to read.

    If needed I can post the SQL to create the base tables in case that would help, but due to the problem being about performance with so much data involved I can't see an easy way to provide enough test data for anyone to be able to use (I can't give out any copies of the real data sorry).

    For now I am going to have a look at the suggestions you have made and see if I get some improvements there.

    Thanks!

Viewing 3 posts - 1 through 3 (of 3 total)

You must be logged in to reply to this topic. Login to reply