Issues Getting Max(Status) by ID and Max(StatusDate)

  • I'm having one of those moments where I'm sure I'm just over thinking the problem, but none the less I have hit a wall.

    We have a view that lists all ItemID's, and says if they are in one status or another. In truth there are around 10 statuses that the item could be in, but we only care about if it is currently Status = 'Allowed' or not. There are additional statues like 'Not Allowed' that may remove the item from the 'Allowed' status even after it has been in that status before, but there are also additional statues such as 'Marked For Removal' or 'Removal Complete'.

    Our items table presents no issues. The ItemsStatusLog table also stores all the data without an issue. Getting the current status however is what is getting me.

    The way the view was written by the last DBA was via a union statement such as:

    Create View CurrentItemStatus

    As

    Select ItemID

    ,'Allowed' As CurrentStatus

    From dbo.Items

    Where ItemID In

    (Select ItemID

    From dbo.ItemStatusLog

    Where Status = 'Allowed')

    Union

    Select ItemID

    ,'Not Allowed' As CurrentStatus

    From dbo.Items

    Where ItemID Not In

    (Select ItemID

    From dbo.ItemStatusLog

    Where Status = 'Allowed')

    So far no one has really seen the logic bomb, but when I brought it up they said it needs to be changed to reflect the most recent status. And that is where I'm getting stuck. I can get all ItemIDs, and select the Max(StatusDate), but there are issues with just using that:

    - Multiple statuses can all share the exact same StatusDate to the second. This includes being marked 'Allowed' and 'Not Allowed' in the same time frame.

    - If status is marked 'Allowed', then later marked 'Not Allowed', but the Max(StatusDate) status is set to 'Checking On Item', then there is no real clear value for my view.

    I am thinking of presenting the idea that whatever the Max(Row_Number()) is for the same date statuses should be the one we go with to eliminate the multiple values on the same day, but I am open to other suggestions (since I know that the Row_Number() value means next to nothing due to the backend SQL engine functioning). Optimally I need this to get the ID, and then find the Max(StatusDate) where Status In ('Allowed', 'Not Allowed'), and set the value in my view accordingly. I am flexible on this being a view vs. scheduling a job to run twice a day as these values usually don't change that often. I won't bother with including all the various attempts that I've made as none are correct, and I don't want to confuse the issue any further than I already have.

    The closest that I've come to it is something like update all ItemID's that are included in a sub-select of Status = 'Allowed', then update those back to 'Not Allowed' where StatusDate > the 'Allowed' status date. Also update to 'Not Allowed' where ItemID in sub-select of Status = 'Not Allowed'...uggg, I can't even really explain my logic any more without hitting a logic circle that I can't find my way out of.

    Anyone have any ideas?

    [font="Arial"]“Any fool can know. The point is to understand.”
    - Albert Einstein

    "DOH!"
    - Homer Simpson[/font]

  • It would help us if you could post the DDL for the table(s) involved, sample data for the tables, and expected results based on the sample data.

  • Sorry for the delay, I was working on another project yesterday.

    I've attached all the live data (I've removed all the items that can't be released to the public). This is in the form of a SQL script to build and populate the 3 tables, and the current way the view is created.

    I was trying to keep the question as simple as I could in my original post. Even though the rmPhotos table does not impact my question one way or the other I figured I'd err on adding too much information. On the rmPhotos table, the field photoSourceID is a foreign key to rvlProperties.rvlPropertyID. Aside from that this should be fairly straight forward on the naming conventions.

    [font="Arial"]“Any fool can know. The point is to understand.”
    - Albert Einstein

    "DOH!"
    - Homer Simpson[/font]

  • I haven't looked at your attached file yet, but unless you're storing the datetime value for when the status was updated in something other than an actual datetime field, down to the second can be the same, but the actual values for datetime fields can store considerably greater accuracy, so perhaps the real issue is that the value supplied to populate the field is only accurate to the second? If that's the case, no amount of querying is going to give you an accurate answer, so you're going to have to either get a more accurate time source to populate that field (e.g. using GETDATE() as part of the INSERT or UPDATE query), or application logic is somehow going to have to know the difference because of other data that might be handy (but probably isn't). Unfortunately, that won't solve the problem until ALL the status records get at least one update of some kind to ensure the datetime value is sufficiently accurate for each item.

    Steve (aka sgmunson) 🙂 🙂 🙂
    Rent Servers for Income (picks and shovels strategy)

  • Great point, and it is one that I've brought up to the team. They say we'll look at that when we get the budget, but that may be months off.

    For the time being they seem to like the idea of using a Max(Row_Number()) approach. There are very few records this actually will impact, so they are willing to take the hit on accuracy. The worst that will happen is someone shows up to a property where access has been revoked, and will discuss it with the owner again. If they don't get new permissions then they'll update the system with the new record, and that one will be taken care of.

    Even with that, I keep chasing the logic circle around. I'll be working on it for most of the rest of the afternoon, so if I find the answer then I'll be sure to post it.

    [font="Arial"]“Any fool can know. The point is to understand.”
    - Albert Einstein

    "DOH!"
    - Homer Simpson[/font]

  • Budget ? Seriously? Somebody must not be thinking. Whatever application is populating the database needs what is likely a 5 second change and then either a redeploy or a recompile and redeploy, and if the whole process were to take more than an hour, I'd be floored. Even a regression test shouldn't take all that long, and having to move something through DEV. TEST, QA, UAT, and PRODUCTION might take a tad longer, but still... this is a seriously simple fix to any query that populates a datetime field.

    Alternatively (and now that I think about it), you might be able to add a trigger and do an insert into a new table that would tie the primary key from the original table to a datetime field whose DEFAULT value is GETDATE(). That way, the TRIGGER only has to insert the primary key from the other table, and it keeps the overall cost of the trigger on iinserts and updates fairly small. You'd also have to trigger on DELETE to remove the appropriate records from the new table as well. In the overall scheme of things, might be an easy fix. It doesn't do much for the existing records, but if the volume of those is small, have somebody decide what the status is for each one affected, and manually update this new table to reflect the decisions made.

    Your thoughts?

    Steve (aka sgmunson) 🙂 🙂 🙂
    Rent Servers for Income (picks and shovels strategy)

  • I agree, and I like the idea of at least setting up the trigger. The issue that we run into is the application is built for a federal agency. All the hours have to be recorded within 6 minute intervals. Getting the government to approve the spending is a slow process, and because of how our contract is written my company can't do work without explicitly billing.

    The good news is I believe I've solved my issue. I have the logic worked out in my head, but I haven't tried it yet. It is ugly, but here's the concept:

    Select rvlPropertyID

    Max(StatusDate) As MaxDate

    Into #TempDataHolder

    From dbo.rvlPropertyAccess

    Select TDH.rvlPropertyID

    ,TDH.MaxDate

    ,rPA.Status

    ,Row_Number() As RowNum

    Into #TempDataHolder2

    From #TempDataHolder TDH Inner Join dbo.rvlPropertyAccess rPA

    On TDH.rvlPropertyID = rPA.rvlPropertyID

    And TDH.MaxDate = rPA.StatusDate

    Group By TDH.rvlPropertyID

    ,TDH.MaxDate

    ,rPA.Status

    With CTE

    (ID,RowNum)

    As

    (Select rvlPropertyID

    ,Max(RowNum)

    From #TempDataHolder2)

    Select TDH2.rvlPropertyID

    ,TDH2.StatusDate

    ,TDH2.Status

    From #TempDataHolder2 TDH2 Inner Join CTE

    On TDH2.rvlPropertyID = CTE.rvlPropertyID

    And TDH2.RowNum = CTE.RowNum

    That's the logic anyway. I'll see how it goes, and post the final results.

    [font="Arial"]“Any fool can know. The point is to understand.”
    - Albert Einstein

    "DOH!"
    - Homer Simpson[/font]

  • It's only ugly to the extent that it might not produce consistent results for the records affected by lack of date/time accuracy, and to the extent that performance goes bad (for whatever reason). In other words, it might say that the status of an affected record is one thing today, and another thing toomorrow, without the status actually changing. If they can live with that, then go for it. Otherwise, plan on the trigger and a requirement to make the decision on the affected rows.

    Steve (aka sgmunson) 🙂 🙂 🙂
    Rent Servers for Income (picks and shovels strategy)

  • As promised I'm posting results. After getting the advice from sgmunson I revisited the original table instead of looking at the Excel version (which is where that whole mess of lack of accurate time lived...sorry about that), and the good news is for the view they had accurate DateTime values. There were no duplicates when reviewing the Max(Date) as I had originally feared. Here is the live code (because I was able to work it into a view still instead of a sproc fired from a job twice a day):

    If Object_ID('dbo.GISPropertyAccessFlex') > 0

    Drop View dbo.GISPropertyAccessFlex;

    Go

    ----------

    Create View dbo.GISPropertyAccessFlex

    As

    Select rProp.rvlPropertyID

    ,rProp.rmIncidentID

    ,rProp.propertyName

    ,rProp.rmVvlPropertyType

    ,Rtrim(Ltrim(IsNull

    (Case Rtrim(Ltrim(StreetNumber))

    When ''

    Then Null

    Else Rtrim(Ltrim(StreetNumber))

    + ' '

    End,'')

    + IsNull

    (Case Rtrim(Ltrim(StreetPrefix))

    When ''

    Then Null

    When ' '

    Then Null

    Else Rtrim(Ltrim(StreetPrefix))

    + ' '

    End,'')

    + IsNull

    (Case StreetName

    When ''

    Then Null

    Else Rtrim(Ltrim(StreetName))

    + ' '

    End,PropertyDesc)

    + IsNull

    (Case StreetSuffix

    When ''

    Then Null

    Else StreetSuffix

    + ' '

    End,'')

    + IsNull

    (Case UnitNUmber

    When ''

    Then Null

    Else UnitNUmber

    + ' '

    End,''))) As Address

    ,rProp.cityName

    ,rProp.state

    ,rProp.county

    ,rProp.zipCode

    ,rProp.latitude

    ,rProp.longitude

    ,cP.PhotoCounter

    ,rProp.taxID

    ,Case

    When PropNoRes.MaxDate = '1/1/1899'

    Then Null

    Else PropNoRes.MaxDate

    End As LastAttemptDate

    ,PropRes.rmVvlAccessAttemptResult As CurrentStatus

    ,GEOMETRY::STGeomFromText('Point('

    + Cast(Longitude As Varchar(15))

    + ' '

    + Cast(Latitude As Varchar(15))

    + ')',4326) As geoData

    From

    (Select rvlPropertyID

    ,Max(IsNull(attemptDate,'1/1/1899')) As MaxDate

    From dbo.rvlPropertyAccess

    Group By rvlPropertyID) PropNoRes

    Inner Join

    (Select rvlPropertyID

    ,Max(IsNull(attemptDate,'1/1/1899')) As MaxDate

    ,rmVvlAccessAttemptResult

    From dbo.rvlPropertyAccess

    Group By rvlPropertyID

    ,rmVvlAccessAttemptResult) PropRes

    On PropNoRes.rvlPropertyID = PropRes.rvlPropertyID

    And PropNoRes.MaxDate = PropRes.MaxDate

    Right Join dbo.rvlProperties rProp

    On rProp.rvlPropertyID = PropNoRes.rvlPropertyID

    Join

    (Select rvlProperties.rvlPropertyID

    ,Count(rmPhotos.photoName) As PhotoCounter

    From dbo.rmPhotos Right Outer Join dbo.rvlProperties

    On rmPhotos.photoSourceID = rvlProperties.rvlPropertyID

    Group By rvlProperties.rvlPropertyID) cP

    On rProp.rvlPropertyID = cP.rvlPropertyID

    Where rProp.deleted = 0

    Or rProp.deleted = ''

    Or rProp.deleted Is Null;

    Go

    ----------------------------------------

    Select *

    From dbo.GISPropertyAccessFlex;

    Go

    Then we have the less than glorious news that the other view I was working on (with similar issues) had several thousand records where AssessmentDate Is Null. So I used my hack version as well, and in case someone ever wants to see how to do it (because it is a little more involved) here you go:

    If Object_ID('dbo.GISPropertyAssessmentFlex') > 0

    Drop View dbo.GISPropertyAssessmentFlex;

    Go

    ----------

    Create View dbo.GISPropertyAssessmentFlex

    As

    Select rProp.rvlPropertyID

    ,rProp.rmIncidentID

    ,rProp.propertyName

    ,rProp.rmVvlPropertyType

    ,Rtrim(Ltrim(IsNull

    (Case Rtrim(Ltrim(StreetNumber))

    When ''

    Then Null

    Else Rtrim(Ltrim(StreetNumber))

    + ' '

    End,'')

    + IsNull

    (Case Rtrim(Ltrim(StreetPrefix))

    When ''

    Then Null

    When ' '

    Then Null

    Else Rtrim(Ltrim(StreetPrefix))

    + ' '

    End,'')

    + IsNull

    (Case StreetName

    When ''

    Then Null

    Else Rtrim(Ltrim(StreetName))

    + ' '

    End,PropertyDesc)

    + IsNull

    (Case StreetSuffix

    When ''

    Then Null

    Else StreetSuffix

    + ' '

    End,'')

    + IsNull

    (Case UnitNUmber

    When ''

    Then Null

    Else UnitNUmber

    + ' '

    End,''))) As Address

    ,rProp.cityName

    ,rProp.state

    ,rProp.county

    ,rProp.zipCode

    ,rProp.latitude

    ,rProp.longitude

    ,cP.PhotoCounter

    ,rProp.taxID

    ,Group2.AssessmentDate As LastAssessmentDate

    ,Group2.rmVvlStatus As CurrentStatus

    ,GEOMETRY::STGeomFromText('Point('

    + Cast(Longitude As Varchar(15))

    + ' '

    + Cast(Latitude As Varchar(15))

    + ')',4326) As geoData

    From

    (Select rvlPropertyID

    ,Max(RowNum) As MaxRow

    From

    (Select rvlPropertyID

    ,AssessmentDate

    ,rmVvlStatus

    ,Row_Number() Over(Partition By rvlPropertyID Order By AssessmentDate) As RowNum

    From

    (Select rPA1.rvlPropertyID As rvlPropertyID

    ,rPA1.MaxDate As AssessmentDate

    ,rPA2.rmVvlStatus As rmVvlStatus

    From

    (Select rvlPropertyID

    ,Max(IsNull(assessmentDate,'1/1/1899')) As MaxDate

    From dbo.rvlPropAssessments

    Group By rvlPropertyID) rPA1

    Inner Join

    (Select rvlPropertyID

    ,Max(IsNull(assessmentDate,'1/1/1899')) As MaxDate

    ,rmVvlStatus

    From dbo.rvlPropAssessments

    Group By rvlPropertyID

    ,rmVvlStatus) rPA2

    On rPA1.rvlPropertyID = rPA2.rvlPropertyID

    And rPA1.MaxDate = rPA2.MaxDate) rPA3) rPA4

    Group By rvlPropertyID) Group1

    Inner Join

    (Select rvlPropertyID

    ,AssessmentDate

    ,rmVvlStatus

    ,Row_Number() Over(Partition By rvlPropertyID Order By AssessmentDate) As RowNum

    From

    (Select rPA1.rvlPropertyID As rvlPropertyID

    ,rPA1.MaxDate As AssessmentDate

    ,rPA2.rmVvlStatus As rmVvlStatus

    From

    (Select rvlPropertyID

    ,Max(IsNull(assessmentDate,'1/1/1899')) As MaxDate

    From dbo.rvlPropAssessments

    Group By rvlPropertyID) rPA1

    Inner Join

    (Select rvlPropertyID

    ,Max(IsNull(assessmentDate,'1/1/1899')) As MaxDate

    ,rmVvlStatus

    From dbo.rvlPropAssessments

    Group By rvlPropertyID

    ,rmVvlStatus) rPA2

    On rPA1.rvlPropertyID = rPA2.rvlPropertyID

    And rPA1.MaxDate = rPA2.MaxDate) rPA3) As Group2

    On Group1.rvlPropertyID = Group2.rvlPropertyID

    And Group1.MaxRow = Group2.RowNum

    Right Join dbo.rvlProperties rProp

    On rProp.rvlPropertyID = Group1.rvlPropertyID

    Join

    (Select rvlProperties.rvlPropertyID

    ,Count(rmPhotos.photoName) As PhotoCounter

    From dbo.rmPhotos Right Outer Join dbo.rvlProperties

    On rmPhotos.photoSourceID = rvlProperties.rvlPropertyID

    Group By rvlProperties.rvlPropertyID) cP

    On rProp.rvlPropertyID = cP.rvlPropertyID

    Where rProp.deleted = 0

    Or rProp.deleted = ''

    Or rProp.deleted Is Null;

    Go

    ----------------------------------------

    Select *

    From dbo.GISPropertyAssessmentFlex

    And finally if any one is curious these scripts run as follows on my Alienware M18X, Intel Core i7-3610QM @2.30 GHz, 8 GB Ram, Windows 7 Ultimate 64-bit:

    GISPropertyAccessFlex - 46,763 records in 1326 Milliseconds duration

    GISPropertyAssessmentFlex - 46,763 records in 1586 Milliseconds duration

    [font="Arial"]“Any fool can know. The point is to understand.”
    - Albert Einstein

    "DOH!"
    - Homer Simpson[/font]

  • Glad to see this is finally sorted... and with good results, too...

    Steve (aka sgmunson) 🙂 🙂 🙂
    Rent Servers for Income (picks and shovels strategy)

Viewing 10 posts - 1 through 9 (of 9 total)

You must be logged in to reply to this topic. Login to reply