Fuzzy Datetime Compare

  • The problem: I have two data sources, Google Analytics and our own internal Application Form. Each source records the exact same goal/event, the conclusion of the application process. Unfortunately they record the datetime at slightly different times. For example Google may record the time as 3:05:52 while our app records it as 3:06:13. I know by looking at the Previous Step field in the Google data that the URL was for Location1 and then can see our app also recorded Location1 so I know the two records are a match.

    The Question: How do I properly and reliably compare two datetimes that are usually within seconds or even milliseconds of each other so I can say Record A from Google is the same as Record Q from our own application?

  • The solution will depend on a few different factors.

    1. Will the Google Analytics date/time always come first?
    2. Could there ever be another entry in the same time frame?

    An example check would be:

    SELECT ...
    FROM ...
    WHERE InternalDateTime >= GoogleDateTime
    AND InternalDateTime < dateadd(minute, 1, GoogleDateTime);

    This may not perform as well as needed - if that is the case then adding a persisted derived column to the Google Analytics table might improve performance.

    Jeffrey Williams
    “We are all faced with a series of great opportunities brilliantly disguised as impossible situations.”

    ― Charles R. Swindoll

    How to post questions to get better answers faster
    Managing Transaction Logs

  • Your reply prompted my thoughts, would it also work in the where clause like the sample below? This assumes I have manually checked several examples and found the average difference is about a minute either way. Also, there could be multiple entries within seconds of one another but the probability of them being applications for the same location is slim. Even if it were to line up location and time the error is unavoidable and therefore acceptable.

    Select ...
    FROM ...
    WHERE InternalDateTime between DATEADD(seconds, -60, GoogleDateTime) and DATEADD(seconds, 60, GoogleDatetime);

    • This reply was modified 3 years, 11 months ago by  crannow.
  • That would probably work - you need to test it to be sure though.  I prefer not using BETWEEN for date/times - but in this case it might be the better option.

    That is why the question about which one will be first...if the GoogleDateTime will always be before the InternalDateTime then it simplifies the check to what I previously posted.

    My guess is that GoogleDateTime will always be before (or equal) to the InternalDateTime - due to the order of processing where Google Analytics is updated/inserted prior to your internal repository.

    Jeffrey Williams
    “We are all faced with a series of great opportunities brilliantly disguised as impossible situations.”

    ― Charles R. Swindoll

    How to post questions to get better answers faster
    Managing Transaction Logs

Viewing 4 posts - 1 through 3 (of 3 total)

You must be logged in to reply to this topic. Login to reply