Correctly using AVG() for partial table samples

  • I track base station performance in an RF network by storing SSI values in this table:

    TABLE BASE_SSI:
      wayside  BIGINT
      timestamp datetime
      refbase   varchar(12)
      ssi   int

    SSI is read once per hour and stored in this table with about one years' worth of readings. There are over 3000 waysides that are sampled in this way.

    I can extract one days' worth of SSI values for any given day,for any given wayside in the past with this query:

        select wayside,timestamp,refbase,ssi from BASE_SSI
      where wayside=225520220000
      and timestamp > DATEADD(day,-7,getdate())
      and timestamp < DATEADD(day,-6,getdate())
      order by timestamp desc

    which gives me these results:

      wayside       timestamp           refbase   ssi
      225520220000  2018-11-02 00:21:09 423.3.01  33
      225520220000  2018-11-01 22:31:03 423.3.01  32
      225520220000  2018-11-01 20:40:53 423.3.01  32
      225520220000  2018-11-01 18:50:45 423.3.01  31
      225520220000  2018-11-01 17:00:35 423.3.01  33
      225520220000  2018-11-01 15:10:26 423.3.01  34
      225520220000  2018-11-01 13:20:20 423.3.01  38
      225520220000  2018-11-01 11:30:11 423.3.01  37
      225520220000  2018-11-01 09:40:03 423.3.01  35
      225520220000  2018-11-01 07:49:03 423.3.01  35
      225520220000  2018-11-01 05:59:50 423.3.01  34
      225520220000  2018-11-01 04:09:43 423.3.01  34
      225520220000  2018-11-01 02:19:34 423.3.01  34

    What I need is a query that gives me the AVERAGE ssi for this 24-hour period in the past, for all waysides. The results should be:

        wayside       date       refbase   avg_ssi
      225520220000  2018-11-01 423.3.01    34
      225520230000  2018-11-01 423.2.21    21
      225520240000  2018-11-01 423.4.11    18
      225520250000  2018-11-01 423.1.21    55
      225520260000  2018-11-01 422.2.01    62
      225520270000  2018-11-01 452.3.07    33
      225520280000  2018-11-01 425.1.03    25

    I only need integral values for average ssi, as shown.

    I've tried using AVG(ssi) in a SELECT statement, but I am not sure how to apply this to all wayside values in a selected time period. For example, in the original query, I add 'AVG(ssi) as avg_ssi' in the SELECT line but it only yields the 'average' SSI for each single record, which is the same as the SSI value.
    In fact, I've only been able to successfully use AVG() when it is applied to the whole table, and where there are no other columns in the SELECT statement, for example:

       
    select AVG(ssi) as avg_ssi where wayside = 12345

    My pseudo-SQL version of this query would be 'display the 24-hour average SSI of every wayside for a given date'.

  • You will want to use a window funtion for this. If you provide some sample data I could make sure it comes out the way you expect it.  Or you can just take a stab at it yourself.  Lots of good articles our there.

    EDIT: I should specifiy that since you have provided some sample data...I meant could you provided it in a DDL statement.


    SELECT quote FROM brain WHERE original = 1
    0 rows returned

  • Since this one really interested me, I figured I would create some sample data to test with.  If anyone is interested this is my sample set.  Out of the 3 "waysides" each data set have distinct timestamps along with different numbers of values.  To make life a little easier on myself, I kept the SSI within the same range for each wayside.

    SAMPLE SET:

    CREATE TABLE #BASE_SSI
    (
    wayside BIGINT,
    timestamp datetime,
    refbase  varchar(12),
    ssi int
    )

    insert into #BASE_SSI
    values
    (225520220000,'2018-11-02 00:21:09','423.3.01',33),
    (225520220000,'2018-11-01 22:31:03','423.3.01',32),
    (225520220000,'2018-11-01 20:40:53','423.3.01',32),
    (225520220000,'2018-11-01 18:50:45','423.3.01',31),
    (225520220000,'2018-11-01 17:00:35','423.3.01',33),
    (225520220000,'2018-11-01 15:10:26','423.3.01',34),
    (225520220000,'2018-11-01 13:20:20','423.3.01',38),
    (225520220000,'2018-11-01 11:30:11','423.3.01',37),
    (225520220000,'2018-11-01 09:40:03','423.3.01',35),
    (225520220000,'2018-11-01 07:49:03','423.3.01',35),
    (225520220000,'2018-11-01 05:59:50','423.3.01',34),
    (225520220000,'2018-11-01 04:09:43','423.3.01',32),
    (225520220000,'2018-11-01 02:19:34','423.3.01',34),
    --second wayside
    (225520230000,'2018-11-02 00:15:09','423.2.21',29),
    (225520230000,'2018-11-01 23:21:03','423.2.21',22),
    (225520230000,'2018-11-01 22:11:03','423.2.21',23),
    (225520230000,'2018-11-01 19:40:53','423.2.21',22),
    (225520230000,'2018-11-01 17:50:45','423.2.21',21),
    (225520230000,'2018-11-01 16:00:35','423.2.21',23),
    (225520230000,'2018-11-01 14:10:26','423.2.21',24),
    (225520230000,'2018-11-01 12:20:20','423.2.21',28),
    (225520230000,'2018-11-01 10:30:11','423.2.21',27),
    (225520230000,'2018-11-01 08:40:03','423.2.21',25),
    (225520230000,'2018-11-01 06:49:03','423.2.21',25),
    (225520230000,'2018-11-01 04:59:50','423.2.21',24),
    (225520230000,'2018-11-01 03:09:43','423.2.21',26),
    (225520230000,'2018-11-01 01:19:34','423.2.21',24),
    --third wayside
    (225520240000,'2018-11-02 00:07:09','423.4.11',19),
    (225520240000,'2018-11-01 23:17:03','423.4.11',15),
    (225520240000,'2018-11-01 21:11:03','423.4.11',12),
    (225520240000,'2018-11-01 19:11:07','423.4.11',12),
    (225520240000,'2018-11-01 17:42:53','423.4.11',12),
    (225520240000,'2018-11-01 15:50:45','423.4.11',11),
    (225520240000,'2018-11-01 13:55:35','423.4.11',13),
    (225520240000,'2018-11-01 14:12:26','423.4.11',14),
    (225520240000,'2018-11-01 12:19:20','423.4.11',18),
    (225520240000,'2018-11-01 10:31:11','423.4.11',17),
    (225520240000,'2018-11-01 08:42:03','423.4.11',15),
    (225520240000,'2018-11-01 06:45:03','423.4.11',15),
    (225520240000,'2018-11-01 04:56:50','423.4.11',14),
    (225520240000,'2018-11-01 03:03:43','423.4.11',14),
    (225520240000,'2018-11-01 01:12:34','423.4.11',14)

  • Here is my solution, I am completely willing to note that I sort of took the cheater's way out with the sub-query.  If given more time, I might have come up with something elegant but here you go:


    select wayside,refbase,avg_ssi
    from
    (
        select wayside, refbase, AVG(SSI) OVER (PARTITION BY wayside,refbase, CONVERT(DATE,timestamp)) as avg_ssi
        from #BASE_SSI
        where CONVERT(DATE,timestamp) = CONVERT(DATE,'2018-11-01')
    ) x
    group by wayside,refbase,avg_ssi
    order by wayside

  • Thanks for this solution, it worked perfectly! I tried it against the real-world version of this data, which has over 5000 samples for each one of over 3500 waysides. It not only worked great, but the execution time is somewhere around 1 second. 

    Now: the last little problem from the real world:  for some, but not all, waysides an SSI value of 12 is illegal and should be left out of the average calculation.  The approximate WHERE clause to match this, in sort-of-pseudo-SQL would be  'where ssi <> 12 if any of this wayside's SSI values are over 61'

    Some background might clarify this: about half the waysides are of one type, where SSI ranges from 1 to 61.  The other half of the waysides range from 12 to 222.  These latter waysides have a hardware glitch that sometimes reports an erroneous SSI of 12. It would be best, and most accurate, if the value of 12 could be ignored for these wayside types, and the only thing that distinguishes them is that true SSI readings are rarely less than 61. 

    Would applying a restricting condition like this make the query unworkable? In thinking this reply through, I've decided to modify the data recording apparatus to not record any SSI=12 readings for these waysides at all, but the question still stands (and I have a years' worth of historic data).

  • Actually, I've also just decided to purge any SSI=12 reading from the historic data as well - so the query will work as it stands.

Viewing 6 posts - 1 through 5 (of 5 total)

You must be logged in to reply to this topic. Login to reply