Probability in SQL

  • Hi

    I am not sure how you would pull this of in SQL but I was hoping someone could help me out.
    I have the below data and would like to calculate:

    When the Status is Re-Activation what the probability is of it going past 0 Completion Days by looking at the number of times it got 'REJE'(Rejected) and 'N-CO' (Not Complete) but also the time it took for the 'REJE' and 'N-CO' to occur.

    Not really my area but it would be nice to learn something new.
    I would like to create an extra column for each identifier telling me the probability of the it going past 0 days.
    Thanks,
    Danii

    CREATE TABLE #temp
    (
    ServiceNumber varchar(20) NOT NULL
    ,Identifier varchar(20)NOT NULL
    ,CreatedDate DATETIME NOT NULL
    ,CompletedDate DATETIME NOT NULL
    ,SN_Type varchar(20) NOT NULL
    ,SN_Status varchar(20) NOT NULL
    );
    INSERT INTO #temp
    VALUES ('318403853','61030203647','20170512','20170518','De-Activated', 'COMP')
    INSERT INTO #temp
    VALUES('318616723','61030203647','20170613','20170613','Re-Activattion', 'N-CO')
    INSERT INTO #temp
    VALUES('318637013','61030203647','20170615','20170615','Re-Activattion', 'REJE')
    INSERT INTO #temp
    VALUES('318639191','61030203647','20170615','20170616','Re-Activattion', 'COMP')
    INSERT INTO #temp
    VALUES('318637791','61030203666','20170615','20170616','Re-Activattion', 'Rejected')
    INSERT INTO #temp
    VALUES('318637792','61030203666','20170614','20170615','Re-Activattion', 'N-CO');
    INSERT INTO #temp
    VALUES ('000318634115','64074558782','20170615','20170615','Re-Activattion','N-CO')
    INSERT INTO #temp
    VALUES('000318636669','64074558782','20170615','20170615','Re-Activattion','COMP')
    INSERT INTO #temp
    VALUES('000318636873','64074558782','20170615','20170614','Re-Activattion','REJE')
    INSERT INTO #temp
    VALUES('000318623572','64074558782','20170614','20170614','Re-Activattion','N-CO')
    INSERT INTO #temp
    VALUES('000318627678','64074558782','20170614','20170614','Re-Activattion','N-CO')
    INSERT INTO #temp
    VALUES('000318614132','64074558782','20170613','20170613','Re-Activattion','N-CO')
    INSERT INTO #temp
    VALUES('00031861999','64074558799','20170613','20170613','Re-Activattion','N-CO')
    INSERT INTO #temp
    VALUES('00031861991','64074558791','20170613','20170613','De-Activated','N-CO')
    INSERT INTO #temp
    VALUES('01031861999','64074558792','20160729','20160805','Re-Activattion','N-CO')
    INSERT INTO #temp
    VALUES('02031861999','64074558792','20160810','20160810','Re-Activattion','REJE')
    INSERT INTO #temp
    VALUES('03031861999','64074558792','20160811','20160811','Re-Activattion','COMP')

    ;

    select identifier
        ,case
        when sum(case when SN_STATUS='COMP' and SN_TYPE = 'Re-Activattion' then 1 else 0 end)>0
        then str(datediff(day
             ,min(case
                when SN_TYPE = 'Re-Activattion'
                then CreatedDate
                else null
               end
               )
             ,max(case
                when (SN_TYPE = 'Re-Activattion'
                and SN_STATUS='COMP'
                  )
                then CompletedDate
                else null
               end
               )
             )
          )
        when sum(case when SN_TYPE='Re-Activattion' then 1 else 0 end)>0
        then 'NOT COMP'
        else 'NO RE-ACT'
       end
       as RE_ACT_COMPLETION_TIME
      
           , sum(case when (SN_STATUS = 'SENT' and SN_TYPE = 'Re-Activattion') then 1 else 0 end) 'RE-AN SENT #'
          , sum(case when (SN_STATUS = 'ACCE' and SN_TYPE = 'Re-Activattion') then 1 else 0 end) 'RE-AN ACCE #'
          , sum(case when (SN_STATUS = 'N-CO' and SN_TYPE = 'Re-Activattion') then 1 else 0 end) 'RE-AN N-CO #'
             , sum(case when (SN_STATUS = 'REJE' and SN_TYPE = 'Re-Activattion') then 1 else 0 end) 'RE-AN REJE #'
        
    from #temp
    -- WHERE Identifier = '64074558792'
    group by identifier
    ;

  • Probability is rather dependent on how you choose to perceive it.   On small data samples, the likelihood of that probability being accurate is rather low.   You usually need to measure something called "statistical significance" in order to ensure that your sample size isn't too small to be worth measuring.   If you have data over a rather longer period of time (usually, this would be several years to a decade or more), you might then have some level of predictability.   The problem isn't so much the computation of probability, but understanding the causes of variation in your data.   Human error is so often the cause of variation in data, that regardless of the nature of the data, it can often result in unreliable predictions.   If you were able to have a cause field that would have possible values that include human error, you could at least throw that part of the data away and start looking at the numbers from there.

    As to getting help with computing probability, you're going to need to explain the nature of your data and exactly what it represents before it makes sense to start computing values.

    Steve (aka sgmunson) πŸ™‚ πŸ™‚ πŸ™‚
    Rent Servers for Income (picks and shovels strategy)

  • Jacob Wilkins - Monday, July 17, 2017 7:15 AM

    Also, just as a note, please do not post the same topic multiple times. 

    You've posted several versions of this question under sub-forums for different versions of SQL Server.

    Posting a brand new topic means any information from the older posts is gone, including very helpful things like answers you've given to other members' questions. At a minimum it clutters up the forums.

    On that note, what version of SQL Server are you actually using?

    Cheers!

    The same question or variants thereof has been posted 8 times. In some cases the OP has started a new thread with responders questions left unanswered. Very inefficient, very frustrating for responders.

    β€œWrite the query the simplest way. If through testing it becomes clear that the performance is inadequate, consider alternative query forms.” - Gail Shaw

    For fast, accurate and documented assistance in answering your questions, please read this article.
    Understanding and using APPLY, (I) and (II) Paul White
    Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden

  • ChrisM@Work - Monday, July 17, 2017 9:33 AM

    Jacob Wilkins - Monday, July 17, 2017 7:15 AM

    Also, just as a note, please do not post the same topic multiple times. 

    You've posted several versions of this question under sub-forums for different versions of SQL Server.

    Posting a brand new topic means any information from the older posts is gone, including very helpful things like answers you've given to other members' questions. At a minimum it clutters up the forums.

    On that note, what version of SQL Server are you actually using?

    Cheers!

    The same question or variants thereof has been posted 8 times. In some cases the OP has started a new thread with responders questions left unanswered. Very inefficient, very frustrating for responders.

    BWAAAA-HAAA!!!! You're "probably" right there, Chris. πŸ˜‰

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

Viewing 4 posts - 1 through 3 (of 3 total)

You must be logged in to reply to this topic. Login to reply