Probability in SQL

Question

Probability in SQL

Danii_Pink

Right there with Babe

Points: 778
More actions
July 6, 2017 at 7:07 pm

#338198

Hi
I am not sure how you would pull this of in SQL but I was hoping someone could help me out.
I have the below data and would like to calculate:
When the Status is Re-Activation what the probability is of it going past 0 Completion Days by looking at the number of times it got 'REJE'(Rejected) and 'N-CO' (Not Complete) but also the time it took for the 'REJE' and 'N-CO' to occur.
Not really my area but it would be nice to learn something new.
I would like to create an extra column for each identifier telling me the probability of the it going past 0 days.
Thanks,
Danii
CREATE TABLE #temp ( ServiceNumber varchar(20) NOT NULL ,Identifier varchar(20)NOT NULL ,CreatedDate DATETIME NOT NULL ,CompletedDate DATETIME NOT NULL ,SN_Type varchar(20) NOT NULL ,SN_Status varchar(20) NOT NULL ); INSERT INTO #temp VALUES ('318403853','61030203647','20170512','20170518','De-Activated', 'COMP') INSERT INTO #temp VALUES('318616723','61030203647','20170613','20170613','Re-Activattion', 'N-CO') INSERT INTO #temp VALUES('318637013','61030203647','20170615','20170615','Re-Activattion', 'REJE') INSERT INTO #temp VALUES('318639191','61030203647','20170615','20170616','Re-Activattion', 'COMP') INSERT INTO #temp VALUES('318637791','61030203666','20170615','20170616','Re-Activattion', 'Rejected') INSERT INTO #temp VALUES('318637792','61030203666','20170614','20170615','Re-Activattion', 'N-CO'); INSERT INTO #temp VALUES ('000318634115','64074558782','20170615','20170615','Re-Activattion','N-CO') INSERT INTO #temp VALUES('000318636669','64074558782','20170615','20170615','Re-Activattion','COMP') INSERT INTO #temp VALUES('000318636873','64074558782','20170615','20170614','Re-Activattion','REJE') INSERT INTO #temp VALUES('000318623572','64074558782','20170614','20170614','Re-Activattion','N-CO') INSERT INTO #temp VALUES('000318627678','64074558782','20170614','20170614','Re-Activattion','N-CO') INSERT INTO #temp VALUES('000318614132','64074558782','20170613','20170613','Re-Activattion','N-CO') INSERT INTO #temp VALUES('00031861999','64074558799','20170613','20170613','Re-Activattion','N-CO') INSERT INTO #temp VALUES('00031861991','64074558791','20170613','20170613','De-Activated','N-CO') INSERT INTO #temp VALUES('01031861999','64074558792','20160729','20160805','Re-Activattion','N-CO') INSERT INTO #temp VALUES('02031861999','64074558792','20160810','20160810','Re-Activattion','REJE') INSERT INTO #temp VALUES('03031861999','64074558792','20160811','20160811','Re-Activattion','COMP')
;
select identifier ,case when sum(case when SN_STATUS='COMP' and SN_TYPE = 'Re-Activattion' then 1 else 0 end)>0 then str(datediff(day ,min(case when SN_TYPE = 'Re-Activattion' then CreatedDate else null end ) ,max(case when (SN_TYPE = 'Re-Activattion' and SN_STATUS='COMP' ) then CompletedDate else null end ) ) ) when sum(case when SN_TYPE='Re-Activattion' then 1 else 0 end)>0 then 'NOT COMP' else 'NO RE-ACT' end as RE_ACT_COMPLETION_TIME , sum(case when (SN_STATUS = 'SENT' and SN_TYPE = 'Re-Activattion') then 1 else 0 end) 'RE-AN SENT #' , sum(case when (SN_STATUS = 'ACCE' and SN_TYPE = 'Re-Activattion') then 1 else 0 end) 'RE-AN ACCE #' , sum(case when (SN_STATUS = 'N-CO' and SN_TYPE = 'Re-Activattion') then 1 else 0 end) 'RE-AN N-CO #' , sum(case when (SN_STATUS = 'REJE' and SN_TYPE = 'Re-Activattion') then 1 else 0 end) 'RE-AN REJE #' from #temp -- WHERE Identifier = '64074558792' group by identifier ;

Viewing 4 posts - 1 through 3 (of 3 total)

You must be logged in to reply to this topic. Login to reply

sgmunson SSC Guru Points: 110639 More actions · Answer 1

Probability is rather dependent on how you choose to perceive it. On small data samples, the likelihood of that probability being accurate is rather low. You usually need to measure something called "statistical significance" in order to ensure that your sample size isn't too small to be worth measuring. If you have data over a rather longer period of time (usually, this would be several years to a decade or more), you might then have some level of predictability. The problem isn't so much the computation of probability, but understanding the causes of variation in your data. Human error is so often the cause of variation in data, that regardless of the nature of the data, it can often result in unreliable predictions. If you were able to have a cause field that would have possible values that include human error, you could at least throw that part of the data away and start looking at the numbers from there.

As to getting help with computing probability, you're going to need to explain the nature of your data and exactly what it represents before it makes sense to start computing values.

Steve (aka sgmunson) 🙂 🙂 🙂
Rent Servers for Income (picks and shovels strategy)

ChrisM@Work SSC Guru Points: 186127 More actions · Answer 2

Jacob Wilkins - Monday, July 17, 2017 7:15 AM
Also, just as a note, please do not post the same topic multiple times.
You've posted several versions of this question under sub-forums for different versions of SQL Server.
Posting a brand new topic means any information from the older posts is gone, including very helpful things like answers you've given to other members' questions. At a minimum it clutters up the forums.
On that note, what version of SQL Server are you actually using?
Cheers!

The same question or variants thereof has been posted 8 times. In some cases the OP has started a new thread with responders questions left unanswered. Very inefficient, very frustrating for responders.

^{“Write the query the simplest way. If through testing it becomes clear that the performance is inadequate, consider alternative query forms.” - Gail Shaw}

For fast, accurate and documented assistance in answering your questions, please read this article.
Understanding and using APPLY, (I) and (II) Paul White
Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden

Jeff Moden SSC Guru Points: 1003865 More actions · Answer 3

ChrisM@Work - Monday, July 17, 2017 9:33 AM
Jacob Wilkins - Monday, July 17, 2017 7:15 AM
Also, just as a note, please do not post the same topic multiple times.
You've posted several versions of this question under sub-forums for different versions of SQL Server.
Posting a brand new topic means any information from the older posts is gone, including very helpful things like answers you've given to other members' questions. At a minimum it clutters up the forums.
On that note, what version of SQL Server are you actually using?
Cheers!
The same question or variants thereof has been posted 8 times. In some cases the OP has started a new thread with responders questions left unanswered. Very inefficient, very frustrating for responders.

BWAAAA-HAAA!!!! You're "probably" right there, Chris. 😉

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

Change is inevitable... Change for the better is not.

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)