/* 

  n heads in a row after k tosses in Picat.

  Inspired by (and extended)
  https://www.reddit.com/r/learnmath/comments/1gi6kuf/whats_the_probability_to_get_10_heads_in_a_row_if/
  """
  What’s the probability to get 10 heads in a row if you flip a coin 500 times? How would you 
  generalize the method? 

  Hi!

  I came across a short where a person tried flipping a coin until they got 10 heads in 
  a row. It took them 486 attempts. I do not care about how real or fake the video is, but 
  it did made me ask a question. How would you calculate the chances of getting 10 heads in 
  a row in 486 flips, and more generally, how to calculate the probability of x events(with 
  a probability of say p, to happen independently) happening y times in a row with n cases?
  """


  The probability of getting a run of 10 heads in 486 flips is 
  Picat> X = probability_of_run_size(486,1/2,10)
  X = 0.20910597145616

  The mean value of tosses needed to get 10 heads in a row is
  Picat> X = expected_tosses_needed_for_n_heads(10)
  X = 2046

  Here I explore a related question:
  * What is the probability of getting n heads in a row after exact k tosses with probability 
    p of getting heads?

  For practical/computational purposes we have to give a some maximum limit, e.g. 
  m = 20*expected value of n tosses in a row.

  Thus, the PDF etc can be used in two cases:
  - The probability of n heads after exactly k tosses in max number of tosses
  - The probability of n heads after exactly k tosses (ignoring - kind of - max number of tosses)

  The PDF is
     prob_n_heads_after_k_in_max_m_tosses_dist_pdf(P,M,N,K)
  p: probability of tossing heads
  m: maximum number of tosses
  n: the number of heads in a row to study
  k: the number of tosses when the n number of heads occurred.


  Let's use a simpler example to explore this: 
  What is the probability of getting 3 heads in a rows?

  First, the (total) probability of getting 3 heads in a row in max 21 tosses is
  about 0.804:
  Picat> X = probability_of_run_size(21,1/2,3)             
  X = 0.804141998291016

  We can ask questions such as:
  What is the probability of getting a row of 3 heads come _after exactly_ 7 tosses (with 
  max 21 tosses)?

  p: probability of tossing heads: 1/2
  m: maximum number of tosses: 21
  n: the number of heads in a row to study: 3
  k: the number of tosses when the n number of heads occurred: 7

  > (prob-n-heads-after-k-in-max-m-tosses-pdf 1/2 21 3 7)
  Picat> X = prob_n_heads_after_k_in_max_m_tosses_dist_pdf(1/2,21,3,7)
  X = 0.0546875

  Picat> [V=prob_n_heads_after_k_in_max_m_tosses_dist_pdf(1/2,21,3,V) : V in 0..22].printf_list
   0  0 
   1  0 
   2  0 
   3 0.125 
   4 0.0625 
   5 0.0625 
   6 0.0625 
   7 0.0546875 
   8 0.05078125 
   9 0.046875 
  10 0.04296875 
  11 0.03955078125 
  12 0.036376953125 
  13 0.033447265625 
  14 0.03076171875 
  15 0.028289794921875 
  16 0.026016235351562 
  17 0.02392578125 
  18 0.022003173828125 
  19 0.020235061645508 
  20 0.018609046936035 
  21 0.01711368560791 
  22 0.195858001708984 

  Note: The last entry (k=22) is the probability of NOT getting 3 heads in a rows using
        max 21 tosses.

  Here is the CDF:

  Picat> [V=prob_n_heads_after_k_in_max_m_tosses_dist_cdf(1/2,21,3,V) : V in 0..22].printf_list
   0  0 
   1  0 
   2  0 
   3 0.125 
   4 0.1875 
   5 0.25 
   6 0.3125 
   7 0.3671875 
   8 0.41796875 
   9 0.46484375 
  10 0.5078125 
  11 0.54736328125 
  12 0.583740234375 
  13 0.6171875 
  14 0.64794921875 
  15 0.676239013671875 
  16 0.702255249023438 
  17 0.726181030273438 
  18 0.748184204101562 
  19 0.76841926574707 
  20 0.787028312683105 
  21 0.804141998291016 
  22  1 


  Some quantiles:
  0.0001: 3
  0.001: 3
  0.01: 3
  0.05: 3
  0.25: 5
  0.3: 6
  0.4: 8
  0.5: 10
  0.75: 19
  0.8: 21
  0.84: 22
  0.9: 22
  0.95: 22
  0.99: 22
  0.999: 22
  0.9999: 22
  0.99999: 22

  Note that everything larger than 21 (max number of tosses), the number 21+1=22 is
  used. 

  For the more general question of "all" probabilities of a specific n, then just 
  make m (much) larger, for example m=20*expectation (n=3) = 20*14 = 280:

  PDF
  (0 0)
  (1 0)
  (2 0)
  (3 0.125)
  (4 0.0625)
  (5 0.0625)
  (6 0.0625)
  (7 0.0546875)
  (8 0.05078125)
  (9 0.046875)
  (10 0.04296875)
  ...
  (78 0.00014442822418039307)
  (79 0.00013282245990704932)
  (80 0.00012214929565099986)
  (81 0.0001123337908248114)
  (82 0.00010330702681353683)
  (83 9.500562306934624e-5)
  (84 8.737129209115876e-5)
  (85 8.035043016460804e-5)
  ...
  276 9.042796130921462e-12)
  (277 8.316147576852854e-12)
  (278 7.647890046255897e-12)
  (279 7.033331433706345e-12)
  (280 6.4681566755237534e-12)
  (281 7.402498107604121e-11)
  sum: 1

  CDF
  (0 0)
  (1 0)
  (2 0)
  (3 0.125)
  (4 0.1875)
  (5 0.25)
  (6 0.3125)
  (7 0.3671875)
  (8 0.41796875)
  (9 0.46484375)
  (10 0.5078125)
  ...
  (78 0.9983470875709834)
  (79 0.9984799100308904)
  (80 0.9986020593265414)
  (81 0.9987143931173663)
  (82 0.9988177001441798)
  (83 0.9989127057672491)
  (84 0.9990000770593404)
  (85 0.9990804274895049)
  ...
  (275 0.9999999998874667)
  (276 0.9999999998965095)
  (277 0.9999999999048257)
  (278 0.9999999999124736)
  (279 0.9999999999195068)
  (280 0.999999999925975)
  (281 1.0)

  Quantile
  0.0001: 3
  0.001: 3
  0.01: 3
  0.05: 3
  0.25: 5
  0.3: 6
  0.4: 8
  0.5: 10
  0.75: 19
  0.8: 21
  0.84: 24
  0.9: 30
  0.95: 38
  0.99: 57
  0.999: 84
  0.9999: 112
  0.99999: 139
  0.999999: 167


  Cf my Gamble model gamble_n_heads_in_a_row_after_k_tosses.rkt

  This program was created by Hakan Kjellerstrand, hakank@gmail.com
  See also my Picat page: http://www.hakank.org/picat/

*/

import ppl_distributions, ppl_utils.
import util.
% import ordset.

main => go.


/*
  probability_of_run_size(486,1/2,10) = 0.209106
  [p = 0.5,m = 21,n = 3]
  prob_n_heads_after_k_in_max_m_tosses_list(0.5,21,3)
  [0,0,0,0.125,0.1875,0.25,0.3125,0.3671875,0.417969,0.464844,0.5078125,0.547363,0.58374,0.6171875,0.647949,0.676239,0.702255,0.726181,0.748184,0.768419,0.787028,0.804142]

  PDF:
   0: 0.000000000000000
   1: 0.000000000000000
   2: 0.000000000000000
   3: 0.125000000000000
   4: 0.062500000000000
   5: 0.062500000000000
   6: 0.062500000000000
   7: 0.054687500000000
   8: 0.050781250000000
   9: 0.046875000000000
  10: 0.042968750000000
  11: 0.039550781250000
  12: 0.036376953125000
  13: 0.033447265625000
  14: 0.030761718750000
  15: 0.028289794921875
  16: 0.026016235351562
  17: 0.023925781250000
  18: 0.022003173828125
  19: 0.020235061645508
  20: 0.018609046936035
  21: 0.017113685607910
  22: 0.195858001708984

  CDF:
   0: 0.000000000000000
   1: 0.000000000000000
   2: 0.000000000000000
   3: 0.125000000000000
   4: 0.187500000000000
   5: 0.250000000000000
   6: 0.312500000000000
   7: 0.367187500000000
   8: 0.417968750000000
   9: 0.464843750000000
  10: 0.507812500000000
  11: 0.547363281250000
  12: 0.583740234375000
  13: 0.617187500000000
  14: 0.647949218750000
  15: 0.676239013671875
  16: 0.702255249023438
  17: 0.726181030273438
  18: 0.748184204101562
  19: 0.768419265747070
  20: 0.787028312683105
  21: 0.804141998291016
  22: 1.000000000000000

  Quantile:
  0.000001 = 3
  0.00001 = 3
  0.001 = 3
  0.01 = 3
  0.025 = 3
  0.05 = 3
  0.25 = 5
  0.5 = 10
  0.75 = 19
  0.84 = 22
  0.975 = 22
  0.99 = 22
  0.999 = 22
  0.99999 = 22
  0.999999 = 22

  (prob_n_heads_after_k_in_max_m_tosses_dist_cdf(1/2,486,10,486)) = 0.209106

  mean value of getting 10 heads in a row = 2046

  0.999999 quantile for 3 heads in a row = 167

  M for 20*mean(N)
   1:    2 est:    40
   2:    6 est:   120
   3:   14 est:   280
   4:   30 est:   600
   5:   62 est:  1240
   6:  126 est:  2520
   7:  254 est:  5080
   8:  510 est: 10200
   9: 1022 est: 20440
  10: 2046 est: 40920

  expected_tosses_needed_for_n_successes(10,0.5)
  2046.0

  expected_tosses_needed_for_n_successes(20,0.8)
  428.681


*/
go ?=>

  % 0.20910597145616058
  println("probability_of_run_size(486,1/2,10)"=probability_of_run_size(486,1/2,10)),
  P = 1/2,
  M = 21,
  N = 3,
  println([p=P,m=M,n=N]),
  println($prob_n_heads_after_k_in_max_m_tosses_list(P,M,N)),
  println(prob_n_heads_after_k_in_max_m_tosses_list(P,M,N)),
  nl,
  println("PDF:"),
  foreach(K in 0..M+1) 
    printf("%2d: %0.15f\n",K,prob_n_heads_after_k_in_max_m_tosses_dist_pdf(P,M,N,K))
  end,
  nl,
  println("CDF:"),
  foreach(K in 0..M+1)
    printf("%2d: %0.15f\n",K,prob_n_heads_after_k_in_max_m_tosses_dist_cdf(P,M,N,K))  
  end,
  nl,
  println("Quantile:"),
  foreach(Q in quantile_qs())
    println(Q=prob_n_heads_after_k_in_max_m_tosses_dist_quantile(P,M,N,Q))
  end,
  nl,
  % 0.209106
  println("(prob_n_heads_after_k_in_max_m_tosses_dist_cdf(1/2,486,10,486))"=(prob_n_heads_after_k_in_max_m_tosses_dist_cdf(1/2,486,10,486))),
  nl,

  % 2046
  println("mean value of getting 10 heads in a row"=expected_tosses_needed_for_n_heads(10)),
  nl,

  % 167
  println("0.999999 quantile for 3 heads in a row"=
      prob_n_heads_after_k_in_max_m_tosses_dist_quantile(1/2,(20* expected_tosses_needed_for_n_heads(3)),3,0.999999)),
  nl,
  println("M for 20*mean(N)"),
  foreach(I in 1..10)
    printf("%2d: %4d est: %5d\n",I,expected_tosses_needed_for_n_heads(I),(20* expected_tosses_needed_for_n_heads(I)))
  end,
  nl,
  % 2046.0 (as above)
  println($expected_tosses_needed_for_n_successes(10, 0.5)),
  println(expected_tosses_needed_for_n_successes(10, 0.5)),
  nl,
  % expected_tosses_needed_for_n_successes(20,0.8)
  % 428.681
  println($expected_tosses_needed_for_n_successes(20, 0.8)),
  println(expected_tosses_needed_for_n_successes(20, 0.8)),

  nl.
go => true.


/*
  [n = 10,limit = 486,use_limit = true]
  var : len
  Probabilities (truncated):
  487: 0.7946000000000000
  128: 0.0012000000000000
  155: 0.0011000000000000
  10: 0.0011000000000000
  .........
  33: 0.0001000000000000
  29: 0.0001000000000000
  20: 0.0001000000000000
  17: 0.0001000000000000
  mean = 436.763
  HPD intervals:
  HPD interval (0.84): 374.00000000000000..487.00000000000000
  HPD interval (0.99): 32.00000000000000..487.00000000000000
  HPD interval (0.999999): 10.00000000000000..487.00000000000000

  var : theoretical mean
  Probabilities:
  2046: 1.0000000000000000
  mean = 2046.0

  var : p
  Probabilities:
  false: 0.7946000000000000
  true: 0.2054000000000000
  mean = 0.2054

  var : theoretical p
  Probabilities:
  0.209106: 1.0000000000000000
  mean = 0.209106

  [n = 4,limit = 21,use_limit = true]
  var : len
  Probabilities (truncated):
  22: 0.5085000000000000
  4: 0.0623000000000000
  6: 0.0327000000000000
  5: 0.0323000000000000
  .........
  17: 0.0209000000000000
  20: 0.0204000000000000
  19: 0.0187000000000000
  21: 0.0177000000000000
  mean = 16.6573
  HPD intervals:
  HPD interval (0.84): 8.00000000000000..22.00000000000000
  HPD interval (0.99): 4.00000000000000..22.00000000000000
  HPD interval (0.999999): 4.00000000000000..22.00000000000000

  var : theoretical mean
  Probabilities:
  30: 1.0000000000000000
  mean = 30.0

  var : p
  Probabilities:
  false: 0.5085000000000000
  true: 0.4915000000000000
  mean = 0.4915

  var : theoretical p
  Probabilities:
  0.496924: 1.0000000000000000
  mean = 0.496924

  * Skipping the limit (UseLimit=false) for n=10 gives a wide range of
    extremes. Using 10000 samples gives that it might take as long as 20579 tosses 
    (in theory in can take "forever").
    Note that the PDF, CDF, quantiles above handles this as well, just pick a large
    enough m, e.g. 20*expected value


  [n = 10,limit = 486,use_limit = false]
  var : len
  Probabilities (truncated):
  10: 0.0013000000000000
  350: 0.0012000000000000
  199: 0.0012000000000000
  123: 0.0012000000000000
  .........
  107: 0.0001000000000000
  32: 0.0001000000000000
  22: 0.0001000000000000
  19: 0.0001000000000000
  mean = 1989.78
  HPD intervals:
  HPD interval (0.84): 10.00000000000000..3622.00000000000000
  HPD interval (0.99): 10.00000000000000..8911.00000000000000
  HPD interval (0.999999): 10.00000000000000..17435.00000000000000

  var : theoretical mean
  Probabilities:
  2046: 1.0000000000000000
  mean = 2046.0

  var : p
  Probabilities:
  false: 0.7890000000000000
  true: 0.2110000000000000
  mean = 0.211

  var : theoretical p
  Probabilities:
  0.209106: 1.0000000000000000
  mean = 0.209106

  [n = 3,limit = 21,use_limit = false]
  var : len
  Probabilities (truncated):
  3: 0.1284000000000000
  5: 0.0649000000000000
  6: 0.0638000000000000
  4: 0.0609000000000000
  .........
  77: 0.0001000000000000
  72: 0.0001000000000000
  69: 0.0001000000000000
  60: 0.0001000000000000
  mean = 13.9003
  HPD intervals:
  HPD interval (0.84): 3.00000000000000..24.00000000000000
  HPD interval (0.99): 3.00000000000000..55.00000000000000
  HPD interval (0.999999): 3.00000000000000..123.00000000000000

  var : theoretical mean
  Probabilities:
  14: 1.0000000000000000
  mean = 14.0

  var : p
  Probabilities:
  true: 0.8062000000000000
  false: 0.1938000000000000
  mean = 0.8062

  var : theoretical p
  Probabilities:
  0.804142: 1.0000000000000000
  mean = 0.804142

*/
go2 ?=>
  member([N,Limit,UseLimit],[[10,486,true],
                             [4,21,true],
                             [10,486,false],
                             [3,21,false]]),
  println([n=N,limit=Limit,use_limit=UseLimit]),
  reset_store,
  run_model(10_000,$model2(N,Limit,UseLimit),[show_probs_trunc,mean,
                           % show_percentiles,
                           show_hpd_intervals,hpd_intervals=[0.84,0.99,0.999999]
                           % show_histogram,
                           % min_accepted_samples=1000,show_accepted_samples=true
                          ]),
  nl,
  % show_store_lengths,nl,
  fail,
  nl.

f2(C,H,Prob,N,UseLimit,Limit) = Res =>
  if H == N ; (UseLimit == true, C > Limit) then
    Res = C
  else
    Res = f2(C+1,cond(flip(Prob)==true,H+1,0),Prob,N,UseLimit,Limit)
  end.


model2(N,Limit,UseLimit) =>
  Prob = 1/2,
  
  %  The theoretical probability of getting n heads in atmost limit tosses
  TheoreticalP = probability_of_run_size(Limit,Prob,N),
  
  % The theoretical probability of getting n heads in atmost limit tosses
  TheoreticalMean = expected_tosses_needed_for_n_heads(N),

  % Simulate the number of tosses required to get n heads
  Len = f2(0,0,Prob,N,UseLimit,Limit),
  
  P = check(Len <= Limit),

  add("len",Len),
  add("theoretical mean",TheoreticalMean),
  add("p",P),
  add("theoretical p",TheoreticalP).