/* 

  Cluster watching in birthday months in Picat.

  From 
  John Brignell "Of birthdays and clusters"
  https://www.numberwatch.co.uk/of_birthdays_and_clusters.htm
  """
  As we gratefully witness the dying fall of yet another silly season, it is interesting 
  to note how often birthdays provide the basis of the most popular fallacies. In August 2001 
  a classic of the genre appeared when the New Scientist announced that researchers in 
  Scotland had established that anorexics are likely to be have been June babies. The team 
  studied 446 women who had been diagnosed as anorexic and observed that 30% more than average 
  were born in June. As the monthly average of births is about 37, we deduce that the June 
  number must have been 48. At first sight this looks like a significant result (at least 
  by epidemiological standards) since the probability of getting 48 or more in a random 
  month is about 3%. But that is not what they are doing! They are making twelve such 
  selections and then picking the biggest. Application of the theory of the statistics of 
  extremes tells us that the probability of the largest of twelve such selections being 
  48 or greater is 30%, which is not significant even by epidemiological standards.
  """

  Compare with my R code (with Swedish comments) at
  http://www.hakank.org/sims/simulering.html "Cluster watching"

  This model calculates the min, mean, and max value of 446 random months, as well as
  two probabilities:
  - p: what is the probability that the max value is more than 30% of the mean value
  - p2: how many months has a value that is more than 30% of the mean value.
    (There's about 2% chance that as many as 3 months has such as value)


  Cf my Gamble model gamble_cluster_watching_birth_months.rkt

  This program was created by Hakan Kjellerstrand, hakank@gmail.com
  See also my Picat page: http://www.hakank.org/picat/

*/

import ppl_distributions, ppl_utils.
import util.
% import ordset.

main => go.

/*

var : min val
Probabilities (truncated):
28: 0.1512000000000000
29: 0.1384000000000000
27: 0.1302000000000000
30: 0.1258000000000000
.........
16: 0.0003000000000000
14: 0.0002000000000000
35: 0.0001000000000000
13: 0.0001000000000000
mean = 27.5881
Percentiles:
(0.001 17)
(0.01 20)
(0.025 21)
(0.05 23)
(0.1 24)
(0.25 26)
(0.5 28)
(0.75 30)
(0.84 30)
(0.9 31)
(0.95 32)
(0.975 32)
(0.99 33)
(0.999 34)
(0.9999 34.000099999999293)
(0.99999 34.900010000001203)
HPD intervals:
HPD interval (0.94): 23.00000000000000..32.00000000000000
Histogram (total 10000)
     13:     1                                                              (0.000 / 0.000)
     14:     2                                                              (0.000 / 0.000)
     16:     3                                                              (0.000 / 0.001)
     17:     5                                                              (0.001 / 0.001)
     18:     8                                                              (0.001 / 0.002)
     19:    28 #                                                            (0.003 / 0.005)
     20:    79 ###                                                          (0.008 / 0.013)
     21:   130 #####                                                        (0.013 / 0.026)
     22:   191 ########                                                     (0.019 / 0.045)
     23:   339 #############                                                (0.034 / 0.079)
     24:   550 ######################                                       (0.055 / 0.134)
     25:   772 ###############################                              (0.077 / 0.211)
     26:  1075 ###########################################                  (0.107 / 0.318)
     27:  1302 ####################################################         (0.130 / 0.449)
     28:  1512 ############################################################ (0.151 / 0.600)
     29:  1384 #######################################################      (0.138 / 0.738)
     30:  1258 ##################################################           (0.126 / 0.864)
     31:   824 #################################                            (0.082 / 0.946)
     32:   384 ###############                                              (0.038 / 0.985)
     33:   129 #####                                                        (0.013 / 0.998)
     34:    23 #                                                            (0.002 / 1.000)
     35:     1                                                              (0.000 / 1.000)

var : mean val
Probabilities:
37.1667: 1.0000000000000000
mean = 37.1667
Percentiles:
(0.001 37.166666666666664)
(0.01 37.166666666666664)
(0.025 37.166666666666664)
(0.05 37.166666666666664)
(0.1 37.166666666666664)
(0.25 37.166666666666664)
(0.5 37.166666666666664)
(0.75 37.166666666666664)
(0.84 37.166666666666664)
(0.9 37.166666666666664)
(0.95 37.166666666666664)
(0.975 37.166666666666664)
(0.99 37.166666666666664)
(0.999 37.166666666666664)
(0.9999 37.166666666666664)
(0.99999 37.166666666666664)
HPD intervals:
HPD interval (0.94): 37.16666666666666..37.16666666666666
Histogram (total 10000)
37.166666666666664: 10000 ############################################################ (1.000 / 1.000)

var : max val
Probabilities (truncated):
47: 0.1378000000000000
46: 0.1344000000000000
45: 0.1165000000000000
48: 0.1135000000000000
.........
40: 0.0008000000000000
61: 0.0005000000000000
62: 0.0004000000000000
64: 0.0003000000000000
mean = 47.4309
Percentiles:
(0.001 41)
(0.01 42)
(0.025 42)
(0.05 43)
(0.1 44)
(0.25 45)
(0.5 47)
(0.75 49)
(0.84 51)
(0.9 52)
(0.95 53)
(0.975 55)
(0.99 57)
(0.999 61)
(0.9999 64)
(0.99999 64)
HPD intervals:
HPD interval (0.94): 42.00000000000000..53.00000000000000
Histogram (total 10000)
     40:     8                                                              (0.001 / 0.001)
     41:    63 ###                                                          (0.006 / 0.007)
     42:   226 ##########                                                   (0.023 / 0.030)
     43:   553 ########################                                     (0.055 / 0.085)
     44:   905 #######################################                      (0.090 / 0.175)
     45:  1165 ###################################################          (0.117 / 0.292)
     46:  1344 ###########################################################  (0.134 / 0.426)
     47:  1378 ############################################################ (0.138 / 0.564)
     48:  1135 #################################################            (0.114 / 0.678)
     49:   925 ########################################                     (0.092 / 0.770)
     50:   697 ##############################                               (0.070 / 0.840)
     51:   512 ######################                                       (0.051 / 0.891)
     52:   363 ################                                             (0.036 / 0.927)
     53:   273 ############                                                 (0.027 / 0.955)
     54:   165 #######                                                      (0.017 / 0.971)
     55:   117 #####                                                        (0.012 / 0.983)
     56:    65 ###                                                          (0.006 / 0.989)
     57:    41 ##                                                           (0.004 / 0.994)
     58:    19 #                                                            (0.002 / 0.995)
     59:    21 #                                                            (0.002 / 0.998)
     60:    13 #                                                            (0.001 / 0.999)
     61:     5                                                              (0.001 / 0.999)
     62:     4                                                              (0.000 / 1.000)
     64:     3                                                              (0.000 / 1.000)

var : p
Probabilities:
false: 0.6777000000000000
true: 0.3223000000000000
mean = [false = 0.6777,true = 0.3223]
Percentiles:
(0.001 false)
(0.01 false)
(0.025 false)
(0.05 false)
(0.1 false)
(0.25 false)
(0.5 false)
(0.75 true)
(0.84 true)
(0.9 true)
(0.95 true)
(0.975 true)
(0.99 true)
(0.999 true)
(0.9999 true)
(0.99999 true)
HPD intervals:
show_hpd_intervals: data is not numeric
Histogram (total 10000)
false     :  6777 ############################################################ (0.678 / 0.678)
true      :  3223 #############################                                (0.322 / 1.000)

var : p2
Probabilities:
0: 0.6777000000000000
1: 0.2916000000000000
2: 0.0292000000000000
3: 0.0014000000000000
4: 0.0001000000000000
mean = 0.3546
Percentiles:
(0.001 0)
(0.01 0)
(0.025 0)
(0.05 0)
(0.1 0)
(0.25 0)
(0.5 0)
(0.75 1)
(0.84 1)
(0.9 1)
(0.95 1)
(0.975 2)
(0.99 2)
(0.999 3)
(0.9999 3.0000999999992928)
(0.99999 3.9000100000012026)
HPD intervals:
HPD interval (0.94): 0.00000000000000..1.00000000000000
Histogram (total 10000)
      0:  6777 ############################################################ (0.678 / 0.678)
      1:  2916 ##########################                                   (0.292 / 0.969)
      2:   292 ###                                                          (0.029 / 0.999)
      3:    14                                                              (0.001 / 1.000)
      4:     1                                                              (0.000 / 1.000)

var : months with > 30% * mean val
Probabilities (truncated):
[]: 0.6777000000000000
[1]: 0.0279000000000000
[3]: 0.0252000000000000
[2]: 0.0251000000000000
.........
[3,5,12]: 0.0001000000000000
[2,7,11]: 0.0001000000000000
[1,7,8]: 0.0001000000000000
[1,2,5]: 0.0001000000000000
Percentiles:
(0.001 [])
(0.01 [])
(0.025 [])
(0.05 [])
(0.1 [])
(0.25 [])
(0.5 [])
(0.75 [3])
(0.84 [6])
(0.9 [8])
(0.95 [10,11])
(0.975 [11,12])
(0.99 [12])
(0.999 [12])
(0.9999 [12])
(0.99999 [12])

*/
go ?=>
  reset_store,
  run_model(10_000,$model,[show_probs_trunc,mean,
                           show_percentiles,
                           show_hpd_intervals,hpd_intervals=[0.94],
                           show_histogram]),

  nl,
  % show_store_lengths,nl,
  % fail,
  nl.
go => true.
  
  
model() =>
  N = 446, % number of women
  M = 12,  % number of months

  % For each of the 446 women, generate a random birth month 1..12
  BirthMonths = random_integer1_n(M,N),

  % The number of births of each month
  Births = [count_occurrences(J,BirthMonths) : J in 1..M],
  % println(births=Births),
  MinVal = Births.min,
  MeanVal = Births.mean,
  MaxVal = Births.max,

  % What is the probability that the max value is larger than 30% of the mean value?
  P = check(MaxVal > (MeanVal * 1.3)),
  
  % How many months has values larger than 30% of the mean value?
  P2 = [ cond(Births[J] > 1.3*MeanVal ,1,0)   : J in 1..M].sum,

  % All months that has > 30% of the mean value
  GTMonths = [ J : J in 1..M, Births[J] > 1.3*MeanVal],
  
  add("min val",MinVal),
  add("mean val",MeanVal),
  add("max val",MaxVal),
  add("p",P),
  add("p2",P2),
  add("months with > 30% * mean val",GTMonths).