/* 

  Cluster watching in birthdays in Picat.

  From 
  John Brignell "Of birthdays and clusters"
  https://www.numberwatch.co.uk/of_birthdays_and_clusters.htm
  """
  At the time, ninety people had died of vCJD. The probability of two of them 
  having the same birthday is 0.999993848, i.e. a certainty. In fact there was 
  an evens chance of three of them having the same birthday. Likewise, if we 
  divided the UK up into 1000 areas of equal population, we would find by the 
  same calculation that the probability of two of the ninety coming from the 
  same area is about 0.98. Yet the original Queniborough two were claimed to be 
  statistically significant and the hunt was begun for more 'linked pairs'.
  
  """
  
  Compare with my R code (with Swedish comments) at
  http://www.hakank.org/sims/simulering.html "Cluster watching"

  In summary (for 1000 areas): 
  - The probability that at least two of 90 persons have the same birthday is
    about 1.
    We would require about 4 persons having the same birthdays to give  
    it some sort of significancy.

  - The probability (p) that at least two of 90 persons live in the same of 1000
    regions is about 0.98.
    We would require about 3 person living in the same region to give it
    some sort of significancy.

  Cf my Gamble model gamble_cluster_watching_birth_days.rkt

  This program was created by Hakan Kjellerstrand, hakank@gmail.com
  See also my Picat page: http://www.hakank.org/picat/

*/

import ppl_distributions, ppl_utils.
import util.
% import ordset.

main => go.

/*

 * 90 people having birhday on a certain day (m = 1000, number of regions).

m = 1000
var : mean val
Probabilities:
0.09: 0.9199000000000001
0.089: 0.0760000000000000
0.088: 0.0039000000000000
0.087: 0.0002000000000000
mean = 0.0899156
Percentiles:
(0.001 0.087999999999999995)
(0.01 0.088999999999999996)
(0.025 0.088999999999999996)
(0.05 0.088999999999999996)
(0.1 0.089999999999999997)
(0.25 0.089999999999999997)
(0.5 0.089999999999999997)
(0.75 0.089999999999999997)
(0.84 0.089999999999999997)
(0.9 0.089999999999999997)
(0.95 0.089999999999999997)
(0.975 0.089999999999999997)
(0.99 0.089999999999999997)
(0.999 0.089999999999999997)
(0.9999 0.089999999999999997)
(0.99999 0.089999999999999997)
HPD intervals:
HPD interval (0.94): 0.08900000000000..0.09000000000000
Histogram (total 10000)
  0.087:     2                                                              (0.000 / 0.000)
  0.088:    39                                                              (0.004 / 0.004)
  0.088:     0                                                              (0.000 / 0.004)
  0.089:  9959 ############################################################ (0.996 / 1.000)

var : max val
Probabilities:
2: 0.8806000000000000
3: 0.0983000000000000
1: 0.0188000000000000
4: 0.0023000000000000
mean = 2.0841
Percentiles:
(0.001 1)
(0.01 1)
(0.025 2)
(0.05 2)
(0.1 2)
(0.25 2)
(0.5 2)
(0.75 2)
(0.84 2)
(0.9 3)
(0.95 3)
(0.975 3)
(0.99 3)
(0.999 4)
(0.9999 4)
(0.99999 4)
HPD intervals:
HPD interval (0.94): 2.00000000000000..3.00000000000000
Histogram (total 10000)
      1:   188 #                                                            (0.019 / 0.019)
      2:  8806 ############################################################ (0.881 / 0.899)
      3:   983 #######                                                      (0.098 / 0.998)
      4:    23                                                              (0.002 / 1.000)

var : p
Probabilities:
true: 0.9812000000000000
false: 0.0188000000000000
mean = [true = 0.9812,false = 0.0188]
Percentiles:
(0.001 false)
(0.01 false)
(0.025 true)
(0.05 true)
(0.1 true)
(0.25 true)
(0.5 true)
(0.75 true)
(0.84 true)
(0.9 true)
(0.95 true)
(0.975 true)
(0.99 true)
(0.999 true)
(0.9999 true)
(0.99999 true)
HPD intervals:
show_hpd_intervals: data is not numeric
Histogram (total 10000)
false     :   188 #                                                            (0.019 / 0.019)
true      :  9812 ############################################################ (0.981 / 1.000)

var : p2
Probabilities:
0: 0.7059000000000000
1: 0.2490000000000000
2: 0.0412000000000000
3: 0.0038000000000000
5: 0.0001000000000000
mean = 0.3433
Percentiles:
(0.001 0)
(0.01 0)
(0.025 0)
(0.05 0)
(0.1 0)
(0.25 0)
(0.5 0)
(0.75 1)
(0.84 1)
(0.9 1)
(0.95 1)
(0.975 2)
(0.99 2)
(0.999 3)
(0.9999 3.0001999999985856)
(0.99999 4.8000200000024051)
HPD intervals:
HPD interval (0.94): 0.00000000000000..1.00000000000000
Histogram (total 10000)
      0:  7059 ############################################################ (0.706 / 0.706)
      1:  2490 #####################                                        (0.249 / 0.955)
      2:   412 ####                                                         (0.041 / 0.996)
      3:    38                                                              (0.004 / 1.000)
      5:     1                                                              (0.000 / 1.000)


*/
go ?=>
  M = 1000, % number of regions  
  println(m=M),
  reset_store,
  run_model(10_000,$model(M),[show_probs_trunc,mean,
                              show_percentiles,
                              show_hpd_intervals,hpd_intervals=[0.94],
                              show_histogram]),
  nl,
  % show_store_lengths,nl,
  % fail,
  nl.
go => true.
  
  
model(M) =>

  N = 90, % Number of births
  
  % For each of the n births get a random day/region
  BirthDays = random_integer_n(M,N),
   
  % The number of births of each days / people from a certain region
  Births =  [ [ cond(BirthDays[I] == J,1,0) :  I in 1..N].sum  : J in 1..M],
  %  println(births=Births),
  MeanVal = Births.mean,
  MaxVal  = Births.max,
   
  % What is the probability that there are more than 1 births on one of the days
  % (at last 2 persons coming from the same region)
  P = check(MaxVal > 1),

  % How many days/regions has values larger than 1?
  P2 = [ cond(Births[I] > 1,1,0) : I in 1..N].sum,
  
  add("mean val",MeanVal),
  add("max val",MaxVal),
  add("p",P),
  add("p2",P2).