/* Cluster watching in birthdays in Picat. From John Brignell "Of birthdays and clusters" https://www.numberwatch.co.uk/of_birthdays_and_clusters.htm """ At the time, ninety people had died of vCJD. The probability of two of them having the same birthday is 0.999993848, i.e. a certainty. In fact there was an evens chance of three of them having the same birthday. Likewise, if we divided the UK up into 1000 areas of equal population, we would find by the same calculation that the probability of two of the ninety coming from the same area is about 0.98. Yet the original Queniborough two were claimed to be statistically significant and the hunt was begun for more 'linked pairs'. """ Compare with my R code (with Swedish comments) at http://www.hakank.org/sims/simulering.html "Cluster watching" In summary (for 1000 areas): - The probability that at least two of 90 persons have the same birthday is about 1. We would require about 4 persons having the same birthdays to give it some sort of significancy. - The probability (p) that at least two of 90 persons live in the same of 1000 regions is about 0.98. We would require about 3 person living in the same region to give it some sort of significancy. Cf my Gamble model gamble_cluster_watching_birth_days.rkt This program was created by Hakan Kjellerstrand, hakank@gmail.com See also my Picat page: http://www.hakank.org/picat/ */ import ppl_distributions, ppl_utils. import util. % import ordset. main => go. /* * 90 people having birhday on a certain day (m = 1000, number of regions). m = 1000 var : mean val Probabilities: 0.09: 0.9199000000000001 0.089: 0.0760000000000000 0.088: 0.0039000000000000 0.087: 0.0002000000000000 mean = 0.0899156 Percentiles: (0.001 0.087999999999999995) (0.01 0.088999999999999996) (0.025 0.088999999999999996) (0.05 0.088999999999999996) (0.1 0.089999999999999997) (0.25 0.089999999999999997) (0.5 0.089999999999999997) (0.75 0.089999999999999997) (0.84 0.089999999999999997) (0.9 0.089999999999999997) (0.95 0.089999999999999997) (0.975 0.089999999999999997) (0.99 0.089999999999999997) (0.999 0.089999999999999997) (0.9999 0.089999999999999997) (0.99999 0.089999999999999997) HPD intervals: HPD interval (0.94): 0.08900000000000..0.09000000000000 Histogram (total 10000) 0.087: 2 (0.000 / 0.000) 0.088: 39 (0.004 / 0.004) 0.088: 0 (0.000 / 0.004) 0.089: 9959 ############################################################ (0.996 / 1.000) var : max val Probabilities: 2: 0.8806000000000000 3: 0.0983000000000000 1: 0.0188000000000000 4: 0.0023000000000000 mean = 2.0841 Percentiles: (0.001 1) (0.01 1) (0.025 2) (0.05 2) (0.1 2) (0.25 2) (0.5 2) (0.75 2) (0.84 2) (0.9 3) (0.95 3) (0.975 3) (0.99 3) (0.999 4) (0.9999 4) (0.99999 4) HPD intervals: HPD interval (0.94): 2.00000000000000..3.00000000000000 Histogram (total 10000) 1: 188 # (0.019 / 0.019) 2: 8806 ############################################################ (0.881 / 0.899) 3: 983 ####### (0.098 / 0.998) 4: 23 (0.002 / 1.000) var : p Probabilities: true: 0.9812000000000000 false: 0.0188000000000000 mean = [true = 0.9812,false = 0.0188] Percentiles: (0.001 false) (0.01 false) (0.025 true) (0.05 true) (0.1 true) (0.25 true) (0.5 true) (0.75 true) (0.84 true) (0.9 true) (0.95 true) (0.975 true) (0.99 true) (0.999 true) (0.9999 true) (0.99999 true) HPD intervals: show_hpd_intervals: data is not numeric Histogram (total 10000) false : 188 # (0.019 / 0.019) true : 9812 ############################################################ (0.981 / 1.000) var : p2 Probabilities: 0: 0.7059000000000000 1: 0.2490000000000000 2: 0.0412000000000000 3: 0.0038000000000000 5: 0.0001000000000000 mean = 0.3433 Percentiles: (0.001 0) (0.01 0) (0.025 0) (0.05 0) (0.1 0) (0.25 0) (0.5 0) (0.75 1) (0.84 1) (0.9 1) (0.95 1) (0.975 2) (0.99 2) (0.999 3) (0.9999 3.0001999999985856) (0.99999 4.8000200000024051) HPD intervals: HPD interval (0.94): 0.00000000000000..1.00000000000000 Histogram (total 10000) 0: 7059 ############################################################ (0.706 / 0.706) 1: 2490 ##################### (0.249 / 0.955) 2: 412 #### (0.041 / 0.996) 3: 38 (0.004 / 1.000) 5: 1 (0.000 / 1.000) */ go ?=> M = 1000, % number of regions println(m=M), reset_store, run_model(10_000,$model(M),[show_probs_trunc,mean, show_percentiles, show_hpd_intervals,hpd_intervals=[0.94], show_histogram]), nl, % show_store_lengths,nl, % fail, nl. go => true. model(M) => N = 90, % Number of births % For each of the n births get a random day/region BirthDays = random_integer_n(M,N), % The number of births of each days / people from a certain region Births = [ [ cond(BirthDays[I] == J,1,0) : I in 1..N].sum : J in 1..M], % println(births=Births), MeanVal = Births.mean, MaxVal = Births.max, % What is the probability that there are more than 1 births on one of the days % (at last 2 persons coming from the same region) P = check(MaxVal > 1), % How many days/regions has values larger than 1? P2 = [ cond(Births[I] > 1,1,0) : I in 1..N].sum, add("mean val",MeanVal), add("max val",MaxVal), add("p",P), add("p2",P2).