/* Cluster watching in birthday months in Picat. From John Brignell "Of birthdays and clusters" https://www.numberwatch.co.uk/of_birthdays_and_clusters.htm """ As we gratefully witness the dying fall of yet another silly season, it is interesting to note how often birthdays provide the basis of the most popular fallacies. In August 2001 a classic of the genre appeared when the New Scientist announced that researchers in Scotland had established that anorexics are likely to be have been June babies. The team studied 446 women who had been diagnosed as anorexic and observed that 30% more than average were born in June. As the monthly average of births is about 37, we deduce that the June number must have been 48. At first sight this looks like a significant result (at least by epidemiological standards) since the probability of getting 48 or more in a random month is about 3%. But that is not what they are doing! They are making twelve such selections and then picking the biggest. Application of the theory of the statistics of extremes tells us that the probability of the largest of twelve such selections being 48 or greater is 30%, which is not significant even by epidemiological standards. """ Compare with my R code (with Swedish comments) at http://www.hakank.org/sims/simulering.html "Cluster watching" This model calculates the min, mean, and max value of 446 random months, as well as two probabilities: - p: what is the probability that the max value is more than 30% of the mean value - p2: how many months has a value that is more than 30% of the mean value. (There's about 2% chance that as many as 3 months has such as value) Cf my Gamble model gamble_cluster_watching_birth_months.rkt This program was created by Hakan Kjellerstrand, hakank@gmail.com See also my Picat page: http://www.hakank.org/picat/ */ import ppl_distributions, ppl_utils. import util. % import ordset. main => go. /* var : min val Probabilities (truncated): 28: 0.1512000000000000 29: 0.1384000000000000 27: 0.1302000000000000 30: 0.1258000000000000 ......... 16: 0.0003000000000000 14: 0.0002000000000000 35: 0.0001000000000000 13: 0.0001000000000000 mean = 27.5881 Percentiles: (0.001 17) (0.01 20) (0.025 21) (0.05 23) (0.1 24) (0.25 26) (0.5 28) (0.75 30) (0.84 30) (0.9 31) (0.95 32) (0.975 32) (0.99 33) (0.999 34) (0.9999 34.000099999999293) (0.99999 34.900010000001203) HPD intervals: HPD interval (0.94): 23.00000000000000..32.00000000000000 Histogram (total 10000) 13: 1 (0.000 / 0.000) 14: 2 (0.000 / 0.000) 16: 3 (0.000 / 0.001) 17: 5 (0.001 / 0.001) 18: 8 (0.001 / 0.002) 19: 28 # (0.003 / 0.005) 20: 79 ### (0.008 / 0.013) 21: 130 ##### (0.013 / 0.026) 22: 191 ######## (0.019 / 0.045) 23: 339 ############# (0.034 / 0.079) 24: 550 ###################### (0.055 / 0.134) 25: 772 ############################### (0.077 / 0.211) 26: 1075 ########################################### (0.107 / 0.318) 27: 1302 #################################################### (0.130 / 0.449) 28: 1512 ############################################################ (0.151 / 0.600) 29: 1384 ####################################################### (0.138 / 0.738) 30: 1258 ################################################## (0.126 / 0.864) 31: 824 ################################# (0.082 / 0.946) 32: 384 ############### (0.038 / 0.985) 33: 129 ##### (0.013 / 0.998) 34: 23 # (0.002 / 1.000) 35: 1 (0.000 / 1.000) var : mean val Probabilities: 37.1667: 1.0000000000000000 mean = 37.1667 Percentiles: (0.001 37.166666666666664) (0.01 37.166666666666664) (0.025 37.166666666666664) (0.05 37.166666666666664) (0.1 37.166666666666664) (0.25 37.166666666666664) (0.5 37.166666666666664) (0.75 37.166666666666664) (0.84 37.166666666666664) (0.9 37.166666666666664) (0.95 37.166666666666664) (0.975 37.166666666666664) (0.99 37.166666666666664) (0.999 37.166666666666664) (0.9999 37.166666666666664) (0.99999 37.166666666666664) HPD intervals: HPD interval (0.94): 37.16666666666666..37.16666666666666 Histogram (total 10000) 37.166666666666664: 10000 ############################################################ (1.000 / 1.000) var : max val Probabilities (truncated): 47: 0.1378000000000000 46: 0.1344000000000000 45: 0.1165000000000000 48: 0.1135000000000000 ......... 40: 0.0008000000000000 61: 0.0005000000000000 62: 0.0004000000000000 64: 0.0003000000000000 mean = 47.4309 Percentiles: (0.001 41) (0.01 42) (0.025 42) (0.05 43) (0.1 44) (0.25 45) (0.5 47) (0.75 49) (0.84 51) (0.9 52) (0.95 53) (0.975 55) (0.99 57) (0.999 61) (0.9999 64) (0.99999 64) HPD intervals: HPD interval (0.94): 42.00000000000000..53.00000000000000 Histogram (total 10000) 40: 8 (0.001 / 0.001) 41: 63 ### (0.006 / 0.007) 42: 226 ########## (0.023 / 0.030) 43: 553 ######################## (0.055 / 0.085) 44: 905 ####################################### (0.090 / 0.175) 45: 1165 ################################################### (0.117 / 0.292) 46: 1344 ########################################################### (0.134 / 0.426) 47: 1378 ############################################################ (0.138 / 0.564) 48: 1135 ################################################# (0.114 / 0.678) 49: 925 ######################################## (0.092 / 0.770) 50: 697 ############################## (0.070 / 0.840) 51: 512 ###################### (0.051 / 0.891) 52: 363 ################ (0.036 / 0.927) 53: 273 ############ (0.027 / 0.955) 54: 165 ####### (0.017 / 0.971) 55: 117 ##### (0.012 / 0.983) 56: 65 ### (0.006 / 0.989) 57: 41 ## (0.004 / 0.994) 58: 19 # (0.002 / 0.995) 59: 21 # (0.002 / 0.998) 60: 13 # (0.001 / 0.999) 61: 5 (0.001 / 0.999) 62: 4 (0.000 / 1.000) 64: 3 (0.000 / 1.000) var : p Probabilities: false: 0.6777000000000000 true: 0.3223000000000000 mean = [false = 0.6777,true = 0.3223] Percentiles: (0.001 false) (0.01 false) (0.025 false) (0.05 false) (0.1 false) (0.25 false) (0.5 false) (0.75 true) (0.84 true) (0.9 true) (0.95 true) (0.975 true) (0.99 true) (0.999 true) (0.9999 true) (0.99999 true) HPD intervals: show_hpd_intervals: data is not numeric Histogram (total 10000) false : 6777 ############################################################ (0.678 / 0.678) true : 3223 ############################# (0.322 / 1.000) var : p2 Probabilities: 0: 0.6777000000000000 1: 0.2916000000000000 2: 0.0292000000000000 3: 0.0014000000000000 4: 0.0001000000000000 mean = 0.3546 Percentiles: (0.001 0) (0.01 0) (0.025 0) (0.05 0) (0.1 0) (0.25 0) (0.5 0) (0.75 1) (0.84 1) (0.9 1) (0.95 1) (0.975 2) (0.99 2) (0.999 3) (0.9999 3.0000999999992928) (0.99999 3.9000100000012026) HPD intervals: HPD interval (0.94): 0.00000000000000..1.00000000000000 Histogram (total 10000) 0: 6777 ############################################################ (0.678 / 0.678) 1: 2916 ########################## (0.292 / 0.969) 2: 292 ### (0.029 / 0.999) 3: 14 (0.001 / 1.000) 4: 1 (0.000 / 1.000) var : months with > 30% * mean val Probabilities (truncated): []: 0.6777000000000000 [1]: 0.0279000000000000 [3]: 0.0252000000000000 [2]: 0.0251000000000000 ......... [3,5,12]: 0.0001000000000000 [2,7,11]: 0.0001000000000000 [1,7,8]: 0.0001000000000000 [1,2,5]: 0.0001000000000000 Percentiles: (0.001 []) (0.01 []) (0.025 []) (0.05 []) (0.1 []) (0.25 []) (0.5 []) (0.75 [3]) (0.84 [6]) (0.9 [8]) (0.95 [10,11]) (0.975 [11,12]) (0.99 [12]) (0.999 [12]) (0.9999 [12]) (0.99999 [12]) */ go ?=> reset_store, run_model(10_000,$model,[show_probs_trunc,mean, show_percentiles, show_hpd_intervals,hpd_intervals=[0.94], show_histogram]), nl, % show_store_lengths,nl, % fail, nl. go => true. model() => N = 446, % number of women M = 12, % number of months % For each of the 446 women, generate a random birth month 1..12 BirthMonths = random_integer1_n(M,N), % The number of births of each month Births = [count_occurrences(J,BirthMonths) : J in 1..M], % println(births=Births), MinVal = Births.min, MeanVal = Births.mean, MaxVal = Births.max, % What is the probability that the max value is larger than 30% of the mean value? P = check(MaxVal > (MeanVal * 1.3)), % How many months has values larger than 30% of the mean value? P2 = [ cond(Births[J] > 1.3*MeanVal ,1,0) : J in 1..M].sum, % All months that has > 30% of the mean value GTMonths = [ J : J in 1..M, Births[J] > 1.3*MeanVal], add("min val",MinVal), add("mean val",MeanVal), add("max val",MaxVal), add("p",P), add("p2",P2), add("months with > 30% * mean val",GTMonths).