/* Number of records for permutations and normal distribution in Picat. How many records are there in a sequence of n elements? This calculates the probabilities of the number of records for n values. The theoretical values is quite good for permutations and continuous distributions. For discrete distributions, e.g. uniform discrete, binomial, etc, this does not apply. (I'm still trying to find out theories for these distributions.) The mean is the harmonic_number(N). Cf my Gamble model gamble_record_number_of_records.rkt This program was created by Hakan Kjellerstrand, hakank@gmail.com See also my Picat page: http://www.hakank.org/picat/ */ import ppl_distributions, ppl_utils. import util. % import ordset. main => go. /* n = 10 PDF: 1 0.1 2 0.282896825396825 3 0.323164682539683 4 0.199426807760141 5 0.07421875 6 0.017436342592593 7 0.002604166666667 8 0.000239748677249 9 0.000012400793651 10 0.000000275573192 Sorted by prob: 3 0.323164682539683 2 0.282896825396825 4 0.199426807760141 1 0.1 5 0.07421875 6 0.017436342592593 7 0.002604166666667 8 0.000239748677249 9 0.000012400793651 10 0.000000275573192 CDF: 1 0.1 2 0.382896825396825 3 0.706061507936508 4 0.905488315696649 5 0.979707065696649 6 0.997143408289242 7 0.999747574955908 8 0.999987323633157 9 0.999999724426808 10 1.0 Quantile: 0.000001 1 0.00001 1 0.001 1 0.01 1 0.025 1 0.05 1 0.1 1 0.25 2 0.5 3 0.75 4 0.84 4 0.9 4 0.975 5 0.99 6 0.999 7 0.99999 9 0.999999 9 Mean: = 2.92897 Mean (h_n): = 2.92897 */ go ?=> N = 10, % 10 value in the list println(n=10), println("PDF:"), [V=num_records_dist_pdf(N,V) : V in 1..N].printf_list, nl, println("Sorted by prob:"), [V=num_records_dist_pdf(N,V) : V in 1..N].sort_down(2).printf_list, nl, println("CDF:"), [V=num_records_dist_cdf(N,V) : V in 1..N].printf_list, nl, println("Quantile:"), [Q=num_records_dist_quantile(N,Q) : Q in quantile_qs()].printf_list, nl, println("Mean:"=num_records_dist_mean(N)), println("Mean (h_n):"=num_records_dist_mean_h_n(N)), nl. go => true. /* * draw_without_replacement(10,1..10) n = 10 var : num records Probabilities: 3: 0.3199000000000000 2: 0.2815000000000000 4: 0.2015000000000000 1: 0.1001000000000000 5: 0.0771000000000000 6: 0.0174000000000000 7: 0.0020000000000000 8: 0.0004000000000000 9: 0.0001000000000000 mean = 2.9368 var : h_n Probabilities: 2.92897: 1.0000000000000000 mean = 2.92897 var : est Probabilities: 2.8798: 1.0000000000000000 mean = 2.8798 * normal_dist(100,15) n = 10 var : num records Probabilities: 3: 0.3366000000000000 2: 0.2792000000000000 4: 0.1948000000000000 1: 0.0948000000000000 5: 0.0745000000000000 6: 0.0174000000000000 7: 0.0023000000000000 8: 0.0004000000000000 mean = 2.9384 var : h_n Probabilities: 2.92897: 1.0000000000000000 mean = 2.92897 var : est Probabilities: 2.8798: 1.0000000000000000 mean = 2.8798 * uniform(0,1) n = 10 var : num records Probabilities: 3: 0.3211000000000000 2: 0.2883000000000000 4: 0.1965000000000000 1: 0.1005000000000000 5: 0.0741000000000000 6: 0.0166000000000000 7: 0.0027000000000000 9: 0.0001000000000000 8: 0.0001000000000000 mean = 2.9171 var : h_n Probabilities: 2.92897: 1.0000000000000000 mean = 2.92897 var : est Probabilities: 2.8798: 1.0000000000000000 mean = 2.8798 theoretical_mean = 2.92897 * poisson(1) Here we see how the theory doesn't work. For example, the mean is not about the estimated value of 2.92. Instead it's quite lower: about 2.21. n = 10 var : num records Probabilities: 2: 0.4757000000000000 3: 0.2979000000000000 1: 0.1792000000000000 4: 0.0464000000000000 5: 0.0008000000000000 mean = 2.2139 var : h_n Probabilities: 2.92897: 1.0000000000000000 mean = 2.92897 var : est Probabilities: 2.8798: 1.0000000000000000 mean = 2.8798 theoretical_mean = 2.92897 Here we see the distribution of the number of distict values in poisson(1) Picat> [poisson_dist_n(1,10).remove_dups.len : _ in 1..10000].show_freq Show freqs: 2 = 533 3 = 4676 4 = 4183 5 = 594 6 = 14 * random_integer(10) Same as for poisson(1) n = 10 var : num records Probabilities: 2: 0.3638000000000000 3: 0.3202000000000000 1: 0.1593000000000000 4: 0.1255000000000000 5: 0.0283000000000000 6: 0.0028000000000000 7: 0.0001000000000000 mean = 2.5085 var : h_n Probabilities: 2.92897: 1.0000000000000000 mean = 2.92897 var : est Probabilities: 2.8798: 1.0000000000000000 mean = 2.8798 theoretical_mean = 2.92897 * random_integer(100) This is more like it! However, there are duplicates so its mean is - still - off the theoretical mean. n = 10 var : num records Probabilities: 3: 0.3199000000000000 2: 0.2929000000000000 4: 0.1961000000000000 1: 0.1031000000000000 5: 0.0717000000000000 6: 0.0143000000000000 7: 0.0020000000000000 mean = 2.8913 var : h_n Probabilities: 2.92897: 1.0000000000000000 mean = 2.92897 var : est Probabilities: 2.8798: 1.0000000000000000 mean = 2.8798 theoretical_mean = 2.92897 */ go2 ?=> N = 10, println(n=N), reset_store, run_model(10_000,$model2(N),[show_probs_trunc,mean % , % show_percentiles, % show_hpd_intervals,hpd_intervals=[0.94], % show_histogram, % min_accepted_samples=1000,show_accepted_samples=true ]), println(theoretical_mean=num_records_dist_mean(N)), nl, % show_store_lengths,nl, % fail, nl. go2 => true. model2(N) => % Permutations and continuous distributions X = draw_without_replacement(N,1..N), % X = normal_dist_n(100,15,N), % X = uniform_dist_n(0,1,N), % discrete distributions % X = poisson_dist_n(1,N), % X = random_integer_dist_n(10,N), % X = random_integer_dist_n(100,N), RecordsIx = get_records_ix(X), NumRecords = RecordsIx.len, HN = harmonic_number(N), Est = log(N) + euler_gamma(), add("num records",NumRecords), add("h_n",HN), add("est",Est).