/* 

  Records: kth difference in Picat.

  What are the probabilities of the kth differences in records?
  Note: Here we check the indices of the records, not the record values.

  The theoretical kth_diff_p(K,M) for K=1 seems to be correct,
  but not for K > 1.

  For K = 1, the probability only depends on the second value,
  since the first value (index) is always 1.
  For K > 1, then it depends on both the second and the third
  value, which the formula kth_diff_p(K,M) does not cover.

  Also, the probability of the Kth difference depends on that
  we actually have K+1 records. For K > 1, then the value of
  0 (i.e. when the number of records is < K + 1), dominates.
  Note: I tried to "observe away" these cases but the formula
  is still not correct.

  For permutations and continous distributions the distribution
  of kth difference seems to be quite stable for K = 1..3,
  * K = 1
    As mentioned above, it follows fairöly well the 
    theoretical formula.
    The mean is around 1.9

  * K = 2
    The mean is around 1.6

  * K = 3
    The mean is around 0.7


  However, for discrete distributions, this does not hold,
  e.g poisson(1), random_integer(10, i.e. when there are 
  large possibilities of duplicates. 
  random_integer(100) behaves more like the one above.

  Cf my Gamble model gamble_record_kth_difference.rkt

  This program was created by Hakan Kjellerstrand, hakank@gmail.com
  See also my Picat page: http://www.hakank.org/picat/

*/

import ppl_distributions, ppl_utils, ppl_common_utils.
import util.
% import ordset.

main => go.

/*
  I asked ChatGPT about the distribution of the (first) difference of the records.
  Distribution of General Record Time Differences
  P(DiffT(k) = m)
  For k=1 (the first difference):  1/((m+1)*m)
  But for k > 1, this is not correct.
  
*/
kth_diff_p(K,M) = Res =>
  if K == 1 then
    Res = 1 / ((M + 1) * M)
  elseif (( (K + M - 2) * (K + M - 3)) == 0) then
    printf("Division by 0: K: %d M: %d\n",K,M),
    Res = 0
  else
    Res = 1 / ( (K + M - 2) * (K + M - 3))
  end.


/*

  The first difference (K=1) seems to be correct, but not the second or third.

  The second and third differences are (just) the first shifted by K-1 steps.

  First difference:
   1 0.5 (1 / 2) 
   2 0.166666666666667 (1 / 6) 
   3 0.083333333333333 (1 / 12) 
   4 0.05 (1 / 20) 
   5 0.033333333333333 (1 / 30) 
   6 0.023809523809524 (1 / 42) 
   7 0.017857142857143 (1 / 56) 
   8 0.013888888888889 (1 / 72) 
   9 0.011111111111111 (1 / 90) 
  10 0.009090909090909 (1 / 110) 
  11 0.007575757575758 (1 / 132) 
  12 0.006410256410256 (1 / 156) 
  13 0.005494505494505 (1 / 182) 
  14 0.004761904761905 (1 / 210) 
  15 0.004166666666667 (1 / 240) 
  16 0.003676470588235 (1 / 272) 
  17 0.003267973856209 (1 / 306) 
  18 0.002923976608187 (1 / 342) 
  19 0.002631578947368 (1 / 380) 
  20 0.002380952380952 (1 / 420) 

  Second difference:
  Division by 0: K: 2 M: 1
   1 0 (0 / 1) 
   2 0.5 (1 / 2) 
   3 0.166666666666667 (1 / 6) 
   4 0.083333333333333 (1 / 12) 
   5 0.05 (1 / 20) 
   6 0.033333333333333 (1 / 30) 
   7 0.023809523809524 (1 / 42) 
   8 0.017857142857143 (1 / 56) 
   9 0.013888888888889 (1 / 72) 
  10 0.011111111111111 (1 / 90) 

  Third difference:
   1 0.5 (1 / 2) 
   2 0.166666666666667 (1 / 6) 
   3 0.083333333333333 (1 / 12) 
   4 0.05 (1 / 20) 
   5 0.033333333333333 (1 / 30) 
   6 0.023809523809524 (1 / 42) 
   7 0.017857142857143 (1 / 56) 
   8 0.013888888888889 (1 / 72) 
   9 0.011111111111111 (1 / 90) 
  10 0.009090909090909 (1 / 110) 


*/
go ?=>
  println("First difference:"),
  [M=kth_diff_p(1,M).and_rat : M in 1..20].printf_list,
  nl,
  println("Second difference:"),
  [M=kth_diff_p(2,M).and_rat : M in 1..10].printf_list,
  nl,
  println("Third difference:"),
  [M=kth_diff_p(3,M).and_rat : M in 1..10].printf_list,
  nl,
  println("Fourth difference:"),
  [M=kth_diff_p(4,M).and_rat : M in 1..10].printf_list,
  nl.
go => true.
  

/*

  The first difference (K=1) seems to 
  be OK, but not the second or third.

  * For N=10 draw_without_replacement(N). 

    [n = 10,k = 1]
    var : kth diff
    Probabilities:
    1: 0.4928000000000000
    2: 0.1672000000000000
    0: 0.0993000000000000
    3: 0.0881000000000000
    4: 0.0535000000000000
    5: 0.0323000000000000
    6: 0.0241000000000000
    7: 0.0179000000000000
    8: 0.0141000000000000
    9: 0.0107000000000000
    mean = 1.946

    Theoretical kth diff prob:
    1 0.5 (1 / 2) 
    2 0.166666666666667 (1 / 6) 
    3 0.083333333333333 (1 / 12) 
    4 0.05 (1 / 20) 
    5 0.033333333333333 (1 / 30) 
    6 0.023809523809524 (1 / 42) 
    7 0.017857142857143 (1 / 56) 
    8 0.013888888888889 (1 / 72) 
    9 0.011111111111111 (1 / 90) 

    [n = 10,k = 2]
    var : kth diff
    Probabilities:
    0: 0.3757000000000000
    1: 0.2442000000000000
    2: 0.1363000000000000
    3: 0.0861000000000000
    4: 0.0616000000000000
    5: 0.0393000000000000
    6: 0.0262000000000000
    7: 0.0196000000000000
    8: 0.0110000000000000
    mean = 1.6004

    Theoretical kth diff prob:
    Division by 0: K: 2 M: 1
    1 0 (0 / 1) 
    2 0.5 (1 / 2) 
    3 0.166666666666667 (1 / 6) 
    4 0.083333333333333 (1 / 12) 
    5 0.05 (1 / 20) 
    6 0.033333333333333 (1 / 30) 
    7 0.023809523809524 (1 / 42) 
    8 0.017857142857143 (1 / 56) 
    9 0.013888888888889 (1 / 72) 

    [n = 10,k = 3]
    var : kth diff
    Probabilities:
    0: 0.7029000000000000
    1: 0.1089000000000000
    2: 0.0699000000000000
    3: 0.0467000000000000
    4: 0.0331000000000000
    5: 0.0190000000000000
    6: 0.0149000000000000
    7: 0.0046000000000000
    mean = 0.7378

    Theoretical kth diff prob:
    1 0.5 (1 / 2) 
    2 0.166666666666667 (1 / 6) 
    3 0.083333333333333 (1 / 12) 
    4 0.05 (1 / 20) 
    5 0.033333333333333 (1 / 30) 
    6 0.023809523809524 (1 / 42) 
    7 0.017857142857143 (1 / 56) 
    8 0.013888888888889 (1 / 72) 
    9 0.011111111111111 (1 / 90) 

  * normal_dist(100,15) N = 10
    [n = 10,k = 1]
    var : kth diff
    Probabilities:
    1: 0.4946000000000000
    2: 0.1698000000000000
    0: 0.0998000000000000
    3: 0.0876000000000000
    4: 0.0495000000000000
    5: 0.0332000000000000
    6: 0.0205000000000000
    7: 0.0179000000000000
    8: 0.0139000000000000
    9: 0.0132000000000000
    mean = 1.9393

    [n = 10,k = 2]
    var : kth diff
    Probabilities:
    0: 0.3870000000000000
    1: 0.2496000000000000
    2: 0.1272000000000000
    3: 0.0804000000000000
    4: 0.0528000000000000
    5: 0.0415000000000000
    6: 0.0321000000000000
    7: 0.0186000000000000
    8: 0.0108000000000000
    mean = 1.5731

    [n = 10,k = 3]
    var : kth diff
    Probabilities:
    0: 0.7027000000000000
    1: 0.1097000000000000
    2: 0.0666000000000000
    3: 0.0453000000000000
    4: 0.0343000000000000
    5: 0.0234000000000000
    6: 0.0126000000000000
    7: 0.0054000000000000
    mean = 0.7464

  * exponential_dist(1)

    [n = 10,k = 1]
    var : kth diff
    Probabilities:
    1: 0.4977000000000000
    2: 0.1596000000000000
    0: 0.1036000000000000
    3: 0.0787000000000000
    4: 0.0500000000000000
    5: 0.0373000000000000
    6: 0.0273000000000000
    7: 0.0186000000000000
    8: 0.0151000000000000
    9: 0.0121000000000000
    mean = 1.9632

    [n = 10,k = 2]
    var : kth diff
    Probabilities:
    0: 0.3802000000000000
    1: 0.2450000000000000
    2: 0.1310000000000000
    3: 0.0890000000000000
    4: 0.0548000000000000
    5: 0.0402000000000000
    6: 0.0284000000000000
    7: 0.0202000000000000
    8: 0.0112000000000000
    mean = 1.5956

    [n = 10,k = 3]
    var : kth diff
    Probabilities:
    0: 0.7094000000000000
    1: 0.1072000000000000
    2: 0.0688000000000000
    3: 0.0451000000000000
    4: 0.0307000000000000
    5: 0.0201000000000000
    6: 0.0122000000000000
    7: 0.0065000000000000
    mean = 0.7221

  * cauchy_dist(2,2)
    [n = 10,k = 1]
    var : kth diff
    Probabilities:
    1: 0.5040000000000000
    2: 0.1639000000000000
    0: 0.0963000000000000
    3: 0.0858000000000000
    4: 0.0530000000000000
    5: 0.0344000000000000
    6: 0.0252000000000000
    7: 0.0148000000000000
    8: 0.0134000000000000
    9: 0.0092000000000000
    mean = 1.918

    [n = 10,k = 2]
    var : kth diff
    Probabilities:
    0: 0.3843000000000000
    1: 0.2387000000000000
    2: 0.1366000000000000
    3: 0.0852000000000000
    4: 0.0576000000000000
    5: 0.0390000000000000
    6: 0.0274000000000000
    7: 0.0194000000000000
    8: 0.0118000000000000
    mean = 1.5875

    [n = 10,k = 3]
    var : kth diff
    Probabilities:
    0: 0.7036000000000000
    1: 0.1068000000000000
    2: 0.0668000000000000
    3: 0.0500000000000000
    4: 0.0318000000000000
    5: 0.0209000000000000
    6: 0.0129000000000000
    7: 0.0072000000000000
    mean = 0.7499

  * poisson dist (10)

    [n = 10,k = 1]
    var : kth diff
    Probabilities:
    1: 0.4586000000000000
    2: 0.1697000000000000
    0: 0.1221000000000000
    3: 0.0843000000000000
    4: 0.0531000000000000
    5: 0.0367000000000000
    6: 0.0253000000000000
    7: 0.0219000000000000
    8: 0.0163000000000000
    9: 0.0120000000000000
    mean = 1.9903

    [n = 10,k = 2]
    var : kth diff
    Probabilities:
    0: 0.4659000000000000
    1: 0.1920000000000000
    2: 0.1127000000000000
    3: 0.0771000000000000
    4: 0.0525000000000000
    5: 0.0394000000000000
    6: 0.0287000000000000
    7: 0.0206000000000000
    8: 0.0111000000000000
    mean = 1.4609

    [n = 10,k = 3]
    var : kth diff
    Probabilities:
    0: 0.8071000000000000
    1: 0.0604000000000000
    2: 0.0423000000000000
    3: 0.0330000000000000
    4: 0.0239000000000000
    5: 0.0198000000000000
    6: 0.0089000000000000
    7: 0.0046000000000000
    mean = 0.5242

  * random_integer(100)

    [n = 10,k = 1]
    var : kth diff
    Probabilities:
    1: 0.4964000000000000
    2: 0.1644000000000000
    0: 0.1076000000000000
    3: 0.0846000000000000
    4: 0.0501000000000000
    5: 0.0311000000000000
    6: 0.0222000000000000
    7: 0.0176000000000000
    8: 0.0144000000000000
    9: 0.0116000000000000
    mean = 1.9109

    [n = 10,k = 2]
    var : kth diff
    Probabilities:
    0: 0.3963000000000000
    1: 0.2330000000000000
    2: 0.1258000000000000
    3: 0.0871000000000000
    4: 0.0581000000000000
    5: 0.0424000000000000
    6: 0.0270000000000000
    7: 0.0179000000000000
    8: 0.0124000000000000
    mean = 1.5768


    [n = 10,k = 3]
    var : kth diff
    Probabilities:
    0: 0.7216000000000000
    1: 0.0945000000000000
    2: 0.0650000000000000
    3: 0.0481000000000000
    4: 0.0328000000000000
    5: 0.0203000000000000
    6: 0.0122000000000000
    7: 0.0055000000000000
    mean = 0.7132

  * random_integer(10)

    [n = 10,k = 1]
    var : kth diff
    Probabilities:
    1: 0.4469000000000000
    0: 0.1607000000000000
    2: 0.1602000000000000
    3: 0.0856000000000000
    4: 0.0486000000000000
    5: 0.0305000000000000
    6: 0.0233000000000000
    7: 0.0168000000000000
    8: 0.0139000000000000
    9: 0.0135000000000000
    mean = 1.8611

    [n = 10,k = 2]
    var : kth diff
    Probabilities:
    0: 0.5232000000000000
    1: 0.1841000000000000
    2: 0.1019000000000000
    3: 0.0626000000000000
    4: 0.0473000000000000
    5: 0.0323000000000000
    6: 0.0257000000000000
    7: 0.0141000000000000
    8: 0.0088000000000000
    mean = 1.2497


    [n = 10,k = 3]
    var : kth diff
    Probabilities:
    0: 0.8376000000000000
    1: 0.0551000000000000
    2: 0.0372000000000000
    3: 0.0274000000000000
    4: 0.0198000000000000
    5: 0.0131000000000000
    6: 0.0067000000000000
    7: 0.0031000000000000
    mean = 0.4183


*/
go2 ?=>
  N = 10,
  member(K,1..3),
  println([n=N,k=K]),
  reset_store,
  run_model(10_000,$model2(N,K),[show_probs_trunc,mean % ,
                           % show_percentiles,
                           % show_hpd_intervals,hpd_intervals=[0.94],
                           % show_histogram,
                           % min_accepted_samples=1000,show_accepted_samples=true
                          ]),
  nl,
  % show_store_lengths,nl,
  fail,
  nl.
go2 => true.
  
  
model2(N,K) =>

  % Permutations
  X = draw_without_replacement(N,1..N),

  % Discrete
  % X = poisson_dist_n(10,N),
  % X = random_integer1_n(10,N),

  % However this is fairly according the the formula
  % X = random_integer1_n(100,N),
  

  % Continuous
  % X = normal_dist_n(100,15,N),
  % X = exponential_dist_n(1,N),
  % X = cauchy_dist_n(2,2,N),

  RecordsIx = get_records_ix(X),
  % RecordsIx = get_records_ix_geq(X), % Using >= instead of >
  NumRecords = RecordsIx.len,

  % KthDiff is 0 when the first element is N
  if NumRecords >= K+1 then
    % DiffsMean = differences(RecordsIx).mean,
    KthDiff = RecordsIx[K+1]-RecordsIx[K],
    % KValK = RecordsIx[K+1],        
  else
    % DiffsMean = 0,
    KthDiff = 0,
    % KValK = 0
  end,

  % add("num records",NumRecords),
  % add("k val k",KValK),
  add("kth diff",KthDiff).
  % add("diffs mean",DiffsMean).