/* 

  Order statistics - continuous distributions in Picat.

  From Siegrist "Probability Mathematical Statisics and Stochastic Processes",
  Chapter "6.6: Order Statistics"
  """
  Suppose that x is a real-valued variable for a population and that x = (x1,x2, … , xn ) 
  are the observed values of a sample of size n corresponding to this variable. 
  The order statistic of rank k is the k th smallest value in the data set, and is usually 
  denoted xn:k . To emphasize the dependence on the sample size, another common notation 
  is x(k) .
  """

  Wikipedia https://en.wikipedia.org/wiki/Order_statistic

  Note: For continuous distributions, it's harder to compare the
  the theoretical result and the model since we have to use some
  granularity to get the individial PDF values.
  Some some distributions, e.g. normal and larger values (such as
  normal 100 15) it works a little better when rounding the values
  in the model, but that might give some strange results...

  For discrete values, see
  - ppl_order_statistics_without_replacement_dist.pi
  - ppl_order_statistics_with_replacement_discrete_dist.pi


  Cf my Gamble model gamble_order_statistics_continuous_dist.rkt

  This program was created by Hakan Kjellerstrand, hakank@gmail.com
  See also my Picat page: http://www.hakank.org/picat/

*/

import ppl_distributions, ppl_utils.
import util.

main => go.

/*
  Normal distribution: normal_dist(100,15)
  - N: 10 number of measurements (increasing ordered)
  - K: 1 or 10: We check the K'th measurements

  Here we show in steps of 10.

  * First with K = 1, i.e. the smallest measurement.
    Here we see that the 50th percentile is 77.6 (with the precision of 0.1)

    Normal distribution
    Mu:100 Sigma:15 From:0 (By:10) To:200 N:10 K:1

    PDF = normal_dist_pdf(100,15)
      0: 0.00000000005940600
     10: 0.00000000405058853
     20: 0.00000017708671706
     30: 0.00000496396219814
     40: 0.00008919472231479
     50: 0.00102422242040757
     60: 0.00733939537132723
     70: 0.02926041134338440
     80: 0.04623234022717437
     90: 0.01551846337446857
    100: 0.00051945609427270
    110: 0.00000088827173908
    120: 0.00000000004777642
    130: 0.00000000000000006
    140: 0.00000000000000000
    150: 0.00000000000000000
    160: 0.00000000000000000
    170: 0.00000000000000000
    180: 0.00000000000000000
    190: 0.00000000000000000
    200: 0.00000000000000000

    CDF = normal_dist_cdf(100,15)
      0: 0.00000000013084922
     10: 0.00000000986588418
     20: 0.00000048213022939
     30: 0.00001530616193551
     40: 0.00031666728410192
     50: 0.00428232862865741
     60: 0.03765027282613776
     70: 0.20556895983294296
     80: 0.61573611712016507
     90: 0.94553025698539694
    100: 0.99902343750000022
    110: 0.99999894686154778
    120: 0.99999999996014499
    130: 0.99999999999999989
    140: 1.00000000000000000
    150: 1.00000000000000000
    160: 1.00000000000000000
    170: 1.00000000000000000
    180: 1.00000000000000000
    190: 1.00000000000000000
    200: 1.00000000000000000

    Quantile:
    0.0100000: 53.70000000000049312
    0.0250000: 58.00000000000055422
    0.0500000: 61.50000000000060396
    0.2500000: 71.50000000000021316
    0.5000000: 77.59999999999986642
    0.7500000: 83.09999999999955378
    0.8400000: 85.59999999999941167
    0.9750000: 92.49999999999901945
    0.9900000: 94.99999999999887734
    0.9990000: 99.99999999999859313
    0.9999000: 103.89999999999837144
    0.9999990: 110.09999999999801901

    Median
    77.6


  * Compare if we check K = 10, i.e. the largest measurement.
    Here we see that the 50th percentile is 122.5 (with precision of 0.1)

    Normal distribution
    Mu:100 Sigma:15 From:0 (By:10) To:200 N:10 K:10

    PDF = normal_dist_pdf(100,15)
      0: 0.00000000000000000
     10: 0.00000000000000000
     20: 0.00000000000000000
     30: 0.00000000000000000
     40: 0.00000000000000000
     50: 0.00000000000000000
     60: 0.00000000000000000
     70: 0.00000000000000006
     80: 0.00000000004777642
     90: 0.00000088827173908
    100: 0.00051945609427270
    110: 0.01551846337446857
    120: 0.04623234022717437
    130: 0.02926041134338440
    140: 0.00733939537132723
    150: 0.00102422242040757
    160: 0.00008919472231479
    170: 0.00000496396219814
    180: 0.00000017708671706
    190: 0.00000000405058853
    200: 0.00000000005940600

    CDF = normal_dist_cdf(100,15)
      0: 0.00000000000000000
     10: 0.00000000000000000
     20: 0.00000000000000000
     30: 0.00000000000000000
     40: 0.00000000000000000
     50: 0.00000000000000000
     60: 0.00000000000000000
     70: 0.00000000000000004
     80: 0.00000000003985498
     90: 0.00000105313845236
    100: 0.00097656250000000
    110: 0.05446974301460329
    120: 0.38426388287983487
    130: 0.79443104016705757
    140: 0.96234972717386169
    150: 0.99571767137134204
    160: 0.99968333271589804
    170: 0.99998469383806388
    180: 0.99999951786977115
    190: 0.99999999013411633
    200: 0.99999999986915133

    Quantile:
    0.0100000: 105.09999999999830322
    0.0250000: 107.59999999999816112
    0.0500000: 109.79999999999803606
    0.2500000: 116.99999999999762679
    0.5000000: 122.49999999999731415
    0.7500000: 128.59999999999698161
    0.8400000: 131.79999999999679972
    0.9750000: 142.09999999999621423
    0.9900000: 146.39999999999596980
    0.9990000: 155.79999999999543547
    0.9999000: 163.99999999999496936
    0.9999990: 177.99999999999417355

    Median
    122.5


*/
go ?=>
  Mu = 100,
  Sigma = 15,
  From = 0,
  To = 200,
  By = 10,
  N = 10, % We have N (increasing ordered) measurements
  member(K,[1,10]),
  % K = 1, % We want to check the value of the Kth measurement

  println("Normal distribution"),
  printf("Mu:%w Sigma:%w From:%w (By:%w) To:%w N:%w K:%w\n",Mu,Sigma,From,By,To,N,K),
  nl,
  CDF = $normal_dist_cdf(Mu,Sigma),
  PDF = $normal_dist_pdf(Mu,Sigma),
  println("PDF"=PDF),
  foreach(I in From..By..To)
    T = order_statistics_continuous_pdf(PDF,CDF,N,K,I),
    printf("%3d: %0.17f\n",I,T)
  end,
  nl,
  println("CDF"=CDF),
  foreach(I in From..By..To)
    T = order_statistics_continuous_cdf(CDF,N,K,I),
    printf("%3d: %0.17f\n",I,T)
  end,
  nl,  
  println("Quantile:"),
  Qs = [0.01,0.025,0.05,0.25,0.50,0.75,0.84,0.975,0.99,0.999,0.9999,0.999999],  
  foreach(Q in Qs)
    T = order_statistics_continuous_quantile(CDF,N,K,Q,0.1),
    printf("%.7f: %0.17f\n",Q,T)
  end,
  nl,
  println("Median"),
  println(order_statistics_continuous_median(CDF,N,K,0.1)),
  nl,
  fail,
  nl.
go => true.

/*
  Exponential distribution: lambda=0.01, N = 10, K = 1

Lambda:0.01 From:0 (By:1) To:20 N:10 K:1

CDF = exponential_dist_cdf(0.01)
  0: 0.00000000000000000
  1: 0.09516258196403997
  2: 0.18126924692201848
  3: 0.25918177931828229
  4: 0.32967995396436101
  5: 0.39346934028736652
  6: 0.45118836390597367
  7: 0.50341469620859025
  8: 0.55067103588277844
  9: 0.59343034025940067
 10: 0.63212055882855789
 11: 0.66712891630192062
 12: 0.69880578808779792
 13: 0.72746820696598713
 14: 0.75340303605839365
 15: 0.77686983985157032
 16: 0.79810348200534442
 17: 0.81731647594726531
 18: 0.83470111177841344
 19: 0.85043138077736502
 20: 0.86466471676338752

PDF = exponential_dist_pdf(0.01)
  0: 0.09999999999999962
  1: 0.09048374180359565
  2: 0.08187307530779783
  3: 0.07408182206817147
  4: 0.06703200460356365
  5: 0.06065306597126312
  6: 0.05488116360940243
  7: 0.04965853037914078
  8: 0.04493289641172197
  9: 0.04065696597405976
 10: 0.03678794411714407
 11: 0.03328710836980782
 12: 0.03011942119122008
 13: 0.02725317930340114
 14: 0.02465969639416057
 15: 0.02231301601484290
 16: 0.02018965179946546
 17: 0.01826835240527340
 18: 0.01652988882215859
 19: 0.01495686192226344
 20: 0.01353352832366122

Quantile:
0.0100000: 1.00000000000000000
0.0250000: 1.00000000000000000
0.0500000: 1.00000000000000000
0.2500000: 3.00000000000000000
0.5000000: 7.00000000000000000
0.7500000: 14.00000000000000000
0.8400000: 19.00000000000000000
0.9750000: 37.00000000000000000
0.9900000: 47.00000000000000000
0.9990000: 70.00000000000000000
0.9999000: 93.00000000000000000
0.9999990: 139.00000000000000000

Median
7


*/
go2 ?=>
  Lambda = 1/100,
  From = 0,
  To = 20,
  By = 1,
  N = 10, % We have N (increasing ordered) measurements
  K = 1, % We want to check the value of the Kth measurement

  println("Exponential distribution"),
  printf("Lambda:%w From:%w (By:%w) To:%w N:%w K:%w\n",Lambda,From,By,To,N,K),
  nl,
  CDF = $exponential_dist_cdf(Lambda),
  PDF = $exponential_dist_pdf(Lambda),
  println("CDF"=CDF),
  foreach(I in From..By..To)
    T = order_statistics_continuous_cdf(CDF,N,K,I),
    printf("%3d: %0.17f\n",I,T)
  end,
  nl,
  println("PDF"=PDF),
  foreach(I in From..By..To)
    T = order_statistics_continuous_pdf(PDF,CDF,N,K,I),
    printf("%3d: %0.17f\n",I,T)
  end,
  nl,
  println("Quantile:"),
  Qs = [0.01,0.025,0.05,0.25,0.50,0.75,0.84,0.975,0.99,0.999,0.9999,0.999999],  
  foreach(Q in Qs)
    T = order_statistics_continuous_quantile(CDF,N,K,Q),
    printf("%.7f: %0.17f\n",Q,T)
  end,
  nl,
  println("Median"),
  println(order_statistics_continuous_median(CDF,N,K)),
  nl,
  fail,

  nl.
go2 => true.


/*
  Simulation for 10 sorted measurements of normal_dist(100,15).
  We check K=1 (the smallest) and K=10 (the largest) measurements.

  k = 1
  var : m
  Probabilities (truncated):
  101.723023908632811: 0.0001000000000000
  101.357349670806514: 0.0001000000000000
  100.900423389350422: 0.0001000000000000
  100.843714591206236: 0.0001000000000000
  .........
  41.56495714422455: 0.0001000000000000
  40.111539867927775: 0.0001000000000000
  39.654061056234546: 0.0001000000000000
  38.952071423612196: 0.0001000000000000
  mean = 76.8456
  Percentiles:
  (0.001 44.090761523175559)
  (0.01 54.172740211338585)
  (0.025 58.08170627870355)
  (0.05 61.733880304045648)
  (0.1 65.363358989124407)
  (0.25 71.291354179851581)
  (0.5 77.41359608237228)
  (0.75 83.000323476281636)
  (0.84 85.43878548984631)
  (0.9 87.537104587381279)
  (0.95 90.139878362353414)
  (0.975 92.35277033931321)
  (0.99 94.674316768477254)
  (0.999 99.345245388858672)
  (0.9999 101.35738623823003)
  (0.99999 101.686460141593)
  HPD intervals:
  HPD interval (0.5): 72.52658618115967..83.97236324319972
  HPD interval (0.84): 65.10952943596413..89.09092234508401
  HPD interval (0.9): 62.89284547965602..91.09716843579342
  HPD interval (0.95): 58.65532338862543..92.57180035068299
  HPD interval (0.99): 53.07112830069623..97.60380512002473
  HPD interval (0.999): 43.80477495689069..100.90042338935042
  HPD interval (0.9999): 39.65406105623455..101.72302390863281
  HPD interval (0.99999): 38.95207142361220..101.72302390863281


  k = 10
  var : m
  Probabilities (truncated):
  164.661731161718365: 0.0001000000000000
  158.598660338279728: 0.0001000000000000
  157.341411132654144: 0.0001000000000000
  156.654767239790658: 0.0001000000000000
  .........
  99.077799871499494: 0.0001000000000000
  98.812156772558225: 0.0001000000000000
  98.474667260938773: 0.0001000000000000
  98.458766485746182: 0.0001000000000000
  mean = 123.188
  Percentiles:
  (0.001 100.37113370981785)
  (0.01 105.12398299780398)
  (0.025 107.56893189081703)
  (0.05 109.81927973694974)
  (0.1 112.49446616622484)
  (0.25 117.12416737663874)
  (0.5 122.53464375531441)
  (0.75 128.57506095982131)
  (0.84 131.69452971836529)
  (0.9 134.98999337765795)
  (0.95 138.93945484308611)
  (0.975 142.36625401540113)
  (0.99 146.09888312330531)
  (0.999 155.06593911593595)
  (0.9999 158.59926664535777)
  (0.99999 164.05548471009001)
  HPD intervals:
  HPD interval (0.5): 116.57456809220301..127.87491832411510
  HPD interval (0.84): 110.14580102424041..134.61629370382292
  HPD interval (0.9): 109.45163691589499..138.30548394635434
  HPD interval (0.95): 106.68033121377148..141.06330249551877
  HPD interval (0.99): 102.87215336719167..147.68666823441558
  HPD interval (0.999): 99.89194734931932..155.93298916016067
  HPD interval (0.9999): 98.45876648574618..158.59866033827973
  HPD interval (0.99999): 98.45876648574618..164.66173116171836

*/
go3 ?=>
  member(K,[1,10]),
  println(k=K),
  reset_store,
  run_model(10_000,$model3(K),[show_probs_trunc,mean,show_percentiles,show_hpd_intervals]),
  nl,
  fail,
  nl.
go3 => true.

model3(K) =>
  Mu = 100,
  Sigma = 15,
  N = 10, % we have N ordered measurements
  % K = 1, % we want to check the K'th value
  Xs = normal_dist_n(Mu,Sigma,N).sort,

  % M = Xs[K],
  M = ith_smallest(Xs,K,true),
  
  add("m",M).