/* 

  Baseball payroll (Resampling Stats) in Picat.

  From "Resampling Stats Illustrations" (http://www.statistics101.net/PeterBruce_05-illus.pdf)
  Page 45
  """
  Baseball payroll—hypothesis test for correlation using the Pearson
  correlation coefficient
  (program “baseball”)
  Is a baseball team’s performance directly related to its payroll? (In
  technical terms, is there a correlation between two variables, or are
  they independent?) Specifically, we want to know whether base-
  ball teams with high payrolls also tend to be the better performing
  teams.
  The following data are from the Washington Post, March 27, 1998,
  page F2, and were compiled by the Post according to the formula
  of the Player Relations Council. Performance is ranked by the
  teams’ won-loss records; note that good performance is denoted by
  a low rank number.

  TABLE 3. MAJOR LEAGUE PAYROLL AND WON-LOSS RANKS 1995–1997
  Total          Payroll    Rank*
  NY Yankees     192.7        3
  Baltimore      179.5        4
  Atlanta        164.8        1
  Cleveland      155.7        2
  Chicago WS     150.3       14
  Cincinnati     143 9.       5
  Texas          139.9       11
  Colorado       138.3        8
  Toronto        137.4       25
  St. Louis      137.3       19.5
  Seattle        137.1        6
  Boston         131.8        7
  Los Angeles    128.3        5
  San Francisco  124         18
  Chicago Cubs   123         21
  Florida        122.8       12
  Anaheim        116         15.5
  Houston        115.4        9.5
  Philadelphia   109.9       26
  San Diego      104.5       13
  NY Mets        104.2       17
  Kansas City    101.1       22
  Minnesota       94.6       27
  Oakland         85.5       23.5
  Detroit         84         28
  Milwaukee       78.5       19.5
  Pittsburgh      67.7       23.5
  Montreal        67.6       15.5
  * Rank in games won and lost over the 3-year period

  ...
  The Pearson correlation coefficient for these data is -.71. Is that statistically 
  significant? Might a correlation coefficient this negative have occurred
  just by chance? In other words, might there be no relationship
  between the two variables and the apparent correlation be the
  result of the “luck of the draw?”

  ...
  No shuffling produced a correlation coefficient as negative as the
  observed value, -.71, so we conclude that there is significant
  correlation between payroll and performance. The higher the
  payroll, the lower the rank number (i.e. the better the perfor-
  mance).
  """

  Cf my Gamble model gamble_baseball_payroll.rkt

  This program was created by Hakan Kjellerstrand, hakank@gmail.com
  See also my Picat page: http://www.hakank.org/picat/

*/

import ppl_distributions, ppl_utils.
import util.

main => go.

/*
  Comparing resampled version using correlation coefficient

  Correlation coefficient of original data: -0.710856010357
  var : r2
  Probabilities (truncated):
  0.309165285958009: 0.0002000000000000
  0.306435286221428: 0.0002000000000000
  0.286068051344564: 0.0002000000000000
  0.250017686402023: 0.0002000000000000
  .........
  -0.585541838237709: 0.0001000000000000
  -0.635558307095813: 0.0001000000000000
  -0.67663761892153: 0.0001000000000000
  -0.716689588741112: 0.0001000000000000
  mean = -0.000962436
  HPD intervals:
  HPD interval (0.95): -0.37572699006176..0.36430409642712

  var : p
  Probabilities:
  0: 0.9999000000000000
  1: 0.0001000000000000
  mean = 0.0001
  HPD intervals:
  HPD interval (0.95): 0.00000000000000..0.00000000000000


*/
go ?=>
  Payroll = [192.7,179.5,164.8,155.7,150.3,143,139.9,138.3,137.4,137.3,137.1,131.8,128.3,124,
             123,122.8,116,115.4,109.9,104.5,104.2,101.1,94.6,85.5,84,78.5,67.7,67.6],
  Rank    = [3,4,1,2,14,9.5,11,8,25,19.5,6,7,5,18,21,12,15.5,9.5,26,
             13,17,22,27,23.5,28,19.5,23.5,15.5],


  CC = correlation_coefficient(Payroll,Rank),
  printf("Correlation coefficient of original data: %.12f\n",CC),

  reset_store,
  run_model(10_000,$model(Payroll,Rank,CC),[show_probs_trunc,mean,show_hpd_intervals,hpd_intervals=[0.95]]),
  nl,
  % show_store_lengths,nl,
  % fail,
  nl.
go => true.
  
  
model(Payroll,Rank,CC) =>
  PayrollSample = draw_without_replacement(Payroll),
  RankSample    = draw_without_replacement(Rank),

  R2 = correlation_coefficient(PayrollSample,RankSample),
  P = check1(R2 <= CC), % Probability that the sample coefficient is as low as the original
  
  add("r2",R2),
  add("p",P).


/*

  From "Resampling Stats Illustrations" (http://www.statistics101.net/PeterBruce_05-illus.pdf)
  Page 45
  """
  Baseball payroll—testing for correlation using the sum-of-products statistic
  (program “basebal2”)
  We can also conduct this test with a statistic other than the correla-
  tion coefficient, the calculation of which is not wholly transparent.
  We will use the “sum of products” statistic.
  ...
  -> 
  prob = .0001
  """

sum_products_orig = 44858.7
var : sum products
Probabilities (truncated):
50582.05000000000291: 0.0001500000000000
50580.0: 0.0001500000000000
50456.150000000008731: 0.0001500000000000
50419.80000000000291: 0.0001500000000000
.........
45357.650000000001455: 0.0000500000000000
45262.0: 0.0000500000000000
45241.900000000008731: 0.0000500000000000
44737.899999999994179: 0.0000500000000000
mean = 49805.3
Percentiles:
(0.001 46017.735850000005)
(0.01 46752.99500000001)
(0.025 47210.696249999994)
(0.05 47621.164999999994)
(0.1 48082.630000000012)
(0.25 48894.199999999997)
(0.5 49807.699999999997)
(0.75 50716.612500000003)
(0.84 51149.699999999997)
(0.9 51541.335000000006)
(0.95 52003.2575)
(0.975 52373.467499999999)
(0.99 52787.903000000013)
(0.999 53651.267200000002)
(0.9999 54266.953905000039)
(0.99999 54369.200790000206)
Histogram (total 20000)
44737.900:     1                                                              (0.000 / 0.000)
44979.077:     0                                                              (0.000 / 0.000)
45220.255:     2                                                              (0.000 / 0.000)
45461.432:     3                                                              (0.000 / 0.000)
45702.610:     4                                                              (0.000 / 0.001)
45943.787:    12 #                                                            (0.001 / 0.001)
46184.965:    42 ##                                                           (0.002 / 0.003)
46426.142:    60 ###                                                          (0.003 / 0.006)
46667.320:    96 ####                                                         (0.005 / 0.011)
46908.497:   116 #####                                                        (0.006 / 0.017)
47149.675:   232 ##########                                                   (0.012 / 0.028)
47390.852:   266 ###########                                                  (0.013 / 0.042)
47632.030:   396 #################                                            (0.020 / 0.061)
47873.207:   537 #######################                                      (0.027 / 0.088)
48114.385:   672 #############################                                (0.034 / 0.122)
48355.562:   823 ###################################                          (0.041 / 0.163)
48596.740:   938 ########################################                     (0.047 / 0.210)
48837.917:  1112 ################################################             (0.056 / 0.266)
49079.095:  1291 #######################################################      (0.065 / 0.330)
49320.272:  1332 #########################################################    (0.067 / 0.397)
49561.450:  1340 ##########################################################   (0.067 / 0.464)
49802.627:  1391 ############################################################ (0.070 / 0.533)
50043.805:  1397 ############################################################ (0.070 / 0.603)
50284.982:  1290 #######################################################      (0.065 / 0.668)
50526.160:  1307 ########################################################     (0.065 / 0.733)
50767.337:  1105 ###############################################              (0.055 / 0.788)
51008.515:   953 #########################################                    (0.048 / 0.836)
51249.692:   802 ##################################                           (0.040 / 0.876)
51490.870:   675 #############################                                (0.034 / 0.910)
51732.047:   543 #######################                                      (0.027 / 0.937)
51973.225:   408 ##################                                           (0.020 / 0.957)
52214.402:   324 ##############                                               (0.016 / 0.974)
52455.580:   205 #########                                                    (0.010 / 0.984)
52696.757:   139 ######                                                       (0.007 / 0.991)
52937.935:    87 ####                                                         (0.004 / 0.995)
53179.112:    53 ##                                                           (0.003 / 0.998)
53420.290:    21 #                                                            (0.001 / 0.999)
53661.467:    14 #                                                            (0.001 / 0.999)
53902.645:     4                                                              (0.000 / 1.000)
54143.822:     7                                                              (0.000 / 1.000)
HPD intervals:
HPD interval (0.95): 47180.75000000000000..52324.75000000000728

var : p
Probabilities:
0: 0.9999500000000000
1: 0.0000500000000000
mean = 0.00005
Percentiles:
(0.001 0)
(0.01 0)
(0.025 0)
(0.05 0)
(0.1 0)
(0.25 0)
(0.5 0)
(0.75 0)
(0.84 0)
(0.9 0)
(0.95 0)
(0.975 0)
(0.99 0)
(0.999 0)
(0.9999 0)
(0.99999 0.80001000000265776)
Histogram (total 20000)
      0: 19999 ############################################################ (1.000 / 1.000)
      1:     1                                                              (0.000 / 1.000)
HPD intervals:
HPD interval (0.95): 0.00000000000000..0.00000000000000


*/

go2 ?=>
  Payroll = [192.7,179.5,164.8,155.7,150.3,143,139.9,138.3,137.4,137.3,137.1,131.8,128.3,124,
             123,122.8,116,115.4,109.9,104.5,104.2,101.1,94.6,85.5,84,78.5,67.7,67.6],
  Rank    = [3,4,1,2,14,9.5,11,8,25,19.5,6,7,5,18,21,12,15.5,9.5,26,
             13,17,22,27,23.5,28,19.5,23.5,15.5],
  SumProductsOrig = [P*R : {P,R} in zip(Payroll,Rank)].sum,
  println(sum_products_orig=SumProductsOrig),
  reset_store,
  run_model(20_000,$model2(Payroll,Rank,SumProductsOrig),
                   [show_probs_trunc,mean,show_percentiles,show_histogram,show_hpd_intervals,hpd_intervals=[0.95]]),
  nl,
  % show_store_lengths,nl,
  % fail,
  nl.
go2 => true.
  
  
model2(Payroll,Rank,SumProductsOrig) =>
  PayrollSample = draw_without_replacement(Payroll),
  RankSample    = draw_without_replacement(Rank),

  SumProducts = [P*R : {P,R} in zip(PayrollSample,RankSample)].sum,
  % How often is the sum of products as low as in the original?

  P = check1(SumProducts <= SumProductsOrig),

  add("sum products",SumProducts),
  add("p",P).