/* Baseball payroll (Resampling Stats) in Picat. From "Resampling Stats Illustrations" (http://www.statistics101.net/PeterBruce_05-illus.pdf) Page 45 """ Baseball payroll—hypothesis test for correlation using the Pearson correlation coefficient (program “baseball”) Is a baseball team’s performance directly related to its payroll? (In technical terms, is there a correlation between two variables, or are they independent?) Specifically, we want to know whether base- ball teams with high payrolls also tend to be the better performing teams. The following data are from the Washington Post, March 27, 1998, page F2, and were compiled by the Post according to the formula of the Player Relations Council. Performance is ranked by the teams’ won-loss records; note that good performance is denoted by a low rank number. TABLE 3. MAJOR LEAGUE PAYROLL AND WON-LOSS RANKS 1995–1997 Total Payroll Rank* NY Yankees 192.7 3 Baltimore 179.5 4 Atlanta 164.8 1 Cleveland 155.7 2 Chicago WS 150.3 14 Cincinnati 143 9. 5 Texas 139.9 11 Colorado 138.3 8 Toronto 137.4 25 St. Louis 137.3 19.5 Seattle 137.1 6 Boston 131.8 7 Los Angeles 128.3 5 San Francisco 124 18 Chicago Cubs 123 21 Florida 122.8 12 Anaheim 116 15.5 Houston 115.4 9.5 Philadelphia 109.9 26 San Diego 104.5 13 NY Mets 104.2 17 Kansas City 101.1 22 Minnesota 94.6 27 Oakland 85.5 23.5 Detroit 84 28 Milwaukee 78.5 19.5 Pittsburgh 67.7 23.5 Montreal 67.6 15.5 * Rank in games won and lost over the 3-year period ... The Pearson correlation coefficient for these data is -.71. Is that statistically significant? Might a correlation coefficient this negative have occurred just by chance? In other words, might there be no relationship between the two variables and the apparent correlation be the result of the “luck of the draw?” ... No shuffling produced a correlation coefficient as negative as the observed value, -.71, so we conclude that there is significant correlation between payroll and performance. The higher the payroll, the lower the rank number (i.e. the better the perfor- mance). """ Cf my Gamble model gamble_baseball_payroll.rkt This program was created by Hakan Kjellerstrand, hakank@gmail.com See also my Picat page: http://www.hakank.org/picat/ */ import ppl_distributions, ppl_utils. import util. main => go. /* Comparing resampled version using correlation coefficient Correlation coefficient of original data: -0.710856010357 var : r2 Probabilities (truncated): 0.309165285958009: 0.0002000000000000 0.306435286221428: 0.0002000000000000 0.286068051344564: 0.0002000000000000 0.250017686402023: 0.0002000000000000 ......... -0.585541838237709: 0.0001000000000000 -0.635558307095813: 0.0001000000000000 -0.67663761892153: 0.0001000000000000 -0.716689588741112: 0.0001000000000000 mean = -0.000962436 HPD intervals: HPD interval (0.95): -0.37572699006176..0.36430409642712 var : p Probabilities: 0: 0.9999000000000000 1: 0.0001000000000000 mean = 0.0001 HPD intervals: HPD interval (0.95): 0.00000000000000..0.00000000000000 */ go ?=> Payroll = [192.7,179.5,164.8,155.7,150.3,143,139.9,138.3,137.4,137.3,137.1,131.8,128.3,124, 123,122.8,116,115.4,109.9,104.5,104.2,101.1,94.6,85.5,84,78.5,67.7,67.6], Rank = [3,4,1,2,14,9.5,11,8,25,19.5,6,7,5,18,21,12,15.5,9.5,26, 13,17,22,27,23.5,28,19.5,23.5,15.5], CC = correlation_coefficient(Payroll,Rank), printf("Correlation coefficient of original data: %.12f\n",CC), reset_store, run_model(10_000,$model(Payroll,Rank,CC),[show_probs_trunc,mean,show_hpd_intervals,hpd_intervals=[0.95]]), nl, % show_store_lengths,nl, % fail, nl. go => true. model(Payroll,Rank,CC) => PayrollSample = draw_without_replacement(Payroll), RankSample = draw_without_replacement(Rank), R2 = correlation_coefficient(PayrollSample,RankSample), P = check1(R2 <= CC), % Probability that the sample coefficient is as low as the original add("r2",R2), add("p",P). /* From "Resampling Stats Illustrations" (http://www.statistics101.net/PeterBruce_05-illus.pdf) Page 45 """ Baseball payroll—testing for correlation using the sum-of-products statistic (program “basebal2”) We can also conduct this test with a statistic other than the correla- tion coefficient, the calculation of which is not wholly transparent. We will use the “sum of products” statistic. ... -> prob = .0001 """ sum_products_orig = 44858.7 var : sum products Probabilities (truncated): 50582.05000000000291: 0.0001500000000000 50580.0: 0.0001500000000000 50456.150000000008731: 0.0001500000000000 50419.80000000000291: 0.0001500000000000 ......... 45357.650000000001455: 0.0000500000000000 45262.0: 0.0000500000000000 45241.900000000008731: 0.0000500000000000 44737.899999999994179: 0.0000500000000000 mean = 49805.3 Percentiles: (0.001 46017.735850000005) (0.01 46752.99500000001) (0.025 47210.696249999994) (0.05 47621.164999999994) (0.1 48082.630000000012) (0.25 48894.199999999997) (0.5 49807.699999999997) (0.75 50716.612500000003) (0.84 51149.699999999997) (0.9 51541.335000000006) (0.95 52003.2575) (0.975 52373.467499999999) (0.99 52787.903000000013) (0.999 53651.267200000002) (0.9999 54266.953905000039) (0.99999 54369.200790000206) Histogram (total 20000) 44737.900: 1 (0.000 / 0.000) 44979.077: 0 (0.000 / 0.000) 45220.255: 2 (0.000 / 0.000) 45461.432: 3 (0.000 / 0.000) 45702.610: 4 (0.000 / 0.001) 45943.787: 12 # (0.001 / 0.001) 46184.965: 42 ## (0.002 / 0.003) 46426.142: 60 ### (0.003 / 0.006) 46667.320: 96 #### (0.005 / 0.011) 46908.497: 116 ##### (0.006 / 0.017) 47149.675: 232 ########## (0.012 / 0.028) 47390.852: 266 ########### (0.013 / 0.042) 47632.030: 396 ################# (0.020 / 0.061) 47873.207: 537 ####################### (0.027 / 0.088) 48114.385: 672 ############################# (0.034 / 0.122) 48355.562: 823 ################################### (0.041 / 0.163) 48596.740: 938 ######################################## (0.047 / 0.210) 48837.917: 1112 ################################################ (0.056 / 0.266) 49079.095: 1291 ####################################################### (0.065 / 0.330) 49320.272: 1332 ######################################################### (0.067 / 0.397) 49561.450: 1340 ########################################################## (0.067 / 0.464) 49802.627: 1391 ############################################################ (0.070 / 0.533) 50043.805: 1397 ############################################################ (0.070 / 0.603) 50284.982: 1290 ####################################################### (0.065 / 0.668) 50526.160: 1307 ######################################################## (0.065 / 0.733) 50767.337: 1105 ############################################### (0.055 / 0.788) 51008.515: 953 ######################################### (0.048 / 0.836) 51249.692: 802 ################################## (0.040 / 0.876) 51490.870: 675 ############################# (0.034 / 0.910) 51732.047: 543 ####################### (0.027 / 0.937) 51973.225: 408 ################## (0.020 / 0.957) 52214.402: 324 ############## (0.016 / 0.974) 52455.580: 205 ######### (0.010 / 0.984) 52696.757: 139 ###### (0.007 / 0.991) 52937.935: 87 #### (0.004 / 0.995) 53179.112: 53 ## (0.003 / 0.998) 53420.290: 21 # (0.001 / 0.999) 53661.467: 14 # (0.001 / 0.999) 53902.645: 4 (0.000 / 1.000) 54143.822: 7 (0.000 / 1.000) HPD intervals: HPD interval (0.95): 47180.75000000000000..52324.75000000000728 var : p Probabilities: 0: 0.9999500000000000 1: 0.0000500000000000 mean = 0.00005 Percentiles: (0.001 0) (0.01 0) (0.025 0) (0.05 0) (0.1 0) (0.25 0) (0.5 0) (0.75 0) (0.84 0) (0.9 0) (0.95 0) (0.975 0) (0.99 0) (0.999 0) (0.9999 0) (0.99999 0.80001000000265776) Histogram (total 20000) 0: 19999 ############################################################ (1.000 / 1.000) 1: 1 (0.000 / 1.000) HPD intervals: HPD interval (0.95): 0.00000000000000..0.00000000000000 */ go2 ?=> Payroll = [192.7,179.5,164.8,155.7,150.3,143,139.9,138.3,137.4,137.3,137.1,131.8,128.3,124, 123,122.8,116,115.4,109.9,104.5,104.2,101.1,94.6,85.5,84,78.5,67.7,67.6], Rank = [3,4,1,2,14,9.5,11,8,25,19.5,6,7,5,18,21,12,15.5,9.5,26, 13,17,22,27,23.5,28,19.5,23.5,15.5], SumProductsOrig = [P*R : {P,R} in zip(Payroll,Rank)].sum, println(sum_products_orig=SumProductsOrig), reset_store, run_model(20_000,$model2(Payroll,Rank,SumProductsOrig), [show_probs_trunc,mean,show_percentiles,show_histogram,show_hpd_intervals,hpd_intervals=[0.95]]), nl, % show_store_lengths,nl, % fail, nl. go2 => true. model2(Payroll,Rank,SumProductsOrig) => PayrollSample = draw_without_replacement(Payroll), RankSample = draw_without_replacement(Rank), SumProducts = [P*R : {P,R} in zip(PayrollSample,RankSample)].sum, % How often is the sum of products as low as in the original? P = check1(SumProducts <= SumProductsOrig), add("sum products",SumProducts), add("p",P).