#| Chinese restaurant process in Racket/Gamble https://en.wikipedia.org/wiki/Chinese_restaurant_process """ An equivalent, but subtly different way to define the Chinese restaurant process, is to let new customers choose companions rather than tables.[4] Customer n+1 chooses to sit at the same table as any one of the n seated customers with probability 1/(n+theta) or chooses to sit at a new, unoccupied table with probability theta/(n + theta) Notice that in this formulation, the customer chooses a table without having to count table occupancies- we don't need |b|. ... It can be understood as the sum of n independent Bernoulli random variables, each with a different parameter: K = sum(bi,i=1..n) bi ~ Bernoulli theta / (i - 1 + theta) PMF: gamma(theta) ------------ * |s(n,k)*theta^k k = 1..N gamma(n + theta) where s denotes Stirling numberf of the first kind """ Note: The model below supports two variants: - sort-tables? #t: This means that the tables are sort to break symmetries. It make sense if one interpreted this as the customer is seated by an already occupied table (1 / (n + theta)). Using this option, the theoretical value of K is seem directly as probability on K filled tables (b). - sort-tables? #f: Here the tables are not sorted. For this option, it's harder to related the tables (b) and the probabilities. * For 10 (n) guests and theta 0.5, using enumerate. Note, the "tables" are sorted to break symmetries. variable : b (1 1 0 0 0 0 0 0 0 0): 29200384/72747675 (0.40139267681063345) (1 0 0 0 0 0 0 0 0 0): 65536/230945 (0.2837731927515209) (1 1 1 0 0 0 0 0 0 0): 667136/2909907 (0.2292636843720435) (1 1 1 1 0 0 0 0 0 0): 9263104/130945815 (0.07073997744792379) (1 1 1 1 1 0 0 0 0 0): 32/2431 (0.01316330728095434) (1 1 1 1 1 1 0 0 0 0): 48208/31177575 (0.0015462395648154163) (1 1 1 1 1 1 1 0 0 0): 16/138567 (0.00011546760772766965) (1 1 1 1 1 1 1 1 0 0): 232/43648605 (5.3151755938133645e-6) (1 1 1 1 1 1 1 1 1 0): 2/14549535 (1.3746143777103528e-7) (1 1 1 1 1 1 1 1 1 1): 1/654729075 (1.5273493085670588e-9) (0 0 0 0 0 0 0 0 0 0): 0 (0.0) variable : K 2: 29200384/72747675 (0.40139267681063345) 1: 65536/230945 (0.2837731927515209) 3: 667136/2909907 (0.2292636843720435) 4: 9263104/130945815 (0.07073997744792379) 5: 32/2431 (0.01316330728095434) 6: 48208/31177575 (0.0015462395648154163) 7: 16/138567 (0.00011546760772766965) 8: 232/43648605 (5.3151755938133645e-6) 9: 2/14549535 (1.3746143777103528e-7) 10: 1/654729075 (1.5273493085670588e-9) 0: 0 (0.0) mean: 31037876/14549535 (2.133255530159555) * An alternative is to not sort the tables variable : b (1 0 0 0 0 0 0 0 0 0): 65536/230945 (0.2837731927515209) (1 1 0 0 0 0 0 0 0 0): 32768/230945 (0.14188659637576045) (1 0 1 0 0 0 0 0 0 0): 16384/230945 (0.07094329818788023) (1 0 0 1 0 0 0 0 0 0): 32768/692835 (0.04729553212525349) (1 1 1 0 0 0 0 0 0 0): 8192/230945 (0.035471649093940114) (1 0 0 0 1 0 0 0 0 0): 8192/230945 (0.035471649093940114) (1 0 0 0 0 1 0 0 0 0): 32768/1154725 (0.028377319275152094) (1 1 0 1 0 0 0 0 0 0): 16384/692835 (0.023647766062626745) (1 0 0 0 0 0 1 0 0 0): 16384/692835 (0.023647766062626745) (1 0 0 0 0 0 0 1 0 0): 32768/1616615 (0.02026951376796578) (1 1 0 0 1 0 0 0 0 0): 4096/230945 (0.017735824546970057) ... (1 0 0 1 1 1 1 1 1 1): 8/654729075 (1.221879446853647e-8) (1 1 1 0 1 1 1 1 1 1): 2/218243025 (9.164095851402353e-9) (1 1 0 1 1 1 1 1 1 1): 4/654729075 (6.109397234268235e-9) (1 0 1 1 1 1 1 1 1 1): 2/654729075 (3.0546986171341176e-9) (1 1 1 1 1 1 1 1 1 1): 1/654729075 (1.5273493085670588e-9) (0 0 0 0 0 0 0 0 0 0): 0 (0.0) variable : K 2: 29200384/72747675 (0.40139267681063345) 1: 65536/230945 (0.2837731927515209) 3: 667136/2909907 (0.2292636843720435) 4: 9263104/130945815 (0.07073997744792379) 5: 32/2431 (0.01316330728095434) 6: 48208/31177575 (0.0015462395648154163) 7: 16/138567 (0.00011546760772766965) 8: 232/43648605 (5.3151755938133645e-6) 9: 2/14549535 (1.3746143777103528e-7) 10: 1/654729075 (1.5273493085670588e-9) 0: 0 (0.0) mean: 31037876/14549535 (2.133255530159555) * For n=100, theta 0.1 there's just a few tables needed when the tables are sorted. Here using importance-sampler (10000 samples) variable : b (1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0): 0.60212 (1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0): 0.31025 (1 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0): 0.07525 (1 1 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0): 0.01118 (1 1 1 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0): 0.00112 (1 1 1 1 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0): 8e-5 variable : K 1: 0.60212 2: 0.31025 3: 0.07525 4: 0.01118 5: 0.00112 6: 8e-5 mean: 1.4991700000000003 The quantile 0.99999 is 6, i.e. 6 tables. The quantile 0.999999 is 7. See below for probability distribution of K (pdf, cdf, quantile, and mean) Also, see gamble_chinese_restaurant_process.rkt for an alternative model but that model has different probability "dynamics". This program was created by Hakan Kjellerstrand, hakank@gmail.com See also my Racket page: http://www.hakank.org/racket/ |# #lang gamble ; (require gamble/viz) (require racket) (require "gamble_utils.rkt") (require "gamble_distributions.rkt") (define (model [sort-tables? #t]) (show "sort-tables?" sort-tables?) (enumerate ; rejection-sampler ; importance-sampler ; mh-sampler (define max_guests 10) (define theta 5/10) (define b (let ([t (for/list ([i max_guests]) (bernoulli (/ theta (+ (add1 i) -1 theta))))]) (if sort-tables? (reverse (sort t <)) t))) ; Number of filled tables (define K (sum b)) (list b K ) ) ) (for ([sort-tables? '(#t)]) (show-marginals (model sort-tables?) (list "b" "K" ) #:num-samples 100000 ; #:truncate-output 5 ; #:skip-marginals? #t ; #:show-stats? #t ; #:credible-interval 0.84 ; #:hpd-interval (list 0.84) ; #:show-histogram? #t2 ; #:show-percentiles? #t ; #:burn 0 ; #:thin 0 ) ) #| PDF, CDF, Quantile, and mean of (crp 0.1 10) PDF crp theta: 0.5 n: 10 crp_pdf 0.5 10 0: 0 crp_pdf 0.5 10 1: 0.2837731927515211 crp_pdf 0.5 10 2: 0.4013926768106336 crp_pdf 0.5 10 3: 0.22926368437204359 crp_pdf 0.5 10 4: 0.0707399774479238 crp_pdf 0.5 10 5: 0.013163307280954346 crp_pdf 0.5 10 6: 0.001546239564815417 crp_pdf 0.5 10 7: 0.00011546760772766971 crp_pdf 0.5 10 8: 5.315175593813367e-6 crp_pdf 0.5 10 9: 1.3746143777103534e-7 crp_pdf 0.5 10 10: 1.5273493085670594e-9 CDF crp_cdf 0.5 10 0: 0 crp_cdf 0.5 10 1: 0.2837731927515211 crp_cdf 0.5 10 2: 0.6851658695621548 crp_cdf 0.5 10 3: 0.9144295539341983 crp_cdf 0.5 10 4: 0.9851695313821222 crp_cdf 0.5 10 5: 0.9983328386630765 crp_cdf 0.5 10 6: 0.9998790782278919 crp_cdf 0.5 10 7: 0.9999945458356195 crp_cdf 0.5 10 8: 0.9999998610112133 crp_cdf 0.5 10 9: 0.9999999984726511 crp_cdf 0.5 10 10: 1.0000000000000004 Quantile crp_quantile 0.5 10 0.001: 1 crp_quantile 0.5 10 0.01: 1 crp_quantile 0.5 10 0.02: 1 crp_quantile 0.5 10 0.05: 1 crp_quantile 0.5 10 0.1: 1 crp_quantile 0.5 10 0.25: 1 crp_quantile 0.5 10 0.5: 2 crp_quantile 0.5 10 0.75: 3 crp_quantile 0.5 10 0.9: 3 crp_quantile 0.5 10 0.95: 4 crp_quantile 0.5 10 0.98: 4 crp_quantile 0.5 10 0.99: 5 crp_quantile 0.5 10 0.999: 6 crp_quantile 0.5 10 0.9999: 7 crp_quantile 0.5 10 0.99999: 7 crp_quantile 0.5 10 0.999999: 8 crp_mean 0.5 10: 2.1332555301595546 0.99999999 quantile for different n (theta 0.5) crp_quantile 0.5 3 0.99999999: 3 crp_quantile 0.5 6 0.99999999: 6 crp_quantile 0.5 10 0.99999999: 9 crp_quantile 0.5 20 0.99999999: 11 crp_quantile 0.5 50 0.99999999: 13 crp_quantile 0.5 100 0.99999999: 15 crp_quantile 0.5 150 0.99999999: 15 |# (displayln "\nPDF, CDF, quantile, mean") (define qs '(0.001 0.01 0.02 0.05 0.10 0.25 0.50 0.75 0.90 0.95 0.98 0.99 0.999 0.9999 0.99999 0.999999 0.9999999 0.99999999 0.999999999)) (let* ([theta 1/2] [n 10] ; The probability of higher values is very small [max-k (+ 3 (crp_quantile theta n 0.999999999))]) (displayln (format "crp theta: ~a n: ~a" theta n)) (newline) (displayln "PDF") (for ([k max-k]) (let ([v (crp_pdf theta n k)]) (displayln (format "crp_pdf ~a ~a ~a: ~a (~a)" theta n k v (* 1.0 v))) )) (newline) (displayln "CDF") (for ([k max-k]) (let ([v (crp_cdf theta n k)]) (displayln (format "crp_cdf ~a ~a ~a: ~a (~a)" theta n k v (* 1.0 v))) )) (newline) (displayln "Quantile") (for ([q qs]) (displayln (format "crp_quantile ~a ~a ~a: ~a" theta n q (crp_quantile theta n q))) ) (newline) (displayln (format "crp_mean ~a ~a: ~a" theta n (crp_mean theta n))) ) (newline) (displayln "\n0.99999999 quantile for different n (theta 0.5)") (for ([n '(3 6 10 20 50 100 150)]) (displayln (format "crp_quantile ~a ~a ~a: ~a" 0.5 n 0.99999999 (crp_quantile 0.5 n 0.99999999))) ) (newline) #| crp distribution crp 1/2 10 variable : c 2: 0.4077 1: 0.2849 3: 0.223 4: 0.07 5: 0.0129 6: 0.0014 7: 0.0001 mean: 2.1229000000000005 |# (displayln "\nModel 2 crp dist") (define (model2) (; enumerate ; rejection-sampler importance-sampler ; mh-sampler (define theta 1/2) (define n 10) (define c (crp theta n)) (list c ) ) ) (show-marginals (model2) (list "c" ) #:num-samples 10000 ; #:truncate-output 5 ; #:skip-marginals? #t ; #:show-stats? #t ; #:credible-interval 0.84 ; #:hpd-interval (list 0.84) ; #:show-histogram? #t2 ; #:show-percentiles? #t ; #:burn 0 ; #:thin 0 )