#| 

  Coupon collector's problem in Racket Gamble.

  There are N different collector's cards hidden in a package, but we don't
  know which card there is in the package we buy.
  We want to collect all of them, how many packages must one buy to collect
  all the different cards?

  See https://en.wikipedia.org/wiki/Coupon_collector%27s_problem
  """
  In probability theory, the coupon collector's problem describes 'collect all coupons and win' 
  contests. It asks the following question: If each box of a brand of cereals contains a 
  coupon, and there are n different types of coupons, what is the probability that more 
  than t boxes need to be bought to collect all n coupons? 

  An alternative statement is: Given n coupons, how many coupons do you expect you need 
  to draw with replacement before having drawn each coupon at least once? The mathematical 
  analysis of the problem reveals that the expected number of trials needed grows as 
  Θ(n log(n).
  For example, when n = 50 it takes about 225[b] trials on average to collect all 50 coupons. 

  ...

  [b]: E(50) = 50(1 + 1/2 + 1/3 + ... + 1/50) = 224.9603, the expected number of trials to 
  collect all 50 coupons. 

   The approximation n*log(n) + γ*n + 1/2 for this expected number gives in this case 
   ≈ 195.6011 + 28.8608 + 0.5 ≈ 224.9619.  [log is the natural logarithm]
  """ 

  This model uses set! which simplifies the model. Nice that Gamble supports this!

  But:
  - enumerate does not work, since it yield incorrect result. 
  - it's not fast (e.g. n=100 25.9s)


  * For n=10

(theoretical 29.28968253968254 approx 29.29800757894046)
var : num-runs2
20: 0.04999999999999999
26: 0.04599999999999999
25: 0.04399999999999999
29: 0.04399999999999999
19: 0.04299999999999999
22: 0.04299999999999999
21: 0.041999999999999996
24: 0.041999999999999996
27: 0.03799999999999999
23: 0.03699999999999999
...
69: 0.0009999999999999998
70: 0.0009999999999999998
72: 0.0009999999999999998
75: 0.0009999999999999998
11: 0.0009999999999999998
76: 0.0009999999999999998
77: 0.0009999999999999998
78: 0.0009999999999999998
50: 0.0009999999999999998
55: 0.0009999999999999998
mean: 29.207999999999984
Min: 11 Mean: 29.009 Max: 119 Variance: 133.372919 Stddev: 11.548719366232778
Credible interval (0.84): 15..40


  * For n=100

(theoretical 361665906988008779005537951077603286192775/697203752297124771645338089353123035568 518.737751763962 approx 518.7385850888091)

var : num-runs2
416: 0.008999999999999994
562: 0.007999999999999995
446: 0.007999999999999995
493: 0.007999999999999995
492: 0.007999999999999995
507: 0.007999999999999995
531: 0.006999999999999996
457: 0.006999999999999996
462: 0.006999999999999996
481: 0.006999999999999996
...
962: 0.0009999999999999994
455: 0.0009999999999999994
463: 0.0009999999999999994
985: 0.0009999999999999994
989: 0.0009999999999999994
480: 0.0009999999999999994
1006: 0.0009999999999999994
494: 0.0009999999999999994
499: 0.0009999999999999994
505: 0.0009999999999999994
mean: 518.4699999999995


  This program was created by Hakan Kjellerstrand, hakank@gmail.com
  See also my Racket page: http://www.hakank.org/racket/

|#

#lang gamble

; (require gamble/viz)
(require racket)
(require "gamble_utils.rkt")

;"Exact" probability
; from https://en.wikipedia.org/wiki/Coupon_collector%27s_problem (footnote [b])
(define (cc-theoretical n)
  ; approx (+ (* n (log n)) (* gamma n) 1/2  )
  (* n (for/sum ([i (range n)]) 
         (/ (+ i 1))))
  )

; Approximate probability
; n * log n + n*eulers_gamma + 1/2
(define (cc-approx n)
  (define eulers-gamma 0.5772156649)
  (+ (* n (log n)) (* n eulers-gamma) 1/2)
  )

  
(define (coupon-collectors-problem n)
  (let ([theo (cc-theoretical n)]
        [approx (cc-approx n)]
        )
    (displayln (list "theoretical"  (exact->inexact theo) "approx" approx))
    )
  (; enumerate ; incorrect value!
   ; rejection-sampler
   importance-sampler
   ; mh-sampler

   (define slots (make-vector n 0))

   ; Are all slots filled, i.e. > 0?
   (define (all-filled) (andmap (lambda (i)                          
                                  (> (vector-ref slots i) 0)) (range n)))

   ; Update (vector-set!) the count for a random slot.
   (define (run) (for/last ([i (in-naturals 1)]
                            #:break (all-filled))
     (let ([pos (random-integer n)])
       (vector-set! slots pos (add1 (vector-ref slots pos)))
       i ; This is the i'th run
       )
     ))
   
   (define num-runs (run))

   ; This is not really needed since (run) breaks when
   ; all slots are filled
   (observe/fail (all-filled))
   
   (list num-runs)

   )
  )


(time (show-marginals (coupon-collectors-problem 10)
                (list "num-runs")
                #:num-samples 1000
                #:truncate-output 10
                ; #:show-stats? #t 
                ; #:credible-interval 0.84
                ))