Picat PPL - A Lightweight Probabilistic Programming Toolkit

Picat PPL (Probabilistic Programming Light) is a lightweight probabilistic programming framework implemented entirely in Picat. It is designed for exploring uncertainty, solving probability puzzles, and experimenting with small-scale probabilistic models - rather than performing heavy Bayesian data analysis.

Picat PPL was created to make probabilistic programming available directly in Picat. Its syntax is intentionally similar to that of Gamble, WebPPL, and Turing.jl (see my Gamble models, WebPPL models, and Turing.jl models), offering an intuitive and declarative style while staying true to Picat's expressive logic-functional foundation. The supported functions can be evaluated interactively in Picat's REPL, used as components in a (probabilistic) model, or integrated into other Picat programs.

The inference engine in Picat PPL is based on simple rejection sampling. As a result, probabilities computed within probabilistic models are approximate and depend on the number of generated samples. In contrast, the probability distribution functions (such as the PDF, CDF, quantile, mean, and variance functions) yield exact analytical results wherever possible. This distinction makes the system both educational and precise: exact for pure distribution analysis, and stochastic for probabilistic reasoning.
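As a rough, hand-rolled illustration of the idea behind rejection sampling (this is only a self-contained sketch, not Picat PPL's actual engine): run the experiment many times, throw away the runs that contradict the observation, and estimate probabilities from the runs that remain.

```picat
% A minimal rejection sampling sketch in plain Picat (not Picat PPL itself).
% Estimate P(first die = 1 | sum of two dice = 4); the exact answer is 1/3.
main =>
  N = 100_000,
  Kept = [D1 : _ in 1..N,
          D1 = 1 + abs(random()) mod 6,   % roll die 1
          D2 = 1 + abs(random()) mod 6,   % roll die 2
          D1 + D2 == 4],                  % the "observation": reject other runs
  P = sum([1 : D in Kept, D == 1]) / Kept.len,
  printf("Estimated probability: %w%n", P).
```

The estimate gets closer to 1/3 as N grows, which is exactly the trade-off described above: more samples, more precision, more time.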

Picat PPL provides a comprehensive collection of probability distributions, both discrete and continuous, each supporting random sampling as well as PDF, CDF, quantile, mean, and variance functions. These functions can be used independently or composed within probabilistic models, allowing for clear experimentation and exploration of probabilistic concepts.

In essence, Picat PPL brings a transparent, compact, and fully self-contained probabilistic layer to Picat - ideal for simulation, reasoning, and curiosity-driven research into probability and uncertainty.

A note and a little background on Picat PPL: there is actually already a probabilistic programming system in Picat: PicatPRISM, a Picat port of PRISM, one of the earlier probabilistic programming systems, created by Taisuke Sato and implemented in Neng-Fa Zhou's B-Prolog system. Neng-Fa was involved in the development of PRISM as well.

I've tested PicatPRISM (as well as PRISM), but realized that I missed too many features from the PPL systems I'm used to (Gamble, WebPPL, Turing.jl, BLOG, etc.), so I decided to roll my own. Hence Picat PPL.

The main modules in Picat PPL are ppl_distributions.pi, ppl_utils.pi, and ppl_common_utils.pi. These modules are planned to be continuously improved with more utilities and probability distributions.

Exact probability distributions

Picat PPL contains (in ppl_distributions.pi) quite a few probability distributions for calculating exact probabilities. Almost all of them support random sampling together with PDF, CDF, quantile, mean, and variance functions. See ppl_distributions.pi (and ppl_distributions_test.pi) for details.

The supported probability distributions are listed in ppl_distributions.pi. Please note that some of the "wilder" distributions might be ... wild, and - in certain cases - give errors due to too large values.

Most random generating functions have a variant for generating N samples. These are named with an _n suffix, e.g. categorical_n([1/2,1/3,1/6],[a,b,c],100) generates 100 samples.

Example using binomial distribution:
% Generate a random sample
Picat> X=binomial_dist(10,0.49)  
X = 5

% Generate 10 random samples
Picat> X=binomial_dist_n(10,0.49,10)
X = [4,5,4,7,5,4,7,6,5,4]

% PDF
Picat> X=binomial_dist_pdf(10,0.49,8)            
X = 0.038897483585189

% CDF
Picat> X=binomial_dist_cdf(10,0.49,8)
X = 0.990897167987681

Picat> X=1-binomial_dist_cdf(10,0.49,8)
X = 0.009102832012319

% Quantile
Picat> X=binomial_dist_quantile(10,0.49,0.99)  
X = 8

% Mean and variance
Picat> X=binomial_dist_mean(10,0.49)         
X = 4.9

Picat> X=binomial_dist_variance(10,0.49)
X = 2.499
Picat PPL also includes some convenience functions for easier handling of these probability functions, such as pdf, pdf_all, cdf_all, quantile_all, meanf, and variancef. The first argument to these functions is a probability distribution (as a term), and must be '$' escaped, e.g. $binomial_dist(10,0.49).
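To see why the '$' escape is needed: Picat evaluates function calls eagerly, so without '$' the call binomial_dist(10,0.49) is evaluated first (drawing a random sample) and the convenience function would receive a number instead of the distribution term. An illustrative REPL session (the sampled value is of course random):

```picat
% Without '$': the call is evaluated, so X is a random sample
Picat> X = binomial_dist(10,0.49)
X = 6

% With '$': the call is kept as a term, which pdf/2 etc. can dispatch on
Picat> T = $binomial_dist(10,0.49)
T = binomial_dist(10,0.49)
```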

Some examples:
% A single PDF:
Picat> X=pdf($binomial_dist(10,0.49),8)                                          
X = 0.038897483585189

% All PDFs
Picat> pdf_all($binomial_dist(10,0.49)).printf_list         
 1 0.011437409348143 
 2 0.04944997571109 
 3 0.126695362606191 
 4 0.213022104774134 
 5 0.245601956092531 
 6 0.196642089028334 
 7 0.107960362603791 
 8 0.038897483585189 

% In decreasing probability order
Picat> pdf_all($binomial_dist(10,0.49)).sort_down(2).printf_list
 5 0.245601956092531 
 4 0.213022104774134 
 6 0.196642089028334 
 3 0.126695362606191 
 7 0.107960362603791 
 2 0.04944997571109 
 8 0.038897483585189 
 1 0.011437409348143 

% The default quantiles for the shown values are 0.01 .. 0.99.
% This can be changed by adding parameters for the lower and upper quantiles
% to use.
Picat> pdf_all($binomial_dist(10,0.49),0.00001, 0.99999).sort_down(2).printf_list
 5 0.245601956092531 
 4 0.213022104774134 
 6 0.196642089028334 
 3 0.126695362606191 
 7 0.107960362603791 
 2 0.04944997571109 
 8 0.038897483585189 
 1 0.011437409348143 
 9 0.008304909349343 
 0 0.001190424238276 
10 0.000797922662976 

% CDF all
Picat> cdf_all($binomial_dist(10,0.49),0.00001, 0.99999).sort_down(2).printf_list
10 1.0 
 9 0.999202077337024 
 8 0.990897167987681 
 7 0.951999684402491 
 6 0.8440393217987 
 5 0.647397232770366 
 4 0.401795276677834 
 3 0.1887731719037 
 2 0.062077809297509 
 1 0.012627833586419 
 0 0.001190424238276 

% Quantiles
Picat> quantile_all($binomial_dist(10,0.49)).printf_list 
0.000001  0 
0.00001  0 
0.001  0 
0.01  1 
0.025  2 
0.05  2 
0.25  4 
0.5  5 
0.75  6 
0.84  6 
0.975  8 
0.99  8 
0.999  9 
0.99999 10 
0.999999 10 

Picat> X=meanf($binomial_dist(10,0.49))                 
X = 4.9

Picat> X=variancef($binomial_dist(10,0.49))
X = 2.499

Modeling in Picat PPL

One of the reasons that Picat PPL was created was to support probabilistic programming ideas for solving probability puzzles, etc. in Picat. It has a syntax that is inspired by higher level PPL systems such as Racket/Gamble and WebPPL.

Here is a simple Picat PPL model: the Monty Hall problem (cf. The Monty Hall problem). For more information about the supported model functions, see the examples below and ppl_distributions.pi and ppl_utils.pi.
% Import the PPL modules
import ppl_distributions, ppl_utils, ppl_common_utils.

go ?=>
  reset_store, % Clears everything for a new experiment
  NumRuns = 10_000, % 10 000 runs
  run_model(NumRuns,
            $model, % The model we run
            [show_probs_rat,mean] % The statistics to show
            ),
  % fail, % if there is more than one parameter to check
  nl.
go => true.

model() =>
  % WLOG: We always select door d1
  Doors = [d1,d2,d3],

  % Randomly select the prize door
  Prize = uniform_draw(Doors),

  % Which door does Monty open?
  MontyOpen = cond(Prize == d1,
                   uniform_draw([d2,d3]),  % Monty must pick D2 or D3 randomly
                   cond(Prize==d2,d3,d2)), % Monty must not open the prize door

  % We observe that Monty opens door 2   
  observe(MontyOpen == d2),

  % If there is an observation in the model,
  % then the observed_ok function is used to filter out
  % the rejected solutions.
  if observed_ok then
    add("prize",Prize) % Add the value of Prize to the store
  end.
Run this as:
 $ picat ppl_monty_hall.pi
A sample output for the model:
var : prize
Probabilities:
d3: 0.66331860184813174 (1651 / 2489)
d1: 0.33668139815186821 (838 / 2489)
mean = [d3 = 0.663319,d1 = 0.336681]
Note: Picat PPL uses rejection sampling for calculating the probabilities of the random variables in a model, so the values are not exact. Increasing the number of runs will give more precise results, but it will also take longer to run. I tend to use 10 000 runs for an experiment, but sometimes 1000 samples might be enough. And sometimes much more is needed, e.g. 100 000 or 1 000 000 runs, to get a fairly reasonable result, especially for models with observations, since these can reduce the number of accepted solutions drastically.

The workhorse of modeling is the run_model/2-3 function. Model is the name of the model to run, and there are two principal cases: a model without arguments (e.g. $model) and a model with arguments (e.g. $model(Data)). All the probability distributions listed above can be used as random generators in a model, together with random generating utilities such as uniform_draw/1 and cond/3.

A note on observe and add

observe and add are two very important modeling functions in Picat PPL (observe in particular differs in certain ways compared to Gamble and WebPPL). And - of course - most of Picat's other features can be used in a Picat PPL model. It is often a matter of style whether to use recursive functions (the style of Gamble), Prolog style recursion, or foreach/while loops to implement a model. Personally I tend to play with different styles/approaches.
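As a sketch of what this stylistic freedom can look like, here are two hypothetical helpers that count the number of heads in N coin flips, one recursive and one loop based (uniform_draw/1 and cond/3 are the Picat PPL functions used elsewhere on this page; the helper names are made up):

```picat
% Recursive-function style (Gamble-like):
count_heads(0) = 0.
count_heads(N) = cond(uniform_draw([h,t]) == h, 1, 0) + count_heads(N-1).

% foreach-loop style, with a reassignable accumulator:
count_heads_loop(N) = C =>
  C = 0,
  foreach(_ in 1..N)
    if uniform_draw([h,t]) == h then
      C := C + 1
    end
  end.
```

Both produce a binomially distributed count; which one to use is purely a matter of taste.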

Warning: The use of Picat's nondeterministic features (such as member/2) in a model might not work as expected and should be used with care: the Picat PPL engine assumes that models run deterministically, except for the randomness of the supported random functions. However, using member/2 and fail/0 outside a model - such as before/after run_model/2-3 - is very useful for running several experiments with different parameters.
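The following sketch shows this useful pattern: member/2 and fail drive several experiments from the outside, while the model itself stays deterministic (model/1 and the parameter values here are hypothetical):

```picat
go ?=>
  member(N, [10,100,1000]),            % nondeterminism *outside* the model
  reset_store,                         % fresh store for each experiment
  run_model(10_000, $model(N), [mean]),
  fail.                                % backtrack to the next parameter value
go => true.
```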

Options to run_model

For the output of a Picat PPL model there are several different options. They are listed in the third Option parameter of run_model(NumRuns,Model,Options). Here is a simple model with many of these options activated:
go =>
  reset_store,
  run_model(10_000,$model,[show_probs_trunc,truncate_size=6,
                           mean,variance,
                           show_percentiles,
                           show_hpd_intervals,hpd_intervals=[0.84,0.94],
                           show_histogram,
                           show_summary_stats]),
  nl.
model() =>
  X = poisson_dist(10),
  add("x",X).
A sample output running this model:
var : x
Probabilities (truncated):
9: 0.1291000000000000
10: 0.1265000000000000
8: 0.1115000000000000
11: 0.1092000000000000
12: 0.0974000000000000
7: 0.0880000000000000
.........
20: 0.0019000000000000
21: 0.0014000000000000
23: 0.0002000000000000
1: 0.0002000000000000
25: 0.0001000000000000
22: 0.0001000000000000
mean = 10.0202
variance = 9.97979
Percentiles:
(0.001 2)
(0.01 4)
(0.025 4)
(0.05 5)
(0.1 6)
(0.25 8)
(0.5 10)
(0.75 12)
(0.84 13)
(0.9 14)
(0.95 15)
(0.975 17)
(0.99 18)
(0.999 21)
(0.9999 23.000199999998586)
(0.99999 24.800020000002405)
HPD intervals:
HPD interval (0.84): 6.00000000000000..14.00000000000000
HPD interval (0.94): 4.00000000000000..15.00000000000000
Histogram (total 10000)
      1:     2                                                              (0.000 / 0.000)
      2:    20 #                                                            (0.002 / 0.002)
      3:    73 ###                                                          (0.007 / 0.009)
      4:   202 #########                                                    (0.020 / 0.030)
      5:   388 ##################                                           (0.039 / 0.069)
      6:   595 ############################                                 (0.059 / 0.128)
      7:   880 #########################################                    (0.088 / 0.216)
      8:  1115 ####################################################         (0.112 / 0.328)
      9:  1291 ############################################################ (0.129 / 0.457)
     10:  1265 ###########################################################  (0.127 / 0.583)
     11:  1092 ###################################################          (0.109 / 0.692)
     12:   974 #############################################                (0.097 / 0.790)
     13:   737 ##################################                           (0.074 / 0.863)
     14:   543 #########################                                    (0.054 / 0.918)
     15:   346 ################                                             (0.035 / 0.952)
     16:   201 #########                                                    (0.020 / 0.972)
     17:   128 ######                                                       (0.013 / 0.985)
     18:    67 ###                                                          (0.007 / 0.992)
     19:    44 ##                                                           (0.004 / 0.996)
     20:    19 #                                                            (0.002 / 0.998)
     21:    14 #                                                            (0.001 / 1.000)
     22:     1                                                              (0.000 / 1.000)
     23:     2                                                              (0.000 / 1.000)
     25:     1                                                              (0.000 / 1.000)
Summary statistics
Count:               10000
Min:                 1
Max:                 25
Sum:                 100202
Mean:                10.020200000000024
Variance (pop):      9.9797919599999876
Std dev (pop):       3.1590808726590063
Variance (sample):   9.9807900390038888
Std dev (sample):    3.1592388385501797
Skewness (unbiased): 0.0031929254256123467
Kurtosis excess:     -3.0005891334657218
Median:              10
Q1 (25%):            8
Q3 (75%):            12

"observe as a constraint" / "reversibility"

I put these terms in quotes since I don't want to draw the similarity between probabilistic programming and logic programming/constraint modeling too far. But there are some similarities between these two paradigms, which is one of the reasons that I like them both.

Let's take "observe as a constraint" first. The role of a constraint in constraint programming is to reduce the domains of some decision variables. Similarly, in probabilistic programming, an observation is a way to force a random variable to take a certain value, or a value in a certain range.

"reversibility" is a concept from logic programming which means that the variables in a predicate can be used in several ways, i.e. there is no fixed direction of the variables: they can be input variables as well as output variables. One of the best examples of this reversibility in Picat (and Prolog) is the non-deterministic member(Element,List) predicate.
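For example, member/2 can be used both to enumerate elements and to check membership:

```picat
% Element as output: X = a, and on backtracking X = b, then X = c
Picat> member(X,[a,b,c])
X = a ?

% Both arguments bound: a membership check that just succeeds
Picat> member(b,[a,b,c])
yes
```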
In probabilistic programming there is a similar idea: there is no fixed direction of the probabilistic function. For example, in X = binomial_dist(N,P), all three variables can be fixed or be random variables themselves. (Note: for continuous probability distributions, e.g. X = normal_dist(Mu,Sigma), we cannot simply state this with =; instead one has to use some acceptable interval which defines how "similar" X is, for example abs(X-normal_dist(Mu,Sigma)) <= 0.001.)
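A sketch of what "running binomial_dist backwards" could look like as a Picat PPL model, using the modeling functions described on this page (the prior and the observed count are made-up illustration values):

```picat
model() =>
  P = uniform(0,1),           % unknown success probability, uniform prior
  X = binomial_dist(20,P),    % forward draw with the random P
  observe(X == 14),           % but we observed 14 successes out of 20
  if observed_ok then
    add("p",P)                % accepted samples of P concentrate near 14/20
  end.
```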

A simple example of both these features (observation as constraints and reversibility) is this simple problem: how to identify a person's sex by their height. This is a slightly altered model; see ppl_gender_height.pi.
model() =>
  ObservedHeight = 180, % The observed value, 180 cm.

  % Gender is evenly distributed
  Gender = uniform_draw(["Male","Female"]),

  % Define the height from the gender
  % From https://en.wikipedia.org/wiki/List_of_average_human_height_worldwide
  % Here are the values for Sweden.
  Height = cond(Gender == "Male",
                normal_dist(181.5,sqrt(50)), % in cm
                normal_dist(166.8,sqrt(50))),

  % The constraint: Observe a height (interval +/- 1.0)
  observe(abs(Height-ObservedHeight) <= 1.0),

  % If the obervation ("constraint") is successful (satisfied), add it
  % to the store
  if observed_ok then
    add("gender",Gender)
    % ,add("height",Height)
  end.
The output of this model:
observedHeight: 180 (cm)

var : gender
Probabilities:
Male: 0.8294314381270903
Female: 0.1705685618729097
Which means that there is a probability of about 83% that this person is a male.
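Since both heights are plain normals with a 50/50 prior, the result can be cross-checked exactly with Bayes' rule. A sketch, assuming a normal_dist_pdf(Mu,Sigma,X) function following the same naming convention as binomial_dist_pdf above (I have not verified this exact signature):

```picat
% Exact posterior P(Male | Height = 180) = pdf_m / (pdf_m + pdf_f);
% the normalizing constants cancel since both sigmas are sqrt(50).
male_posterior(H) = P =>
  PM = normal_dist_pdf(181.5,sqrt(50),H),  % density under "Male"
  PF = normal_dist_pdf(166.8,sqrt(50),H),  % density under "Female"
  P = PM / (PM + PF).
```

By hand, male_posterior(180) is approximately 0.848; the sampled estimate above differs from this because of the finite number of accepted samples.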

We could also use the same model - with some simple changes - to observe that this is a female (and then male), and then get the ranges of probable heights for the specific gender. This is model2 in the same program (in go2/0).
model2(ObservedGender) =>
  Gender = uniform_draw(["Male","Female"]),
  Height = cond(Gender == "Male",
                normal_dist(181.5,sqrt(50)), % in cm
                normal_dist(166.8,sqrt(50))),

  observe(Gender == ObservedGender),
  if observed_ok then
    add("gender",Gender),
    add("height",Height)
  end.
The output is:
observedGender: Male
var : gender
Probabilities:
Male: 1.0000000000000000
mean = [Male = 1.0]

var : height
Probabilities (truncated):
208.085784765518241: 0.0001964250638381
204.926380749458076: 0.0001964250638381
204.779039077674639: 0.0001964250638381
204.243852168352419: 0.0001964250638381
.........
158.284888115629229: 0.0001964250638381
157.440393568938504: 0.0001964250638381
155.607331346145116: 0.0001964250638381
153.755536221383153: 0.0001964250638381
mean = 181.399

observedGender: Female
var : gender
Probabilities:
Female: 1.0000000000000000
mean = [Female = 1.0]

var : height
Probabilities (truncated):
191.013584387093658: 0.0001980590215884
187.854280695754255: 0.0001980590215884
187.74447212062779: 0.0001980590215884
187.658912843387753: 0.0001980590215884
.........
143.750190582955781: 0.0001980590215884
143.295663612388978: 0.0001980590215884
142.903471658505879: 0.0001980590215884
141.829663018950583: 0.0001980590215884
mean = 166.841
There are some other fun height-related problems among the models listed below.

To conclude, there are quite a few similarities between constraint programming (CP) and probabilistic programming (PP).

Using Picat PPL for Bayesian Data Analysis / Recovering distribution parameters

One use case - and in certain research areas the most important one - for a probabilistic programming system is to do Bayesian Data Analysis (BDA) on a dataset, for example to recover the parameters of a probability distribution in a model, given a dataset to observe (using observe). As mentioned above, Picat PPL is in general not especially good at this. (I should mention that I'm not very much into BDA, and use it just for recreational/educational purposes.)

The reason for this is that Picat PPL's mechanism for observe inference is simply too simple to handle large data sets with its rejection sampling method. However, there are some tricks that might give some results, at least for recreational/educational purposes.

One thing that might work - and can actually be quite fast - is to use the ABC (Approximate Bayesian Computation) inspired observation method observe_abc(Data,Sample), which restricts the difference of the mean and standard deviation (or some other statistical measure) of the data and the generated sample. Here's an example using this (from ppl_gumbel_recover.pi):
go ?=>
  Data = gumbel_dist_n(12,3,100),
  println(data=Data),
  println([data_len=Data.len,mean=Data.mean,variance=Data.variance,stdev=Data.stdev]),
  reset_store,
  run_model(10_000,$model(Data),[show_probs_trunc,mean,
                                  show_hpd_intervals,hpd_intervals=[0.84],
                                  min_accepted_samples=1000,show_accepted_samples=true
                                  ]),
  nl,
  nl.
go => true.
   
model(Data) =>
  Len = Data.len,
  Mean = Data.mean,
  Stdev = Data.stdev,
  Variance = Data.variance,
  
  A = uniform(1,20),
  B = uniform(1,20),

  % A = normal_dist(Mean,4),
  % B = normal_dist(Mean,4),  

  Y = gumbel_dist_n(A,B,Len),
  YMean = Y.mean,
  YStdev = Y.stdev,

  % observe( abs(Mean-YMean) < Stdev),
  %observe( abs(Stdev-YStdev) < Stdev),  

  % Using the convenience function
  observe_abc(Data,Y),

  Post = gumbel_dist(A,B),

  if observed_ok then
    add("a",A),
    add("b",B),
    add("post",Post)
  end.
Output of the means from a sample run:
a: 12.686
b: 3.28662
post: 10.8731
This took about 4s on my machine.

The option min_accepted_samples=MinSamples (see above) might be helpful for forcing a minimum number of accepted samples (instead of playing with different values for the number of runs). This will not yield a faster solution, but it might save some time when testing the model, since by changing its parameters - such as the scale or the statistical measure used - it can run faster.

Here are some more models using this technique for BDA/parameter recovery problems. If the "ABC inspired" method does not work, there might be some other things that help. For models with a large domain, one can widen the acceptable interval and see how it works: for example, change observe(abs(normal_dist(Mu,Sigma)-Data[I]) <= 0.1) to observe(abs(normal_dist(Mu,Sigma)-Data[I]) <= 1).

If possible/applicable, sorting the dataset (as well as the generated samples) might also give better results.

Another - and perhaps the last - tip is to drastically reduce the data set, to, say, the first 3 or 4 data points. By selecting these data points carefully - for example just the smallest, middle, and largest values in the data set - this might even give some fairly good results.

For more advanced use of BDA, it is definitely recommended to use systems such as PyMC, Stan, Turing.jl (see my Turing.jl page), etc. Also, Gamble and WebPPL are often better than Picat PPL for these BDA tasks, even if they are not really designed for large datasets either. See my Gamble and WebPPL pages for some examples.

Picat PPL models/files

Here are some Picat PPL models. Most of them are ported from my Gamble models.

These Picat PPL files are available in the zip file all_public.zip.