A Category Theory Perspective: The Probability Monad 🔮#
For those who like category theory, we can frame the concepts of `pure` and `bind` for probability mass functions in the language of monads.
Indeed, the structure formed by `PMF`, `pure`, and `bind` is a classic example of a monad.
In this view, `PMF` is a type constructor that wraps a type `α` in a "probabilistic context," yielding `PMF α`.
The functions `pure` and `bind` are the two fundamental operations that define the monad.
`pure` as Monadic `unit`#
- `pure` is the standard monadic `unit` (aka `return`).
- It takes `a : α` and lifts it into the monadic context with minimal effect: `pure : α → PMF α`.
- In the probability monad, "minimal effect" means "no uncertainty."
- `pure a` lives in the probabilistic world---it is a Dirac delta distribution \(δ_a\), where all the probability mass is concentrated on that single value.
- `pure` is a lens through which to view any `a : α` as a (constant) "random" variable.
- `pure` is a thunk; i.e., a way to view `a` as the (constant) function `λ _ → a`.
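The point-mass reading of `pure` can be made concrete in Lean. This is a sketch against Mathlib's `PMF` API; the blanket `import Mathlib` and the availability of a `simp` lemma for the support of `pure` are assumptions about the current library.

```lean
import Mathlib

-- `PMF.pure a` is the Dirac delta δ_a: all of the probability mass
-- sits on the single value `a`, so the support is the singleton {a}.
example {α : Type*} (a : α) : (PMF.pure a).support = {a} := by
  simp
```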
`bind` as Monadic `>>=`#
The `bind` function is the cornerstone of the monad.
It is often written `>>=` and sometimes called `flatMap` (e.g., in Scala).
It defines how to compose operations within the monadic context.

`bind : PMF α → (α → PMF β) → PMF β`

This signature is revealing: `bind` takes
- a probability distribution `p : PMF α`,
- a "Kleisli arrow" `α → PMF β`, which maps each `a : α` to a distribution on `β`,

and returns a new distribution over `β`, which represents the total probability of the combined, sequential process.
In our context, a Kleisli arrow is a family of probability distributions
over β, indexed by α.
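Pointwise, `bind` is the law of total probability: the mass at an outcome `b` sums the contributions from every intermediate value `a`. A Lean sketch of this identity, assuming Mathlib's `PMF.bind_apply` lemma:

```lean
import Mathlib

-- Evaluating `bind p f` at `b` marginalizes over the intermediate
-- outcome `a`: (bind p f)(b) = Σ_a p(a) · (f a)(b).
example {α β : Type*} (p : PMF α) (f : α → PMF β) (b : β) :
    (p.bind f) b = ∑' a, p a * (f a) b := by
  simp [PMF.bind_apply]
```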
Monads and effects
The composition that `bind` manifests (combining a value with a Kleisli arrow to produce a distribution) is precisely how monads handle side effects in computational contexts. Here the context is probability, and the side effects are correlations or dependencies among random processes: the effects that values of random variables, or "events," can have on the probabilities of other events.
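A minimal Lean sketch of this kind of dependence (the names `twoStage`, `first`, and `second` are illustrative, not from Mathlib):

```lean
import Mathlib

-- A two-stage random process: the distribution of the second draw
-- depends on the outcome of the first; this dependence is the "effect."
noncomputable def twoStage {α β : Type*}
    (first : PMF α) (second : α → PMF β) : PMF β :=
  first.bind second
```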
The Monad Laws for Probability#
For (`PMF`, `bind`, `pure`) to comprise a monad, it must satisfy three laws, which guarantee that sequencing probabilistic computations behaves sensibly.
- Left Identity: `bind (pure a) f ≡ f a`
  - In probability terms: given the "point mass at `a`" distribution over `α`, applying the family `f` simply gives the distribution `f a`.
- Right Identity: `bind p pure ≡ p`
  - In probability terms: if you have a random variable with distribution `p` and your second step is to simply take its outcome and deterministically wrap it back into a `PMF`, you haven't actually changed the distribution. The final distribution is identical to the original `p`.
- Associativity: `bind (bind p f) g ≡ bind p (λ x => bind (f x) g)`
  - In probability terms: the way in which you group the composition of a given sequence of probabilistic events doesn't matter. You can either run the first two events (`bind p f`) and then pipe the result into the third (`g`), or you can compose the second and third events (`λ x => bind (f x) g`) and pipe the result of the first event (`p`) into that composite function. Both methods yield the same final probability distribution. This is analogous to the law of total probability applied iteratively.
- By satisfying these laws, the triple (`PMF`, `bind`, `pure`) provides a robust compositional framework for building complex probabilistic models from simpler ones.
- The expression `bind μMK (λ (m, k) => pure (encrypt m k))` is a clear demonstration of this principle, sequencing a random draw from `μMK` with a deterministic computation.
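The three laws can be stated (and discharged) in Lean. This sketch assumes Mathlib proves them as `simp` lemmas (e.g. `PMF.pure_bind`, `PMF.bind_pure`, `PMF.bind_bind`); the exact lemma names are an assumption about the current library.

```lean
import Mathlib

variable {α β γ : Type*}

-- Left identity: a point mass followed by `f` is just `f a`.
example (a : α) (f : α → PMF β) : (PMF.pure a).bind f = f a := by simp

-- Right identity: re-wrapping each outcome changes nothing.
example (p : PMF α) : p.bind PMF.pure = p := by simp

-- Associativity: regrouping sequential stages gives the same distribution.
example (p : PMF α) (f : α → PMF β) (g : β → PMF γ) :
    (p.bind f).bind g = p.bind (fun x => (f x).bind g) := by
  simp [PMF.bind_bind]
```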
The Category of Types#
The single category we are working in is the category of types and functions (often called Type in programming languages like Lean, or Set in set theory).
- Objects: The objects are the types themselves (e.g., `Bool`, `Nat`, `String`).
- Morphisms: The morphisms are the functions between types (e.g., `is_even : Nat → Bool`).
How PMF acts on the category of types#
- Mapping Objects. The `PMF` functor takes an object (a type `α`) and maps it to a new object in the same category, the type `PMF α`. For example,
  - the object `Bool` is mapped to the object `PMF Bool`.
  - the object `Nat` is mapped to the object `PMF Nat`.

  Crucially, `PMF Bool` is still just a type---an object in the category Type.
- Mapping Morphisms. The `PMF` functor takes a morphism, `f : α → β`, and maps it to a new morphism in the same category (the function `map f : PMF α → PMF β`).
  - It maps the function `not : Bool → Bool` to the new function `map not : PMF Bool → PMF Bool`.
  - It maps the function `is_even : Nat → Bool` to the new function `map is_even : PMF Nat → PMF Bool`.
Because PMF takes any object or morphism in Type and produces another object or morphism right back in Type, it is an endofunctor on Type.
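The two morphism examples above can be written out in Lean, assuming Mathlib's `PMF.map`. The helper `isEven` is a hypothetical stand-in for the text's `is_even`; it is not a Mathlib name.

```lean
import Mathlib

-- Hypothetical helper matching the text's `is_even`.
def isEven (n : Nat) : Bool := n % 2 == 0

-- `PMF.map` sends morphisms of Type to morphisms of Type:
noncomputable def liftNot : PMF Bool → PMF Bool := PMF.map not
noncomputable def liftIsEven : PMF Nat → PMF Bool := PMF.map isEven
```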
PMF is a functor#
Recall, a functor, F : 𝒞 → 𝒟, is a map from one category to another that
satisfies the functor laws.
```
     𝒞           𝒟
     α  ------>  Fα
     |           |        fmap : (α → β) → (Fα → Fβ)
     |           |
   f |           | fmap f     Functor Laws
     |           |            i.  fmap id = id
     v           v            ii. fmap (g ∘ f) = (fmap g) ∘ (fmap f)
     β  ------>  Fβ
```
What are the categories involved here?
Clearly the domain category of PMF is the category of types.
Also, PMF is a type constructor in the sense that, for each type α, the value
PMF α is a type.
Thus the codomain category of PMF is also Type.
From Monad to Functor#
For any monad, the monadic structure automatically gives rise to a functor.
The functor's fmap operation can be defined using bind and pure.
A type constructor is a functor if it supports an fmap function that lifts a regular function into the context of the type.
For PMF, this means we need a function with the following signature:
`fmap : (α → β) → (PMF α → PMF β)`
But this can be defined directly from the monadic operations, bind and pure!
`fmap f p = bind p (λ x ⇒ pure (f x))`
This composition has a clear probabilistic meaning:
- `bind p ...`: sample a value `x` from the distribution `p`.
- `... λ x => pure (f x)`: take the sampled value `x`, apply the function `f`, and produce a deterministic distribution that always returns the result `f x`.
The overall effect is a new probability distribution over the type β, which is precisely what mapping a function over p should produce.
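The derived `fmap` can be written directly in Lean. This is a sketch; the name `fmap'` is ours (to avoid clashing with anything in scope), and the claim that Mathlib's own `PMF.map` is defined the same way is an assumption about the library's internals.

```lean
import Mathlib

-- A hand-rolled `fmap`, built from the monadic operations exactly
-- as described above. Mathlib's `PMF.map` follows the same recipe.
noncomputable def fmap' {α β : Type*} (f : α → β) (p : PMF α) : PMF β :=
  p.bind fun x => PMF.pure (f x)
```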
The Functor Laws#
This definition of fmap also satisfies the two functor laws, which follow directly from the monad laws:
-
Identity:
fmap id = id- Mapping the identity function over a distribution doesn't change it.
-
Composition:
fmap (g ∘ f) = (fmap g) ∘ (fmap f)- Mapping the composition of two functions is the same as mapping the first function and then mapping the second.
-
So, while
bindis about sequencing probabilistic computations,fmapis about applying a deterministic transformation to the outcome of a probabilistic computation. -
The fact that every monad (like
PMF) is also a functor is a fundamental concept in category theory that provides this usefulfmapoperation for "free".
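Both functor laws hold for `PMF.map` in Lean. A sketch, assuming Mathlib's `PMF.map_id` and `PMF.map_comp` lemmas (the exact names are an assumption):

```lean
import Mathlib

variable {α β γ : Type*}

-- Identity law: mapping `id` over a distribution changes nothing.
example (p : PMF α) : p.map id = p := by simp

-- Composition law: map once with `g ∘ f`, or map `f` and then `g`.
example (f : α → β) (g : β → γ) (p : PMF α) :
    p.map (g ∘ f) = (p.map f).map g := by
  simp [PMF.map_comp]
```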
Application#
Recall how we defined the ciphertext distribution,
μC = bind μMK (λ (m, k) => pure (encrypt m k)).
μMKis the joint distribution of (message, key) pairs.bind μMKsays "sample from this distribution."λ (m, k) => pure (encrypt m k)says "apply encryption deterministically."