# Review of key concepts

Before proceeding, we define the key concepts used in this app. We cut some corners in these definitions, but they are functional for applied purposes.

A **probability distribution** specifies the probabilities of all of the possible outcomes in a sample space. For example, if the sample space is nonnegative integers, the probability distribution describes the probability of observing each nonnegative integer.

## PMFs and PDFs

Probability distributions may be specified in various ways. For distributions describing discrete random variables, that is, random variables that may take on only discrete values such as integers, the **probability mass function** (PMF) is a useful specification of the distribution. In the discrete case, we define \(f(x)\) to be the probability of obtaining discrete value \(x\). The **normalization condition** of the PMF is

\[\sum_x f(x) = 1,\]

where the sum is over all possible values of \(x\) in the sample space.
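
As a quick numerical illustration of the normalization condition, the sketch below sums a Binomial PMF over its whole sample space. The function name `binomial_pmf` and the parameter values are my own choices for illustration, not part of the app.

```python
from math import comb, isclose


def binomial_pmf(n, N, theta):
    """Probability of n successes in N trials, each succeeding with probability theta."""
    return comb(N, n) * theta**n * (1 - theta) ** (N - n)


# Illustrative (assumed) parameter values.
N, theta = 20, 0.3

# Sum the PMF over every value in the sample space {0, 1, ..., N}.
total = sum(binomial_pmf(n, N, theta) for n in range(N + 1))

# The sum equals one up to floating point rounding.
print(isclose(total, 1.0))
```

Any valid PMF should pass the same check when summed over its full sample space. For distributions with infinite support, the sum must be truncated, so it will only approach one.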

For continuous random variables, the probability of a given particular real number is zero. Instead, we can define a function \(f(x)\), called a **probability density function** (PDF), such that the probability that a value of continuous variable \(x\) is between \(a\) and \(b\) with \(a<b\) is

\[P(a \le x \le b) = \int_a^b \mathrm{d}x\, f(x).\]
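
To make the link between a PDF and a probability concrete, here is a minimal sketch that integrates a PDF numerically between \(a\) and \(b\). I use the Exponential distribution with rate \(\beta\) as an assumed example because its integral is known in closed form; the helper names and parameter values are hypothetical.

```python
from math import exp, isclose


def exponential_pdf(x, beta):
    """PDF of the Exponential distribution with rate beta, supported on x >= 0."""
    return beta * exp(-beta * x) if x >= 0 else 0.0


def prob_between(pdf, a, b, n_steps=100_000):
    """Approximate the integral of pdf from a to b using the midpoint rule."""
    dx = (b - a) / n_steps
    return sum(pdf(a + (i + 0.5) * dx) for i in range(n_steps)) * dx


# Illustrative (assumed) parameter values.
beta, a, b = 2.0, 0.5, 1.5

numerical = prob_between(lambda x: exponential_pdf(x, beta), a, b)

# Closed-form value of the same integral for the Exponential distribution.
exact = exp(-beta * a) - exp(-beta * b)

print(isclose(numerical, exact, rel_tol=1e-6))
```

Note that shrinking the interval so that \(a = b\) gives a probability of zero, consistent with the statement that any single real number has zero probability.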

## CDFs

The **cumulative distribution function** (CDF), denoted \(F(x)\), is defined such that \(F(x)\) is the probability that the random variable takes a value less than or equal to \(x\). For a distribution describing a discrete random variable, the CDF is related to the PMF as

\[F(x) = \sum_{k=k_\mathrm{min}}^{x} f(k),\]

where \(k_\mathrm{min}\) is the minimal value the variable can take. For a continuous distribution, the CDF is related to the PDF by

\[F(x) = \int_{-\infty}^{x} \mathrm{d}x'\, f(x'),\]

or

\[f(x) = \frac{\mathrm{d}F(x)}{\mathrm{d}x}.\]
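
For the discrete case, the relationship between PMF and CDF amounts to a running sum. The sketch below builds the CDF of a Binomial distribution from its PMF, with \(k_\mathrm{min} = 0\), and then recovers the PMF from differences of the CDF. The names and parameter values are assumptions made for illustration.

```python
from itertools import accumulate
from math import comb, isclose


def binomial_pmf(n, N, theta):
    """Probability of n successes in N trials, each succeeding with probability theta."""
    return comb(N, n) * theta**n * (1 - theta) ** (N - n)


# Illustrative (assumed) parameter values.
N, theta = 10, 0.4

pmf = [binomial_pmf(n, N, theta) for n in range(N + 1)]

# cdf[k] = f(0) + f(1) + ... + f(k); here k_min = 0.
cdf = list(accumulate(pmf))

# Differences of consecutive CDF values give back the PMF.
recovered = [cdf[0]] + [cdf[k] - cdf[k - 1] for k in range(1, N + 1)]

print(isclose(cdf[-1], 1.0))
```

The differencing step is the discrete analog of taking the derivative of the CDF to obtain the PDF in the continuous case.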

## Methods of plotting PDFs, PMFs and CDFs

To help in interpreting plots of the univariate PDFs, PMFs, and CDFs in this app, I show how each is plotted.

### Plots of PDFs

We plot a PDF as a smooth curve. The curve only appears for values on the x-axis for which the distribution is supported, that is, on the domain of the distribution. As an example, below is a plot of a PDF for the Gamma distribution, which is supported on the set of positive real numbers.

### Plots of PMFs

Since discrete random variables take on only discrete values, we plot PMFs differently. The convention in this app is that each nonzero probability is shown by a point with a line connecting it to the x-axis. As an example, below is a plot of a Binomial distribution PMF.

### Plots of CDFs for continuous distributions

CDFs for continuous distributions are plotted as smooth curves. Taking the example of the Gamma distribution again, the CDF is plotted as below.

### Plots of CDFs for discrete distributions

For discrete distributions, I plot the CDFs as “staircases,” as shown below.

The CDF appears to be multivalued at the vertical lines of the staircase. It is not. Furthermore, the lines at zero and one on the CDF axis should extend out to \(-\infty\) and \(\infty\), respectively, along the horizontal axis. Strictly speaking, the CDF should be plotted as follows.

However, since it is understood that the CDF is not multivalued, there should be no ambiguity in plotting the staircase, and indeed staircase-style CDFs are commonly used. The staircase has less clutter, and I find it easier to look at and interpret. Furthermore, we know that all CDFs extend toward \(x=-\infty\) with a value of zero and toward \(x=\infty\) with a value of one. So, again, there is no ambiguity in cutting off the infinitely long tails of the CDF.
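
To see that the CDF of a discrete distribution is single-valued even at the jumps of the staircase, it can help to evaluate it as an ordinary function of a real argument. The following is a minimal sketch under my own assumptions: a Binomial distribution with four trials and success probability one half, and a hypothetical helper named `make_discrete_cdf`.

```python
from bisect import bisect_right
from itertools import accumulate
from math import comb


def make_discrete_cdf(values, probs):
    """Return a function F(x) for a discrete distribution.

    `values` must be sorted in increasing order and `probs` holds the
    PMF evaluated at each entry of `values`.
    """
    cumulative = list(accumulate(probs))

    def cdf(x):
        # Number of support points that are less than or equal to x.
        i = bisect_right(values, x)
        return 0.0 if i == 0 else cumulative[i - 1]

    return cdf


# Illustrative (assumed) distribution: Binomial with N = 4, theta = 0.5.
N, theta = 4, 0.5
values = list(range(N + 1))
probs = [comb(N, n) * theta**n * (1 - theta) ** (N - n) for n in values]
F = make_discrete_cdf(values, probs)

# Exactly one value at the jump, the lower step just left of it,
# and the flat tails far to the left and right.
print(F(1), F(0.999), F(-1e9), F(1e9))
```

At a jump, the function takes the value of the upper step because the CDF is defined with "less than or equal to."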

## Parametrization

If a probability mass or density function depends on parameters, say \(N\) and \(\theta\), we write it as \(f(x;N,\theta)\). There does not seem to be consensus on the best notation for this, and you may see this same quantity written as \(f(x\mid N, \theta)\), implying conditioning, for example.

Distributions may be parametrized in different ways. For example, we may parametrize a Normal distribution in terms of what is commonly called the standard deviation \(\sigma\), but we can also parametrize it by the precision \(\tau \equiv 1/\sigma^2\). The parametrizations I use in this app are those used in the Stan probabilistic programming language.
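
As a small illustration that two parametrizations describe the same distribution, the sketch below evaluates the Normal PDF written in terms of the standard deviation and in terms of the precision. It assumes the usual definition of precision as the reciprocal of the variance; the function names and numerical values are hypothetical.

```python
from math import exp, isclose, pi, sqrt


def sigma_to_tau(sigma):
    """Convert a standard deviation to a precision (assumed to be 1 / sigma^2)."""
    return 1.0 / sigma**2


def normal_pdf_sigma(x, mu, sigma):
    """Normal PDF parametrized by location mu and standard deviation sigma."""
    return exp(-((x - mu) ** 2) / (2 * sigma**2)) / (sigma * sqrt(2 * pi))


def normal_pdf_tau(x, mu, tau):
    """Normal PDF parametrized by location mu and precision tau."""
    return sqrt(tau / (2 * pi)) * exp(-tau * (x - mu) ** 2 / 2)


# Illustrative (assumed) values.
x, mu, sigma = 1.3, 0.5, 2.0
tau = sigma_to_tau(sigma)

# Both parametrizations give the same density at x.
print(isclose(normal_pdf_sigma(x, mu, sigma), normal_pdf_tau(x, mu, tau)))
```

When moving between software packages, a conversion like this is worth checking, since passing a precision where a standard deviation is expected gives silently wrong results.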

## Location and scale parameters

Some parameters of distributions have special properties. A **location** parameter shifts the PMF/PDF and CDF along the x-axis. A **scale** parameter serves to rescale the x-axis. As an example, the Normal distribution has PDF

\[f(x;\mu,\sigma) \propto \mathrm{e}^{-(x-\mu)^2/2\sigma^2},\]

where I have omitted the normalization constant for clarity in the present discussion. The PDF reaches a maximum at \(x=\mu\). The parameter \(\mu\) is a location parameter because I could define \(x' = x-\mu\) and still get a Normal PDF in \(x'\) with a maximum at \(x' = 0\).

The parameter \(\sigma\) is a scale parameter because I could define \(x' = x/\sigma\) and \(\mu' = \mu / \sigma\), and I get a new PDF,

\[f(x';\mu') \propto \mathrm{e}^{-(x'-\mu')^2/2}.\]

This is as if I stretched the x-axis by a factor of \(\sigma\).
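
The shifting and rescaling arguments can be checked numerically. This sketch assumes the unnormalized Normal PDF \(\mathrm{e}^{-(x-\mu)^2/2\sigma^2}\) and verifies that evaluating it at \(x\) gives the same value as evaluating the shifted form at \(x' = x - \mu\) or the rescaled form at \(x' = x/\sigma\) with \(\mu' = \mu/\sigma\). The function name and test points are my own.

```python
from math import exp, isclose


def unnormalized_normal(x, mu, sigma):
    """Normal PDF without its normalization constant."""
    return exp(-((x - mu) ** 2) / (2 * sigma**2))


# Illustrative (assumed) parameter values and evaluation points.
mu, sigma = 3.0, 2.5
xs = [-4.0, 0.0, 1.7, 3.0, 8.2]

original = [unnormalized_normal(x, mu, sigma) for x in xs]

# Location: shift x by mu and use a location parameter of zero.
shifted = [unnormalized_normal(x - mu, 0.0, sigma) for x in xs]

# Scale: divide both x and mu by sigma and use a scale parameter of one.
rescaled = [unnormalized_normal(x / sigma, mu / sigma, 1.0) for x in xs]

print(all(isclose(o, s) for o, s in zip(original, shifted)))
print(all(isclose(o, r) for o, r in zip(original, rescaled)))
```

The comparison uses unnormalized densities on purpose. A properly normalized PDF in the rescaled variable picks up an extra factor of \(\sigma\) from the change of variables.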

In this app, I will refer to \(\mu\) not by its common name of “the mean,” but instead as “the location parameter.” This is because the word “mean” can have different meanings in different contexts, and using the term “location parameter” is unambiguous. Similarly, I will refer to \(\sigma\) as the scale parameter and not the standard deviation. This is also consistent with the nomenclature in NumPy and SciPy.

## Moments

A **moment** of a distribution can be defined in terms of its probability density function or probability mass function. Before defining moments, it is best to first define the **expectation** of a function \(g(x)\) for a given distribution. For a continuous distribution with PDF \(f(x)\), this is

\[E(g(x)) = \int \mathrm{d}x\, g(x)\, f(x).\]

For a discrete distribution with PMF \(f(x)\), the expectation of \(g(x)\) is

\[E(g(x)) = \sum_x g(x)\, f(x).\]

The \(n\)th moment of a distribution is \(E(x^n)\). The first moment of a distribution is called the **mean**, and here we will denote it as \(\mu\). We define the \(n\)th **central moment** to be \(E((x-\mu)^n)\).

Perhaps the two most important moments of a distribution are the first moment (the mean) and the second central moment, \(E((x-\mu)^2)\), which is called the **variance**. For each distribution I display its mean and variance, if they exist.
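
As a worked example of these definitions, the sketch below computes the mean and variance of a Binomial distribution directly from its PMF and compares them with the known closed forms \(N\theta\) and \(N\theta(1-\theta)\). The helper `expectation` and the parameter values are assumptions for illustration.

```python
from math import comb, isclose


def binomial_pmf(n, N, theta):
    """Probability of n successes in N trials, each succeeding with probability theta."""
    return comb(N, n) * theta**n * (1 - theta) ** (N - n)


def expectation(g, pmf, support):
    """Expectation of g(x) for a discrete distribution: sum of g(x) f(x)."""
    return sum(g(x) * pmf(x) for x in support)


# Illustrative (assumed) parameter values.
N, theta = 12, 0.25
support = range(N + 1)


def pmf(n):
    return binomial_pmf(n, N, theta)


# First moment (the mean) and second central moment (the variance).
mean = expectation(lambda x: x, pmf, support)
variance = expectation(lambda x: (x - mean) ** 2, pmf, support)

print(isclose(mean, N * theta), isclose(variance, N * theta * (1 - theta)))
```

The same `expectation` helper gives any other moment by changing the function passed as `g`.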

## Useful data generation concepts

In describing stories of distributions, the concepts of a **Bernoulli trial** and of a **Poisson process** are useful.

### Bernoulli trial

A Bernoulli trial is an experiment that has two outcomes that can be encoded as success (\(y=1\)) or failure (\(y = 0\)). The words “success” and “failure” do not necessarily mean positive or negative outcomes as they appeal to human emotion. They are just names for the encodings of the outcomes.
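
A Bernoulli trial is simple to simulate. In this minimal sketch, I assume the probability of success is a parameter called `theta`; that name, the seed, and the number of trials are my own choices.

```python
import random


def bernoulli_trial(theta, rng):
    """Return 1 (success) with probability theta and 0 (failure) otherwise."""
    return 1 if rng.random() < theta else 0


# Seeded generator so the simulation is reproducible.
rng = random.Random(3252)

# Illustrative (assumed) probability of success.
theta = 0.3

trials = [bernoulli_trial(theta, rng) for _ in range(100_000)]

# With many trials, the fraction of successes should be close to theta.
fraction_success = sum(trials) / len(trials)
print(fraction_success)
```

Repeating the trial many times and counting the successes is the kind of story that underlies distributions like the Binomial.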

### Poisson process

Rare events occur with a rate \(\lambda\) per unit time. There is no “memory” of previous events; i.e., that rate is independent of time. A process that generates such events is called a Poisson process. The occurrence of a rare event in this context is referred to as an *arrival*.
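
One common way to simulate a Poisson process, sketched below, relies on the fact that the times between consecutive arrivals are Exponentially distributed with rate \(\lambda\). The function name, seed, rate, and observation window are hypothetical values chosen for illustration.

```python
import random


def poisson_process_arrivals(rate, t_max, rng):
    """Arrival times in [0, t_max], built by summing Exponential inter-arrival times."""
    arrivals = []
    t = rng.expovariate(rate)
    while t <= t_max:
        arrivals.append(t)
        # Memorylessness: the wait for the next arrival does not depend on the past.
        t += rng.expovariate(rate)
    return arrivals


# Seeded generator so the simulation is reproducible.
rng = random.Random(3252)

# Illustrative (assumed) rate per unit time and total observation time.
rate, t_max = 2.0, 10_000.0

arrivals = poisson_process_arrivals(rate, t_max, rng)

# Over a long window, arrivals per unit time should be close to the rate.
empirical_rate = len(arrivals) / t_max
print(empirical_rate)
```

Counting the arrivals that fall in a fixed time interval, or measuring the waiting time to the next arrival, connects this process to the stories of several distributions.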

## Multivariate distributions

So far, we have assumed **univariate distributions**, that is, probability distributions of a single random variable. We may also consider **multivariate distributions**, which describe more than one random variable. For a distribution of \(n\) random variables, we define the PMF or PDF as \(f(x_1, x_2, \ldots, x_n)\). For ease of discussion, we can consider the bivariate case describing random variables \(X\) and \(Y\), which may take on values \(x\) and \(y\). In that case, the PMF or PDF is written as \(f(x, y)\). It is permissible that, e.g., \(x\) is continuous and \(y\) is discrete. The multivariate cumulative distribution function is given by \(F(x, y) = P(X \le x, Y \le y)\).
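
For a small discrete example, the bivariate CDF can be computed by summing the joint PMF over all pairs with \(X \le x\) and \(Y \le y\). The joint PMF below is made up purely for illustration, and `joint_cdf` is a hypothetical helper.

```python
from math import isclose

# Made-up joint PMF f(x, y) for two binary random variables X and Y.
joint_pmf = {
    (0, 0): 0.1,
    (0, 1): 0.2,
    (1, 0): 0.3,
    (1, 1): 0.4,
}


def joint_cdf(x, y, pmf):
    """F(x, y) = P(X <= x, Y <= y), summing the joint PMF over qualifying pairs."""
    return sum(p for (xi, yi), p in pmf.items() if xi <= x and yi <= y)


# The joint PMF is normalized, and the CDF reaches one at the largest values.
print(isclose(sum(joint_pmf.values()), 1.0))
print(isclose(joint_cdf(1, 1, joint_pmf), 1.0))
```

Here `joint_cdf(1, 0, joint_pmf)` sums the entries for \((0, 0)\) and \((1, 0)\), since those are the only pairs with \(Y \le 0\).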