Find probabilities using discrete and continuous probability distributions
What are probability distributions?
- Probability distributions represent the probabilities associated with all outcomes of a random variable.
- Depending on the type of random variable (discrete or continuous), probability distributions are classified as discrete or continuous probability distributions.
Discrete probability distribution
- Discrete probability distributions describe the probabilities associated with each possible outcome of a discrete random variable (a countable quantity that takes values such as 0, 1, 2, and so on, but not fractions, e.g. the number of apples).
- The probability of each value of a discrete random variable lies between 0 and 1, and the probabilities of all values sum to 1.
- The binomial and Poisson distributions are examples of discrete probability distributions.
- For example, a restaurant sells 10 to 20 pizzas during lunch hour, and Table 1 represents the discrete probability distribution of pizza sales. A random variable (X) takes all possible discrete values between 10 and 20. p(X=x) or p(x) represents the probability of selling exactly x pizzas.
x | 10 | 11 | 12 | 13 | 14 | 15 | 16 | 17 | 18 | 19 | 20 |
---|---|---|---|---|---|---|---|---|---|---|---|
p(x) | 0.07 | 0.09 | 0.11 | 0.12 | 0.16 | 0.14 | 0.10 | 0.09 | 0.06 | 0.03 | 0.03 |
Table 1: Probability distribution of pizza sales
Graphically, it can be shown as follows:
Figure 1: Probability distribution of pizza sales
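The two properties of a discrete distribution (each probability lies between 0 and 1, and all probabilities sum to 1) can be checked directly against Table 1. The following is a minimal sketch; the names `pizza_pmf` and `is_valid_pmf` are illustrative and not part of any library.

```python
# Table 1 as a mapping from the number of pizzas sold (x) to p(x).
pizza_pmf = {
    10: 0.07, 11: 0.09, 12: 0.11, 13: 0.12, 14: 0.16, 15: 0.14,
    16: 0.10, 17: 0.09, 18: 0.06, 19: 0.03, 20: 0.03,
}


def is_valid_pmf(pmf, tol=1e-9):
    """Return True if every probability is in [0, 1] and they sum to 1."""
    in_range = all(0.0 <= p <= 1.0 for p in pmf.values())
    # Compare with a tolerance because floating-point sums are not exact.
    sums_to_one = abs(sum(pmf.values()) - 1.0) <= tol
    return in_range and sums_to_one


print(is_valid_pmf(pizza_pmf))
```

The tolerance is there only because adding decimal fractions in floating point rarely gives exactly 1.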
Probability mass function (PMF) and cumulative distribution function (CDF)
- The probability mass function (PMF) gives the probability of each possible value (x) of X. For example, p(X=12) is 0.11, which is the PMF of X evaluated at 12.
- The cumulative distribution function (CDF) gives the probability that X takes a value of at most x. For example, p(X<=12) is 0.27, which is the sum of p(X=10), p(X=11), and p(X=12).
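The PMF and CDF values quoted above can be reproduced from Table 1 with a few lines of Python. This is a sketch only; `pizza_pmf`, `pmf`, and `cdf` are hypothetical helper names introduced for illustration.

```python
# Table 1 as a mapping from the number of pizzas sold (x) to p(x).
pizza_pmf = {
    10: 0.07, 11: 0.09, 12: 0.11, 13: 0.12, 14: 0.16, 15: 0.14,
    16: 0.10, 17: 0.09, 18: 0.06, 19: 0.03, 20: 0.03,
}


def pmf(x):
    """p(X = x); values outside the table have probability 0."""
    return pizza_pmf.get(x, 0.0)


def cdf(x):
    """p(X <= x): sum of the PMF over all table values not exceeding x."""
    return sum(p for value, p in pizza_pmf.items() if value <= x)


print(round(pmf(12), 2))  # 0.11
print(round(cdf(12), 2))  # 0.27
```

Rounding is applied only for display, since the floating-point sum may differ from 0.27 in the last digits.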
Continuous probability distribution
- Continuous probability distributions describe the probabilities associated with the values of a continuous random variable (a quantity that can take any of the infinitely many, uncountable values in a specified range, e.g. the time spent reading a blog page).
- The probability that a continuous random variable lies between two values (a and b) is the area under the curve between a and b (see the shaded area in Figure 2).
- For a continuous random variable, a probability density function (PDF) is used to calculate the probability that X falls in an interval between two values (a and b). The probability p(a ≤ x ≤ b) is equal to the area under the PDF curve between a and b. The total area under the curve is always equal to one.
- In continuous probability distributions, probabilities are calculated for intervals because the probability that X takes any single value is always zero.
- The cumulative distribution function (CDF) gives the probability that X is less than or equal to some value x, i.e. p(X ≤ x).
- The normal distribution, exponential distribution, and uniform distribution are examples of continuous probability distributions.
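The relationships in the list above can be written compactly. In the sketch below, f denotes the PDF and F the CDF of X; these symbols are notation assumed here, not taken from the original text.

```latex
\begin{aligned}
p(a \le X \le b) &= \int_a^b f(x)\,dx = F(b) - F(a), \\
F(x) &= p(X \le x) = \int_{-\infty}^{x} f(t)\,dt, \\
\int_{-\infty}^{\infty} f(x)\,dx &= 1, \qquad
p(X = c) = \int_c^c f(x)\,dx = 0 .
\end{aligned}
```

The last identity is why only interval probabilities are meaningful for a continuous random variable: an interval of zero width has zero area under the curve.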
For example, suppose the daily time spent reading a blog page is approximately normally distributed with a mean of 3 minutes and a standard deviation of 0.5 minutes. The shaded area in Figure 2 represents the probability that the time spent reading a blog page is between 3 and 4 minutes, i.e. p(3 ≤ x ≤ 4).
Figure 2: Normal distribution of time spent on reading a blog page
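The shaded probability can be computed as the difference of two CDF values. The sketch below uses `statistics.NormalDist` from the Python standard library and treats the reading time as exactly normal with mean 3 and standard deviation 0.5; the variable names are illustrative. The `cdf` method of `scipy.stats.norm` (with `loc=3` and `scale=0.5`) should give the same result.

```python
from statistics import NormalDist

# Reading time in minutes, assumed exactly normal for this illustration.
reading_time = NormalDist(mu=3.0, sigma=0.5)

# p(3 <= x <= 4) is the area under the PDF between 3 and 4,
# which equals CDF(4) - CDF(3).
p_3_to_4 = reading_time.cdf(4.0) - reading_time.cdf(3.0)

print(round(p_3_to_4, 4))  # 0.4772
```

Because 3 is the mean and 4 lies two standard deviations above it, this is the familiar result that about 47.7% of a normal distribution falls between the mean and two standard deviations above it.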
References
- https://tinyurl.com/yydhju4g
- https://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.norm.html
- https://amsi.org.au/ESA_Senior_Years/PDF/ContProbDist4e.pdf
This work is licensed under a Creative Commons Attribution 4.0 International License