Master Probability in Data Science

Unlock the power of data science with probability. In this blog, we explore how probability drives statistical inference, predictive modeling, machine learning, and uncertainty quantification. Join us as we master the essential role of probability in data science.

Before diving into probability itself, we need to understand some important terms that form its foundation. Here are a few key terms:

Random Experiment

An experiment is called a random experiment if it satisfies the following two conditions:
(i) It has more than one possible outcome.
(ii) It is not possible to predict the outcome in advance.

Examples: tossing a coin, rolling a die

Trial

Trial refers to a single execution of a random experiment. Each trial produces an outcome.

Outcome

Outcomes are the possible results or observations that can occur in a random experiment. For example, when flipping a fair coin, the possible outcomes are “heads” and “tails.”

Sample Space

The sample space is the set of all possible outcomes of an experiment. For example, when rolling a fair die, the sample space is {1, 2, 3, 4, 5, 6}.

Event

An event is a specific set of outcomes from a random experiment or process. Essentially, it’s a subset of the sample space. An event can include a single outcome, or it can include multiple outcomes. One random experiment can have multiple events. It can be something like “rolling a 6 on a fair die” or “drawing a red card from a deck.”

Types of Events

Here are some common types of events:

Simple Event: A simple event is a single, elementary outcome of an experiment. For example, when rolling a fair die, the event of obtaining a specific number, such as rolling a 3, is a simple event.

Compound Event: A compound event is a combination of two or more simple events. It involves the occurrence of multiple outcomes simultaneously or sequentially. For example, rolling an even number on a die and getting a head on a coin flip, taken together, form a compound event.

Mutually Exclusive Events: Mutually exclusive events are events that cannot occur simultaneously. If one event happens, the other(s) cannot occur. For instance, in rolling a die, the events “getting an odd number” and “getting an even number” are mutually exclusive.

Independent Events: Independent events are events where the occurrence or non-occurrence of one event does not affect the probability of the other event happening. For example, when flipping a coin twice, the outcome of the first flip does not impact the outcome of the second flip.

Complementary Events: Complementary events are two events that together encompass all possible outcomes of an experiment. The occurrence of one event implies the non-occurrence of the other event. For instance, in flipping a coin, the events “getting a head” and “getting a tail” are complementary.

Impossible Event: An impossible event has a probability of 0, meaning it cannot occur under any circumstances. For example, when rolling a fair die, the event of rolling a 7 is impossible.

Certain Event: A certain event has a probability of 1, indicating that it will definitely occur. For instance, if you roll a fair die, the event of getting a number between 1 and 6 is certain.

Exhaustive Events: A set of events is exhaustive if at least one of the events must occur when the experiment is performed. For example, when rolling a die, the events “roll an even number” and “roll an odd number” are exhaustive because one or the other must occur on any roll.

These are just a few basic terms in probability, but understanding them provides a solid foundation for exploring more advanced concepts and techniques in probability theory and data science.

What is Probability?

In simplest terms, probability is a measure of the likelihood that a particular event will occur. It is a fundamental concept in statistics and is used to make predictions and informed decisions in a wide range of disciplines, including science, engineering, medicine, economics, and social sciences.

Probability is usually expressed as a number between 0 and 1, inclusive:
• A probability of 0 means that an event will not happen.
• A probability of 1 means that an event will certainly happen.
• A probability of 0.5 means that an event will happen half the time (or that it is as likely to happen as not to happen).

Empirical Probability vs. Theoretical Probability

Empirical probability, also known as experimental probability, is a probability measure that is based on observed data, rather than theoretical assumptions. It’s calculated as the ratio of the number of times a particular event occurs to the total number of trials.

A. Suppose that, in 100 tosses of a coin, we get heads 55 times and tails 45 times. What is the empirical probability of getting a head?

P(H) = 55/100

B. Let’s say you have a bag with 50 marbles. Out of these 50 marbles, 20 are red, 15 are blue, and 15 are green. You start to draw marbles one at a time, replacing the marble back into the bag after each draw. After 200 draws, you find that you’ve drawn a red marble 80 times, a blue marble 70 times, and a green marble 50 times. What is the empirical probability of getting a red marble?

P(R) = 80/200
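
Empirical probabilities are easy to approximate in code. Here is a minimal simulation sketch of the marble example using Python's built-in random module (the exact count will vary from run to run):

import random

# Bag with 20 red, 15 blue, and 15 green marbles
bag = ['red'] * 20 + ['blue'] * 15 + ['green'] * 15

# Draw one marble at a time, with replacement, 200 times
n_draws = 200
red_count = sum(1 for _ in range(n_draws) if random.choice(bag) == 'red')

# Empirical probability = observed occurrences / total trials
print(f"Empirical P(R) = {red_count / n_draws:.2f}")  # theoretical value is 20/50 = 0.4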

Theoretical (or classical) probability is used when each outcome in a sample space is equally likely to occur. If we denote an event of interest as Event A, we calculate the theoretical probability of that event as:

Theoretical Probability of Event A = Number of Favourable Outcomes (that is, outcomes in Event A) / Total Number of Outcomes in the Sample Space

A. Consider a scenario of tossing a fair coin 3 times. Find the probability of getting exactly 2 heads. Sample Space = {HHH, HTH, HHT, HTT, TTT, THT, TTH, THH}

P(Exactly 2 heads) = 3/8 (the favourable outcomes are HHT, HTH, and THH)

B. Consider a scenario of rolling 2 dice. What is the probability of getting a sum of 7?

P(Getting a sum = 7) = 6/36 = 1/6
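
Because theoretical probability is just counting, we can verify the dice example by enumerating the whole sample space (a small sketch using itertools):

from itertools import product

# Enumerate all 36 equally likely outcomes of rolling two dice
sample_space = list(product(range(1, 7), repeat=2))

# Favourable outcomes: the two faces sum to 7
favourable = [roll for roll in sample_space if sum(roll) == 7]

# Theoretical probability = favourable / total
print(f"P(sum = 7) = {len(favourable)}/{len(sample_space)}")  # 6/36 = 1/6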

Random Variable

In the context of probability theory, a random variable is a function that maps the outcomes of a random process (known as the sample space) to a set of real numbers.
Input: The input to the function is an outcome from the sample space of a random process.
Output: The output of the function is a real number that we assign to each possible outcome.

For example, consider a random variable X that represents the outcome of rolling a fair six-sided die. X can take on values 1, 2, 3, 4, 5, or 6. The random variable X assigns a numerical value to each outcome, allowing us to study probabilities associated with each value.

Random variables are fundamental in probability theory and play a significant role in statistical analysis, modeling, and inference. They provide a way to quantify and analyze the uncertainties and patterns in data generated by random processes.

Random variables can be classified into two main types: discrete random variables and continuous random variables.

Discrete Random Variable: A discrete random variable is one that can take on a countable number of distinct values. These values are typically integers or a finite set of values. The probability distribution of a discrete random variable is described using a probability mass function (PMF), which assigns probabilities to each possible value. Examples of discrete random variables include the number of heads obtained when flipping a coin multiple times, the number of customers arriving at a store within a given time interval, or the outcome of rolling a fair die.

Continuous Random Variable: A continuous random variable is one that can take on any value within a specified range or interval. The probability distribution of a continuous random variable is described using a probability density function (PDF). Unlike discrete random variables, the probability of obtaining a specific value for a continuous random variable is typically zero. Instead, probabilities are associated with intervals. Examples of continuous random variables include the height of individuals, the time it takes to complete a task, or the temperature readings in a given region.
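
To see the interval idea concretely, here is a small sketch using scipy.stats with a hypothetical normal model for heights (mean 1.7 m, standard deviation 0.1 m; both numbers are made up for illustration):

from scipy.stats import norm

# Hypothetical model: heights ~ Normal(mean=1.7 m, std=0.1 m)
height = norm(loc=1.7, scale=0.1)

# The PDF gives density, not probability: P(X = 1.75) is exactly 0.
# Probabilities come from intervals, i.e. areas under the PDF.
p_interval = height.cdf(1.8) - height.cdf(1.7)
print(f"P(1.7 <= X <= 1.8) = {p_interval:.3f}")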

It is important to note that some random variables can have a mix of discrete and continuous components, known as mixed random variables. These random variables have both a discrete part, with specific values, and a continuous part, covering a range of values.

Probability Distribution of a Discrete Random Variable

The probability distribution of a discrete random variable describes the probabilities associated with each possible value that the random variable can take on. It is often represented using a probability mass function (PMF), which assigns probabilities to each value.

The PMF provides the probability of observing a specific value of the random variable. It is typically denoted by P(X = x), where X represents the random variable and x represents a particular value. The PMF satisfies two properties:

Non-negativity: The probabilities assigned by the PMF are non-negative values.

Summation: The sum of probabilities for all possible values of the random variable is equal to 1.

For example, consider rolling 2 dice and letting X be the sum of the two faces. X can take the values 2 through 12, the PMF assigns each sum a probability, and those probabilities add up to 1.
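
A minimal sketch of that example: enumerate all 36 outcomes, build the PMF of the sum, and check the two properties above:

from itertools import product
from collections import Counter

# PMF of X = sum of two fair dice, built by enumerating all 36 outcomes
counts = Counter(a + b for a, b in product(range(1, 7), repeat=2))
pmf = {value: count / 36 for value, count in counts.items()}

for value in sorted(pmf):
    print(f"P(X = {value}) = {pmf[value]:.3f}")

# Non-negativity and summation to 1
assert all(p >= 0 for p in pmf.values())
assert abs(sum(pmf.values()) - 1) < 1e-9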

Mean of a Random Variable

The mean of a random variable, often called the expected value, is essentially the average outcome of a random process that is repeated many times. More technically, it’s a weighted average of the possible outcomes of the random variable, where each outcome is weighted by its probability of occurrence.

Mathematically, it can be expressed as: E(X) = ∑ (x * P(X = x))

Here, x represents each possible value of the random variable, and P(X = x) represents the probability of that value occurring.

For a continuous random variable, the mean is calculated using the probability density function (PDF) and integration. It is represented as:

E(X) = ∫ (x * f(x)) dx

Here, f(x) represents the probability density function.
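
As a quick sanity check, here is the discrete formula applied to a fair six-sided die:

# E(X) for a fair six-sided die: each face has probability 1/6
values = [1, 2, 3, 4, 5, 6]
expected_value = sum(x * (1 / 6) for x in values)
print(expected_value)  # 3.5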

Variance of a Random Variable

The variance of a random variable is a statistical measurement that describes how much individual observations in a group differ from the mean (expected value).

For a discrete random variable, the variance is often denoted by Var(X) or σ², where X represents the random variable. The variance is calculated as the weighted average of the squared differences between each possible value of the random variable and the mean, where the weights are the probabilities associated with each value. Mathematically, it can be expressed as:

Var(X) = ∑ [(x - E(X))² * P(X = x)]

Here, x represents each possible value of the random variable, E(X) represents the mean of the random variable, and P(X = x) represents the probability of that value occurring.

For a continuous random variable, the variance is calculated using the probability density function (PDF) and integration. It is represented as:

Var(X) = ∫ [(x - E(X))² * f(x)] dx

Here, f(x) represents the probability density function.
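
Continuing the fair-die example, the variance follows directly from the discrete formula:

# Var(X) for a fair six-sided die
values = [1, 2, 3, 4, 5, 6]
mean = sum(x * (1 / 6) for x in values)                    # E(X) = 3.5
variance = sum((x - mean) ** 2 * (1 / 6) for x in values)  # 35/12 ≈ 2.917
print(variance)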

So far we have covered the basics of probability; from here on, we will look at some of its more advanced topics.

Venn Diagrams in Probability

Venn diagrams are often used in probability to visually represent relationships and intersections between different events or sets. They provide a graphical way to illustrate the probabilities and relationships between various events within a sample space.

To calculate such probabilities, we need a completed Venn diagram to read the counts from.

E.g. Consider the sets of odd numbers and prime numbers among the integer values in the universal set ξ = {1, 2, 3, 4, 5, 6, 7, 8, 9, 10}. The odd numbers are {1, 3, 5, 7, 9}, the primes are {2, 3, 5, 7}, and their intersection (numbers that are both odd and prime) is {3, 5, 7}.

If we wanted to calculate the probability of a prime number, given that the number is odd, we need the count of numbers that are both prime and odd (3) out of the total number of odd numbers (5). The probability is therefore 3/5.
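
The same calculation can be done with Python sets (a small sketch mirroring the Venn diagram):

# Universal set and the two events from the Venn diagram
universal = set(range(1, 11))
odd = {n for n in universal if n % 2 == 1}  # {1, 3, 5, 7, 9}
prime = {2, 3, 5, 7}

# P(prime | odd) = |prime ∩ odd| / |odd|
p_prime_given_odd = len(prime & odd) / len(odd)
print(p_prime_given_odd)  # 3/5 = 0.6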

Contingency Tables in Probability

Contingency tables, also known as cross-tabulation tables or two-way tables, are a useful tool in probability and statistics for organizing and analyzing categorical data. They provide a way to examine the relationship between two or more categorical variables and understand the distribution of data across different categories.

In probability, contingency tables are commonly used to explore the joint probabilities of two or more events or variables. Let’s consider a simple example to illustrate this concept. Suppose we want to study the relationship between gender (male or female) and handedness (left-handed or right-handed) in a population of individuals. We can create a contingency table to summarize the data, as shown below:

            Left-handed   Right-handed
Male             a              b
Female           c              d

In this table, the values a, b, c, and d represent the frequencies or counts of individuals falling into each combination of categories. For example, a represents the number of males who are left-handed, b represents the number of males who are right-handed, c represents the number of females who are left-handed, and d represents the number of females who are right-handed.

Contingency tables allow us to examine the distribution of the data and calculate various probabilities. We can compute the marginal probabilities, which are the probabilities of each category independently. In this example, the marginal probabilities would be the probabilities of being left-handed or right-handed, regardless of gender, and the probabilities of being male or female, regardless of handedness.

We can also calculate the conditional probabilities, which are the probabilities of one category given another. For instance, we could calculate the probability of being left-handed given that an individual is male or the probability of being female given that an individual is right-handed.

Contingency tables are often used in hypothesis testing to determine whether there is a significant association between the variables. Statistical tests such as the chi-square test can be applied to assess the independence of the variables and determine if the observed frequencies significantly deviate from what would be expected under independence.

Overall, contingency tables provide a visual representation of categorical data and allow us to analyze the relationship between variables in terms of probabilities and frequencies.

Types of Probability

Joint Probability

Joint probability refers to the probability of two or more events occurring simultaneously or the probability of multiple variables taking on specific values at the same time. It represents the likelihood of the intersection or overlap between the events or variables.

Let’s consider two events A and B. The joint probability of A and B is denoted as P(A and B) or P(A ∩ B), where the symbol “∩” represents the intersection of A and B. In general, it can be computed from a conditional probability as P(A ∩ B) = P(A|B) * P(B); when A and B are independent, this reduces to P(A ∩ B) = P(A) * P(B).

Let’s say we have two random variables X and Y. The joint probability of X and Y, denoted as P(X = x, Y = y), is the probability that X takes the value x and Y takes the value y at the same time.

Let X be a random variable associated with the Pclass of a passenger.
Let Y be a random variable associated with the survival status of a passenger.

import pandas as pd

# Load the Titanic dataset
df = pd.read_csv('https://raw.githubusercontent.com/datasciencedojo/datasets/master/titanic.csv')

# normalize='all' divides each cell count by the grand total,
# so each cell is the joint probability P(Survived = i, Pclass = j)
pd.crosstab(df['Survived'], df['Pclass'], normalize='all')

Marginal Probability

Marginal probability refers to the probability of a single event or variable occurring independently of other events or variables. It focuses on the probabilities of individual categories or outcomes without considering the joint occurrence of multiple events.

In the context of a contingency table or cross-tabulation, marginal probabilities are calculated by summing the probabilities or frequencies across rows or columns. They provide information about the distribution of one variable while ignoring the influence of other variables.

Let X be a random variable associated with the Pclass of a passenger.
Let Y be a random variable associated with the survival status of a passenger.

import pandas as pd

# Load the Titanic dataset
df = pd.read_csv('https://raw.githubusercontent.com/datasciencedojo/datasets/master/titanic.csv')

# margins=True adds an 'All' row and column; those margins are the
# marginal probabilities P(Survived = i) and P(Pclass = j)
pd.crosstab(df['Survived'], df['Pclass'], normalize='all', margins=True)

Conditional Probability

Conditional probability refers to the probability of an event occurring given that another event has already occurred or is known to have occurred. It measures the likelihood of an event happening under a specific condition or context.

Conditional probability is denoted as P(A|B), read as “the probability of A given B.” Here, A and B represent two events or outcomes.

Mathematically, the conditional probability is calculated as:

P(A|B) = P(A and B) / P(B)

The numerator, P(A and B), represents the joint probability of events A and B occurring together. The denominator, P(B), represents the probability of event B occurring.

For example, let’s consider drawing cards from a standard deck. If event A represents drawing a red card and event B represents drawing an ace, the conditional probability of drawing a red card given that an ace has been drawn is:

P(Red card|Ace) = P(Red card and Ace) / P(Ace)

Here, P(Red card|Ace) represents the probability of drawing a red card given that an ace has been drawn, P(Red card and Ace) represents the probability of drawing a red ace, and P(Ace) represents the probability of drawing an ace.
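
Counting directly: a standard deck has 4 aces, 2 of which are red, so P(Red card and Ace) = 2/52 and P(Ace) = 4/52, giving P(Red card|Ace) = (2/52) / (4/52) = 1/2.

The snippet below applies the same definition to a small made-up dataset: it estimates the probability of passing an exam given that a student studied a particular number of hours (the data is invented for illustration):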

# Importing required libraries
import pandas as pd

# Creating a sample dataset
data = {
'Hours Studied': [2, 4, 3, 5, 1, 6, 2, 4, 5, 3],
'Passed Exam': ['No', 'Yes', 'Yes', 'Yes', 'No', 'Yes', 'No', 'Yes', 'Yes', 'No']
}

df = pd.DataFrame(data)

# Calculate conditional probability
hours_studied = 4
passed_exam = 'Yes'

# Select the students who studied for 'hours_studied' hours
students_with_hours = df[df['Hours Studied'] == hours_studied]

# Among those, select the students who also passed the exam
students_passed_exam = students_with_hours[students_with_hours['Passed Exam'] == passed_exam]

# Calculate the conditional probability
conditional_prob = len(students_passed_exam) / len(students_with_hours)

# Print the result
print(f"The conditional probability of passing the exam given studying {hours_studied} hours is: {conditional_prob:.2f}")

Independent vs. Mutually Exclusive Events

Independent events and mutually exclusive events are two different concepts in probability that describe the relationship between events.

Independent events: Two events are considered independent if the occurrence or outcome of one event does not affect the probability of the occurrence or outcome of the other event. In other words, the probability of one event happening remains the same regardless of whether the other event occurs or not.
For example, consider flipping a fair coin twice. The outcome of the first coin flip (heads or tails) does not impact the outcome of the second coin flip. The events “getting heads on the first flip” and “getting tails on the second flip” are independent events. The probability of getting heads on the first flip is 1/2, and the probability of getting tails on the second flip is also 1/2. Multiplying these probabilities together, we obtain the probability of both events occurring: (1/2) * (1/2) = 1/4.

In general, if two events A and B are independent, the joint probability of both events occurring is equal to the product of their individual probabilities: P(A and B) = P(A) * P(B).

Mutually exclusive events: Two events are considered mutually exclusive (or disjoint) if they cannot occur simultaneously. If one event happens, the other event cannot occur at the same time. The occurrence of one event excludes the possibility of the other event happening.
For example, when rolling a fair six-sided die, the events “getting a 2” and “getting a 4” are mutually exclusive. If the die shows a 2, it cannot simultaneously show a 4. The probability of getting a 2 is 1/6, and the probability of getting a 4 is also 1/6. Since these events are mutually exclusive, the probability of either event happening is the sum of their individual probabilities: P(getting a 2 or getting a 4) = P(getting a 2) + P(getting a 4) = 1/6 + 1/6 = 1/3.

In general, if two events A and B are mutually exclusive, the probability of both events occurring simultaneously is zero: P(A and B) = 0.
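
Both rules are easy to verify with a quick simulation (a minimal sketch using Python's random module; the estimates will vary slightly from run to run):

import random

# Independent events: simulate many pairs of fair coin flips
n = 100_000
both = sum(1 for _ in range(n)
           if random.random() < 0.5 and random.random() < 0.5)
print(both / n)  # close to P(A) * P(B) = 0.25

# Mutually exclusive events: one die roll cannot be both a 2 and a 4
rolls = [random.randint(1, 6) for _ in range(n)]
print(sum(r in (2, 4) for r in rolls) / n)  # close to 1/6 + 1/6 ≈ 0.333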

Bayes’ Theorem

Bayes’ theorem is a fundamental concept in probability theory and statistics that allows us to update our beliefs or probabilities based on new evidence or information. It provides a framework for calculating conditional probabilities in a reverse manner compared to the usual conditional probability calculation.

Bayes’ theorem is expressed as follows:

P(A|B) = (P(B|A) * P(A)) / P(B)

Where:

• P(A|B) is the posterior probability of event A given that event B has occurred or is known.
• P(B|A) is the likelihood, or conditional probability, of event B given that event A has occurred.
• P(A) is the prior probability of event A, which represents our initial belief or probability of event A occurring.
• P(B) is the marginal probability of event B, representing the overall probability of event B occurring.

The theorem provides a way to update our prior belief (P(A)) based on new evidence (P(B|A)) and calculate the new probability (P(A|B)) given this evidence.

Here’s an example to illustrate the application of Bayes’ theorem:

Suppose there is a certain medical test for a disease, and the prevalence of the disease in the population is low (let’s say 1%). The test has a 95% accuracy rate in detecting the disease (P(Positive Test|Disease) = 0.95) and a 90% accuracy rate in correctly giving a negative result for healthy individuals (P(Negative Test|No Disease) = 0.90).

Now, if a randomly selected person tests positive for the disease (event B), we want to calculate the probability that the person actually has the disease (event A).

Let’s define the events:

A: Person has the disease.
B: Person tests positive for the disease.

We are given:

P(A) = 0.01 (prevalence of the disease)
P(B|A) = 0.95 (test accuracy when the person has the disease)

We can use Bayes’ theorem to calculate P(A|B):

P(A|B) = (P(B|A) * P(A)) / P(B)

P(A|B) = (0.95 * 0.01) / P(B)

To find P(B), we need to consider both the true positives (P(B|A)) and the false positives (P(B|~A)):

P(B) = P(B|A) * P(A) + P(B|~A) * P(~A)
P(B) = 0.95 * 0.01 + (1 - 0.90) * (1 - 0.01) = 0.0095 + 0.099 = 0.1085

Now, we can calculate P(A|B):

P(A|B) = (0.95 * 0.01) / 0.1085 ≈ 0.0876

So even after a positive test result, the probability of actually having the disease is only about 8.8%. Because the disease is rare, false positives outnumber true positives.

Bayes’ theorem is widely used in various fields, including statistics, machine learning, and data science, as it provides a principled way to update probabilities based on new evidence. It allows us to make more informed decisions and revise our beliefs as new information becomes available.
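
To see the same update in code, consider a second example: we pick one of two coins at random, where Coin A is fair (P(heads) = 0.5) and Coin B is biased towards heads (P(heads) = 0.8). Given that a single flip came up heads, Bayes’ theorem tells us how likely it is that we picked Coin A: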

# Calculate the probability of picking Coin A given heads

# Prior probabilities
P_A = 0.5 # Probability of picking Coin A
P_B = 0.5 # Probability of picking Coin B

# Likelihoods
P_heads_given_A = 0.5 # Probability of getting heads given Coin A
P_heads_given_B = 0.8 # Probability of getting heads given Coin B

# Calculate the marginal probability of getting heads
P_heads = P_A * P_heads_given_A + P_B * P_heads_given_B

# Calculate the posterior probability of picking Coin A given heads using Bayes' theorem
P_A_given_heads = (P_A * P_heads_given_A) / P_heads

# Print the result
print(f"The probability of picking Coin A given heads is: {P_A_given_heads:.2f}")
