Basics of Inferential Statistics
Inferential statistics is a branch of statistics that involves using sample data to make inferences or predictions about a larger population.
In data science, inferential statistics plays a central role in analyzing and interpreting data. It lets data scientists draw conclusions about a larger population from a smaller sample, quantify how uncertain those conclusions are, and validate statistical models.
Some of the topics that come under inferential statistics include:
Hypothesis testing: The process of testing a hypothesis about a population parameter based on sample data. For example, testing whether the mean height of a population is different from a given value.
Confidence intervals: An interval estimate of a population parameter that is constructed from sample data. For example, estimating the population mean height as a range of values at a given confidence level, such as 95%.
Regression analysis: A statistical method used to estimate the relationship between a dependent variable and one or more independent variables. For example, predicting the sales of a product based on advertising expenditure.
Analysis of variance (ANOVA): A statistical method used to test whether there are any significant differences between the means of two or more groups. For example, comparing the mean height of individuals from different regions.
Sampling techniques: The process of selecting a representative sample from a population in order to make inferences about the population.
Bayesian statistics: A statistical framework that uses Bayes’ theorem to update the probability of a hypothesis based on new data. For example, updating the probability of a disease given a positive test result.
Time series analysis: A statistical method used to analyze data that is collected over time. For example, meteorologists use time series analysis to model weather patterns and predict future weather conditions.
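Nonparametric statistics: Statistical methods used when the assumptions of parametric statistics, such as normally distributed data, are not met. For example, comparing two groups by the ranks of their values instead of by their means.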
Chi-square tests: Tests of independence or association between two categorical variables. For example, testing whether gender and occupation are independent.
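The first two topics in the list, hypothesis testing and confidence intervals, can be sketched in a few lines of Python. This is a minimal illustration and not a definitive implementation: the heights are simulated, and the hypothesized mean of 170 cm and the function name one_sample_z_test are assumptions made for the example. It also uses a normal (z) approximation, which is reasonable only for large samples; a t-distribution is the usual choice for small ones.

```python
import math
import random
import statistics
from statistics import NormalDist


def one_sample_z_test(sample, hypothesized_mean, confidence=0.95):
    """Two-sided z-test and confidence interval for a population mean.

    Hypothetical helper for illustration; relies on the large-sample
    normal approximation.
    """
    n = len(sample)
    sample_mean = statistics.fmean(sample)
    # Standard error: estimated spread of the sample mean across samples.
    standard_error = statistics.stdev(sample) / math.sqrt(n)

    # Hypothesis test: how many standard errors is the sample mean
    # away from the hypothesized population mean?
    z = (sample_mean - hypothesized_mean) / standard_error
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))

    # Confidence interval: sample mean plus or minus a margin of error.
    z_crit = NormalDist().inv_cdf(0.5 + confidence / 2)
    margin = z_crit * standard_error
    return z, p_value, (sample_mean - margin, sample_mean + margin)


# Simulated sample of 200 heights in cm (assumed data, fixed seed).
rng = random.Random(42)
heights = [rng.gauss(172, 7) for _ in range(200)]

z, p_value, (low, high) = one_sample_z_test(heights, hypothesized_mean=170)
print(f"z = {z:.2f}, p-value = {p_value:.4f}")
print(f"95% confidence interval for the mean: ({low:.1f}, {high:.1f}) cm")
```

The two results are linked: at the 95% level, the p-value falls below 0.05 exactly when the hypothesized mean lies outside the confidence interval.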
Population and Sample in Inferential Statistics
In inferential statistics, a population refers to the entire group of individuals, objects, or events that share a common characteristic of interest. The population is the group that researchers are interested in studying and making inferences about.
A sample, on the other hand, is a subset of data taken from a larger population that is used to draw conclusions or make inferences about the population. The main goal of inferential statistics is to use information gathered from a sample to make generalizations about a larger population.
For example, if a researcher is interested in studying the average height of all adult males in a particular country, the population would be all adult males in that country. The researcher may take a sample of, say, 500 adult males and use this sample to estimate the average height of all adult males in the country.
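The height example can be simulated to show how a sample stands in for a population. The sketch below is illustrative only: the population of 100,000 heights is generated from an assumed normal distribution (mean 175 cm, standard deviation 7 cm), whereas a real study would never observe the full population.

```python
import random
import statistics

rng = random.Random(0)  # fixed seed so the sketch is repeatable

# Hypothetical population: heights (cm) of 100,000 adult males.
population = [rng.gauss(175, 7) for _ in range(100_000)]
population_mean = statistics.fmean(population)

# Simple random sample of 500 individuals, drawn without replacement.
sample = rng.sample(population, 500)
sample_mean = statistics.fmean(sample)

# Standard error: how much the sample mean is expected to vary
# from one random sample of this size to the next.
standard_error = statistics.stdev(sample) / len(sample) ** 0.5

print(f"Population mean (normally unknown): {population_mean:.2f} cm")
print(f"Sample mean (the estimate):         {sample_mean:.2f} cm")
print(f"Standard error of the estimate:     {standard_error:.2f} cm")
```

The sample mean will rarely equal the population mean exactly, but with 500 observations it typically lands within a few standard errors of it, which is what makes the generalization from sample to population defensible.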
Why is ML closely associated with statistics?
Machine learning (ML) is closely associated with statistics for several reasons: many ML algorithms are based on statistical models, probability theory is essential to many of them, statistical methods are used to evaluate and improve ML models, and ML can be seen as a natural extension of statistical learning.