A
Alternative Hypothesis (H₁)
The claim you are trying to find evidence for
The alternative hypothesis is the statement that you are trying to support with statistical evidence. It represents the effect or difference you believe exists in the population. Example: H₁: μ ≠ 50 (the population mean is not equal to 50).
ANOVA (Analysis of Variance)
A test comparing means across three or more groups
ANOVA tests whether there are statistically significant differences between the means of three or more independent groups. It uses the F-distribution and partitions total variability into between-group and within-group components.
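The partition into between-group and within-group variability can be computed by hand. A minimal sketch with three small hypothetical groups:

```python
import statistics

# One-way ANOVA F statistic computed from scratch (hypothetical groups)
groups = [[4, 5, 6], [7, 8, 9], [10, 11, 12]]
k = len(groups)                              # number of groups
n = sum(len(g) for g in groups)              # total observations
grand_mean = statistics.fmean(x for g in groups for x in g)

# Partition total variability into between-group and within-group parts
ss_between = sum(len(g) * (statistics.fmean(g) - grand_mean) ** 2 for g in groups)
ss_within = sum(sum((x - statistics.fmean(g)) ** 2 for x in g) for g in groups)

# F = mean square between / mean square within
f_stat = (ss_between / (k - 1)) / (ss_within / (n - k))
print(f_stat)  # 27.0 here; compare to an F critical value with (k-1, n-k) df
```

A large F means the variability between group means dwarfs the variability inside groups, which is evidence against equal population means.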
Arithmetic Mean
The sum of all values divided by the count
The most common measure of central tendency. Calculated as: x̄ = Σx / n. Sensitive to outliers. Used when data is roughly symmetric and continuous.
Association
A statistical relationship between two variables
Two variables are associated when knowing the value of one tells you something about the value of the other. Association does not imply causation — a third variable may explain the relationship.
B
Bar Chart
A graph using rectangular bars to represent categorical data
A bar chart displays the frequency or relative frequency of categorical data. The height (or length) of each bar is proportional to the value it represents. Unlike histograms, bars in a bar chart have spaces between them since the categories are discrete.
Bayes' Theorem
A formula for updating probability given new evidence
Bayes' Theorem calculates conditional probability: P(A|B) = P(B|A) × P(A) / P(B). It allows you to update a prior belief (P(A)) with new evidence (B) to get a posterior probability (P(A|B)). Foundational for Bayesian statistics and machine learning.
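A minimal sketch of the update, using made-up numbers for a diagnostic test (99% sensitivity, 95% specificity, 1% prevalence):

```python
def posterior(prior, sensitivity, specificity):
    """P(disease | positive test) via Bayes' Theorem."""
    # P(positive) by the law of total probability
    p_positive = sensitivity * prior + (1 - specificity) * (1 - prior)
    # P(A|B) = P(B|A) * P(A) / P(B)
    return sensitivity * prior / p_positive

p = posterior(prior=0.01, sensitivity=0.99, specificity=0.95)
print(round(p, 3))  # 0.167 — far below the test's 99% sensitivity
```

With a rare condition, even an accurate test yields a modest posterior, because most positives come from the large healthy group.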
Bias
A systematic error in a statistical estimator or study design
Bias is a consistent tendency for an estimator to over- or under-estimate the true parameter. Common sources include sampling bias (non-representative samples), response bias, and confirmation bias in data collection.
Binomial Distribution
Models the number of successes in n independent trials
The binomial distribution applies when: (1) there are n fixed trials, (2) each trial has two outcomes (success/failure), (3) each trial has the same probability p of success, (4) trials are independent. Formula: P(X=k) = C(n,k) × pᵏ × (1−p)^(n−k).
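The formula maps directly onto Python's standard library, where `math.comb` computes C(n,k):

```python
from math import comb

def binomial_pmf(k, n, p):
    """P(X = k) for a Binomial(n, p) random variable."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

# Probability of exactly 3 heads in 10 fair coin flips
print(binomial_pmf(3, 10, 0.5))  # 0.1171875
```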
Box Plot (Box-and-Whisker Plot)
A visual summary showing the 5-number summary of a dataset
A box plot displays the minimum, Q1, median, Q3, and maximum of a dataset. The "box" spans Q1 to Q3 (IQR). Whiskers extend to the most extreme non-outlier values. Points beyond 1.5 × IQR from the box are plotted as outliers.
C
Central Limit Theorem (CLT)
The sampling distribution of the mean approaches normal as n increases
One of the most important theorems in statistics. For large enough sample sizes (typically n ≥ 30), the distribution of sample means will be approximately normal, regardless of the shape of the population distribution. This justifies using z-tests and t-tests for large samples.
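A quick simulation illustrates the theorem: even though an exponential population is heavily right-skewed, the means of samples of n = 30 cluster roughly normally around the population mean. A sketch using only the standard library:

```python
import random
import statistics

random.seed(42)

def sample_mean(n):
    # One sample of size n from a skewed Exponential(1) population (mean 1.0)
    return statistics.fmean(random.expovariate(1.0) for _ in range(n))

# The distribution of these 2000 sample means is approximately normal
means = [sample_mean(30) for _ in range(2000)]
print(round(statistics.fmean(means), 2))  # close to the population mean, 1.0
print(round(statistics.stdev(means), 2))  # close to sigma/sqrt(n) = 1/sqrt(30) ≈ 0.18
```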
Chi-Square Test (χ²)
Tests for goodness-of-fit or independence of categorical variables
A chi-square test compares observed frequencies to expected frequencies. Used for: (1) goodness-of-fit tests, (2) tests of independence in contingency tables, (3) tests of homogeneity. Requires expected cell frequencies of at least 5.
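A goodness-of-fit example with hypothetical counts, testing whether a die is fair:

```python
# Chi-square goodness-of-fit statistic for 60 hypothetical die rolls
observed = [8, 9, 12, 11, 10, 10]
expected = [60 / 6] * 6          # 10 per face under H0: the die is fair

chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
print(chi2)  # 1.0 — compare to a chi-square critical value with df = 6 - 1 = 5
```

A value this small is far below any usual critical value, so these counts give no evidence against fairness.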
Confidence Interval (CI)
A range of plausible values for a population parameter
A confidence interval provides a range of values that likely contains the true population parameter. A 95% CI means that if we repeated the sampling process 100 times, about 95 of those intervals would contain the true parameter. Format: Point Estimate ± Margin of Error.
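The Point Estimate ± Margin of Error format is easy to compute directly. A sketch with a made-up sample, using the z critical value for simplicity (with n this small, t* would normally be preferred):

```python
import statistics

# Hypothetical sample; 95% z-based confidence interval for the mean
sample = [48, 52, 51, 49, 50, 53, 47, 52, 50, 48]
xbar = statistics.fmean(sample)
s = statistics.stdev(sample)
se = s / len(sample) ** 0.5      # standard error of the mean
margin = 1.96 * se               # z* for 95% confidence
print(f"{xbar:.1f} ± {margin:.2f}")  # 50.0 ± 1.24
```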
Correlation
A measure of the strength and direction of a linear relationship
Correlation (typically Pearson's r) measures how closely two variables move together linearly. r = +1 is a perfect positive relationship, r = −1 is perfect negative, r = 0 indicates no linear relationship. Important: correlation does not equal causation.
Covariance
Measures how two variables change together
Covariance indicates the direction of the linear relationship between two variables. Positive covariance means they tend to increase together; negative means one tends to increase as the other decreases. Unlike correlation, covariance is not standardized and depends on the scale of measurement.
D
Data
Collected facts, measurements, or observations
Data consists of values collected through observation, measurement, or experiment. Types include: quantitative (numerical) data which can be discrete or continuous, and qualitative (categorical) data which can be nominal or ordinal.
Degrees of Freedom (df)
The number of values in a calculation that are free to vary
Degrees of freedom represent the number of independent observations available to estimate a parameter. For a one-sample t-test: df = n − 1. For a chi-square test of independence: df = (r−1)(c−1). Affects the shape of t and chi-square distributions.
Descriptive Statistics
Summary measures that describe a dataset's key characteristics
Descriptive statistics summarize data using measures of central tendency (mean, median, mode), variability (standard deviation, variance, range, IQR), and shape (skewness, kurtosis). They describe the data you have without making inferences about a larger population.
Distribution
How values in a dataset or probability space are spread out
A distribution describes the pattern of variation in a set of data or the probabilities of different outcomes. Common distributions include normal, binomial, t, chi-square, F, and Poisson. A distribution is fully described by its parameters (e.g., mean and variance for the normal distribution).
E
Effect Size
A quantitative measure of the magnitude of an experimental effect
Effect size measures how large or practically meaningful a difference or relationship is. Common measures: Cohen's d (for mean differences), r (for correlations), η² (for ANOVA). Unlike p-values, effect sizes do not systematically shrink or grow with sample size, making them essential for judging practical significance.
Expected Value
The long-run average outcome of a random variable
The expected value E(X) is the weighted average of all possible values of a random variable, where each value is weighted by its probability. For a fair 6-sided die: E(X) = (1+2+3+4+5+6)/6 = 3.5. It represents what you'd "expect" on average over many repetitions.
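The die example can be checked with exact arithmetic using the standard library's `fractions` module:

```python
from fractions import Fraction

# E(X) for a fair six-sided die: each outcome weighted by probability 1/6
ev = sum(x * Fraction(1, 6) for x in range(1, 7))
print(ev)  # 7/2, i.e. 3.5
```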
Extrapolation
Using a model to predict beyond the range of observed data
Extrapolation uses a fitted model (e.g., regression) to make predictions outside the range of the original data. This is risky because patterns established within a data range may not hold outside it. Interpolation (predicting within the data range) is generally safer.
F
F-Distribution
A probability distribution used in ANOVA and regression
The F-distribution is a continuous probability distribution that arises in the context of comparing variances. It is used in ANOVA (to test whether group means are equal), F-tests (to compare model fit), and regression analysis (to test overall model significance).
Frequency
The number of times a value occurs in a dataset
Frequency is the raw count of how often a particular value or category appears in a dataset. Relative frequency is the proportion (frequency ÷ total), and cumulative frequency is the running total of frequencies up to a given value.
Frequency Distribution
A summary showing how often each value or range appears
A frequency distribution organizes data into categories or intervals and shows the count (and often percentage) for each. For continuous data, values are grouped into bins (class intervals). Visualized as a frequency table or histogram.
G
Goodness-of-Fit Test
Tests whether observed data matches an expected distribution
A goodness-of-fit test (typically using chi-square) evaluates how well observed data matches an expected distribution or model. The null hypothesis is that the data follows the proposed distribution. Commonly used to test fairness of dice, cards, or other discrete outcomes.
H
Histogram
A bar chart for continuous numerical data with no gaps between bars
A histogram displays the distribution of continuous numerical data by dividing values into equal-width bins (intervals) and showing the frequency of values in each bin. Unlike bar charts, histogram bars are adjacent (no gaps) because the data is continuous. The shape reveals distribution properties like skewness.
Hypothesis Test
A formal procedure for evaluating a claim about a population
Hypothesis testing uses sample data to evaluate competing claims (null vs. alternative hypothesis) about a population. The procedure: (1) state hypotheses, (2) collect data, (3) calculate test statistic, (4) find p-value, (5) compare to significance level, (6) make conclusion.
I
Independence
Two events are independent if one does not affect the other's probability
Events A and B are independent if P(A∩B) = P(A) × P(B), equivalently P(A|B) = P(A). In hypothesis testing, the independence of observations is a key assumption. Statistical tests of independence (like the chi-square test) evaluate whether two categorical variables are related.
Inferential Statistics
Using sample data to make conclusions about a larger population
Inferential statistics use probability to extend conclusions from a sample to the population. This includes hypothesis testing, confidence intervals, regression analysis, and ANOVA. The key challenge is ensuring the sample is representative of the population.
Interquartile Range (IQR)
The range of the middle 50% of data: IQR = Q3 − Q1
The IQR measures the spread of the middle half of the data and is resistant to outliers. Used to identify outliers (values more than 1.5 × IQR below Q1 or above Q3 are suspected outliers). Displayed visually in box plots.
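The 1.5 × IQR outlier rule is straightforward to apply in code. A sketch with a contrived dataset; note that quartile conventions vary between software packages, so exact Q1/Q3 values may differ slightly:

```python
import statistics

data = [2, 3, 5, 7, 8, 9, 10, 12, 14, 40]    # 40 is a suspected outlier
q1, q2, q3 = statistics.quantiles(data, n=4)  # default "exclusive" method
iqr = q3 - q1

# Fences: anything beyond 1.5 * IQR from the quartiles is a suspected outlier
low, high = q1 - 1.5 * iqr, q3 + 1.5 * iqr
outliers = [x for x in data if x < low or x > high]
print(outliers)  # [40]
```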
K
Kurtosis
A measure of the "tailedness" of a distribution
Kurtosis describes how heavy or light the tails of a distribution are relative to a normal distribution. High kurtosis (leptokurtic) means heavy tails and a sharp peak. Low kurtosis (platykurtic) means light tails. Normal distribution kurtosis = 3 (or excess kurtosis = 0).
L
Least Squares Method
A method for finding the best-fit line by minimizing squared residuals
The Ordinary Least Squares (OLS) method finds regression coefficients by minimizing the residual sum of squares: RSS = Σ(yᵢ − ŷᵢ)². It produces the BLUE (Best Linear Unbiased Estimator) under the Gauss-Markov assumptions.
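For simple linear regression the minimization has a closed-form solution, b₁ = Sxy / Sxx and b₀ = ȳ − b₁x̄. A sketch with made-up data:

```python
# Closed-form OLS for simple linear regression (hypothetical data)
xs = [1, 2, 3, 4, 5]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]

n = len(xs)
xbar, ybar = sum(xs) / n, sum(ys) / n
sxx = sum((x - xbar) ** 2 for x in xs)                      # spread of x
sxy = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys))  # co-movement of x, y

b1 = sxy / sxx          # slope minimizing the residual sum of squares
b0 = ybar - b1 * xbar   # intercept: the fitted line passes through (xbar, ybar)
print(round(b0, 2), round(b1, 2))  # 0.05 1.99
```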
Levels of Measurement
The four scales of measurement: nominal, ordinal, interval, ratio
Nominal: categories with no order (colors, gender). Ordinal: ordered categories where the gaps between levels are not necessarily equal (rankings, Likert scales). Interval: ordered with equal intervals but no true zero (temperature °C). Ratio: all of the above plus a true zero (height, weight, income). Higher levels allow more statistical operations.
M
Mean
The arithmetic average of a dataset
The mean is calculated by summing all values and dividing by the count: x̄ = Σx / n. It is the most common measure of central tendency but is sensitive to outliers. Population mean is denoted μ; sample mean is denoted x̄.
Median
The middle value of an ordered dataset
The median is the value that divides an ordered dataset in half — 50% of values fall below, 50% above. Resistant to outliers, making it better for skewed data. When n is even, it's the average of the two middle values.
Mode
The most frequently occurring value(s) in a dataset
The mode is the value that appears most often in a dataset. There can be one mode (unimodal), two modes (bimodal), or multiple modes (multimodal). If all values are unique, there is no mode. The mode is the only measure of center applicable to nominal data.
Multicollinearity
When two or more predictor variables are highly correlated in regression
Multicollinearity occurs when independent variables in a multiple regression model are highly correlated with each other. This makes it difficult to isolate individual effects and inflates the standard errors of the coefficient estimates. Detected using the Variance Inflation Factor (VIF).
N
Normal Distribution
A symmetric bell-shaped distribution defined by μ and σ
The normal distribution is the most important distribution in statistics. It is symmetric about the mean, with most values clustering near the mean and fewer in the tails. Fully described by μ (mean) and σ (standard deviation). About 68% of values fall within 1σ, 95% within 2σ, 99.7% within 3σ of the mean.
Null Hypothesis (H₀)
The default assumption of no effect or no difference
The null hypothesis is the claim you assume to be true in the absence of evidence. It typically represents "no effect," "no difference," or "no relationship." Example: H₀: μ = 50. Statistical tests try to find evidence against H₀. Failing to reject H₀ does not prove it is true.
O
Outlier
An observation that falls far from the rest of the data
An outlier is a data point that differs significantly from other observations. Outliers can result from measurement errors, data entry mistakes, or genuine extreme values. They can heavily influence the mean and standard deviation. The IQR method defines outliers as values more than 1.5 × IQR from Q1 or Q3.
P
p-Value
The probability of observing results as extreme as yours, assuming H₀ is true
The p-value is the probability of obtaining results at least as extreme as observed, if the null hypothesis were true. A small p-value (typically < 0.05) provides evidence against H₀. p-value does NOT measure the probability that H₀ is true, nor the magnitude of the effect.
Parameter
A numerical measure describing a characteristic of a population
A parameter is a fixed numerical characteristic of a population (e.g., population mean μ, population variance σ²). Parameters are usually unknown and estimated from sample statistics. Denoted with Greek letters (μ, σ, β) to distinguish from sample statistics (x̄, s, b).
Pearson Correlation Coefficient (r)
Measures the strength of linear relationship between two variables
Pearson's r ranges from −1 to +1. r = 1: perfect positive linear relationship. r = −1: perfect negative. r = 0: no linear relationship. Formula: r = Σ[(xᵢ−x̄)(yᵢ−ȳ)] / √[Σ(xᵢ−x̄)² × Σ(yᵢ−ȳ)²]. Assumes a linear relationship; approximate normality of the variables matters mainly when testing the significance of r.
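The formula translates directly into code. Both datasets below are contrived to hit the ±1 extremes:

```python
def pearson_r(xs, ys):
    """Pearson correlation coefficient r for two equal-length sequences."""
    n = len(xs)
    xbar, ybar = sum(xs) / n, sum(ys) / n
    num = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys))
    den = (sum((x - xbar) ** 2 for x in xs)
           * sum((y - ybar) ** 2 for y in ys)) ** 0.5
    return num / den

print(pearson_r([1, 2, 3, 4], [2, 4, 6, 8]))  # 1.0  (y = 2x exactly)
print(pearson_r([1, 2, 3, 4], [8, 6, 4, 2]))  # -1.0 (perfect negative)
```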
Percentile
The value below which a given percentage of observations fall
The pth percentile is the value below which p% of observations fall. For example, if your test score is at the 80th percentile, you scored higher than 80% of test-takers. The 25th, 50th, and 75th percentiles are called Q1, Q2 (median), and Q3.
Population
The complete set of individuals or objects of interest
A population is the entire group about which you want to draw conclusions. It can be finite (all 500 employees) or conceptually infinite (all future coin flips). Because populations are often too large to measure entirely, we use samples to estimate population parameters.
Probability
A number from 0 to 1 expressing the likelihood of an event
Probability quantifies uncertainty. P(A) = 0 means event A is impossible; P(A) = 1 means it is certain. For equally likely outcomes, P(A) = (number of favorable outcomes) / (total outcomes). Probability theory underpins all of inferential statistics.
Q
Quartiles (Q1, Q2, Q3)
Values that divide a dataset into four equal parts
Quartiles split sorted data into four equal groups: Q1 (25th percentile), Q2 (50th percentile / median), Q3 (75th percentile). The interquartile range (IQR = Q3 − Q1) spans the middle 50% of data. Quartiles are resistant to outliers and are displayed in box plots.
R
R-squared (R²)
The proportion of variance in Y explained by the regression model
R² (coefficient of determination) ranges from 0 to 1. R² = 0.80 means the model explains 80% of the variability in the response variable. R² = 1 − (SSRes / SSTot). Higher values indicate a better fit. In simple regression, R² = r² (the squared correlation).
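The R² = 1 − (SSRes / SSTot) formula only needs observed values and model predictions. A sketch with made-up numbers:

```python
# R^2 from observed values and a model's predictions (hypothetical data)
ys    = [3, 5, 7, 9]
preds = [2.8, 5.2, 7.1, 8.9]

ybar = sum(ys) / len(ys)
ss_res = sum((y, p) and (y - p) ** 2 for y, p in zip(ys, preds))  # residual SS
ss_tot = sum((y - ybar) ** 2 for y in ys)                         # total SS

r2 = 1 - ss_res / ss_tot
print(round(r2, 4))  # ≈ 0.995: the model explains ~99.5% of the variability
```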
Random Sample
A sample where every member of the population has an equal chance of selection
A random sample is the foundation of statistical inference. When sampling is truly random, the sample is likely to represent the population. Simple random sampling, stratified sampling, and cluster sampling are all methods for obtaining random samples.
Range
The difference between the maximum and minimum values
Range = Maximum − Minimum. The simplest measure of variability, but heavily influenced by outliers. For example, if one value is far from the rest, the range can be misleadingly large. Use in conjunction with standard deviation and IQR for a complete picture of spread.
Regression Analysis
A method to model relationships between variables and make predictions
Regression analysis models the relationship between a dependent variable (Y) and one or more independent variables (X). Simple linear regression uses one predictor: ŷ = b₀ + b₁x. Multiple regression uses two or more. It can be used for prediction, explanation, and causal inference (with caution).
Residual
The difference between an observed and predicted value in regression
A residual is the "error" in a regression prediction: eᵢ = yᵢ − ŷᵢ. Residual analysis is used to check whether regression assumptions (linearity, homoscedasticity, normality, independence) are met. Large or patterned residuals indicate model inadequacy.
S
Sample
A subset of the population selected for analysis
A sample is a portion of the population used to estimate population parameters. A good sample is representative of the population. Sample statistics (x̄, s) are used to estimate population parameters (μ, σ). Larger samples generally yield more precise estimates.
Sampling Distribution
The probability distribution of a statistic across all possible samples
If you repeatedly drew samples of size n and calculated the mean each time, the distribution of those means would be the sampling distribution of x̄. It has mean μ and standard deviation σ/√n (the standard error). As n increases, it approaches normality (Central Limit Theorem).
Significance Level (α)
The threshold probability for rejecting the null hypothesis
The significance level α is set before conducting a test and represents the maximum acceptable probability of a Type I error (rejecting a true null hypothesis). Common values: α = 0.05 (5%), 0.01 (1%), 0.10 (10%). Reject H₀ when p-value < α.
Skewness
A measure of asymmetry in a distribution
Positive skewness (right-skewed): the tail extends to the right; mean > median. Negative skewness (left-skewed): the tail extends to the left; mean < median. Zero skewness indicates symmetry. Skewness affects which measure of center to use and which statistical tests are appropriate.
Standard Deviation
Roughly, the typical distance of data points from the mean
Standard deviation measures spread in the original units of measurement. Formula (population): σ = √[Σ(xᵢ−μ)²/N]. Formula (sample): s = √[Σ(xᵢ−x̄)²/(n−1)]. A low std dev means data clusters tightly around the mean; a high std dev indicates wide spread.
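Python's `statistics` module implements both formulas; the only difference is the N versus n − 1 divisor:

```python
import statistics

data = [2, 4, 4, 4, 5, 5, 7, 9]  # mean is 5, sum of squared deviations is 32

print(statistics.pstdev(data))  # population sigma = sqrt(32/8) = 2.0
print(statistics.stdev(data))   # sample s = sqrt(32/7) ≈ 2.138 (n-1 divisor)
```

The sample version divides by n − 1 (Bessel's correction) to avoid systematically underestimating the population variance.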
Standard Error (SE)
The standard deviation of a sampling distribution
The standard error of the mean is SE = σ/√n. It measures how much the sample mean varies from sample to sample. Smaller samples have larger standard errors (more uncertainty). SE decreases as sample size increases, which is why larger samples give more precise estimates.
Statistical Power
The probability of correctly rejecting a false null hypothesis
Power = 1 − β (1 minus the probability of a Type II error). Power increases with larger sample sizes, larger effect sizes, and higher significance levels (α). A power of 0.80 (80%) is typically considered adequate, meaning you have an 80% chance of detecting a true effect if one exists.
T
t-Distribution
A bell-shaped distribution with heavier tails than the normal distribution
The t-distribution is used for hypothesis testing and confidence intervals when the population standard deviation is unknown and/or the sample is small. As degrees of freedom increase, it approaches the standard normal distribution. At df = ∞, they are identical.
t-Test
A hypothesis test comparing means using the t-distribution
t-tests compare means when the population standard deviation is unknown. Types: one-sample t-test (compare sample to known value), independent samples t-test (compare two groups), paired samples t-test (compare matched pairs or before/after). Assumes approximately normal data.
Type I Error (α)
Rejecting a true null hypothesis (false positive)
A Type I error occurs when you reject H₀ even though it is actually true. The probability of a Type I error is α (significance level). By setting α = 0.05, you accept a 5% chance of incorrectly rejecting a true null hypothesis. Also called a "false positive."
Type II Error (β)
Failing to reject a false null hypothesis (false negative)
A Type II error occurs when you fail to reject H₀ even though it is false (a real effect exists but you missed it). The probability of a Type II error is β. Statistical power = 1 − β. Reducing β requires larger samples, larger effect sizes, or higher α. Also called a "false negative."
U
Uniform Distribution
A distribution where all outcomes are equally likely
In a discrete uniform distribution, each of n outcomes has probability 1/n. In a continuous uniform distribution over [a,b], every value in the interval has equal probability density. A fair die roll follows a discrete uniform distribution over {1, 2, 3, 4, 5, 6}.
V
Variable
A characteristic that can take on different values
Variables can be quantitative (numerical) or categorical (non-numerical). In regression, the dependent variable (Y) is what you're trying to explain, and independent variables (X) are the predictors. Lurking or confounding variables can create misleading associations.
Variance
The average squared deviation from the mean
Variance measures how spread out data is from the mean. Population variance: σ² = Σ(xᵢ−μ)²/N. Sample variance: s² = Σ(xᵢ−x̄)²/(n−1). Squaring the deviations makes all values positive and gives more weight to larger deviations. Standard deviation = √Variance.
W
Weighted Mean
An average where values are weighted by their importance or frequency
The weighted mean accounts for the fact that some values may be more important than others. Formula: x̄w = Σ(wᵢxᵢ) / Σwᵢ. Example: calculating GPA where different courses have different credit hours, or computing a portfolio return where assets have different weights.
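The GPA example can be sketched directly from the formula; the grades and credit hours below are made up:

```python
# Weighted mean: GPA with hypothetical grades weighted by credit hours
grades  = [4.0, 3.0, 3.7, 2.7]  # grade points per course
credits = [3,   4,   3,   2]    # weights w_i

# x̄w = Σ(w_i * x_i) / Σw_i
gpa = sum(w * x for w, x in zip(credits, grades)) / sum(credits)
print(gpa)  # ≈ 3.375
```

An unweighted mean of the same grades would be 3.35, so the weighting genuinely changes the answer.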
Y
y-Intercept
The predicted value of Y when X = 0 in a regression equation
The y-intercept (b₀) in a regression equation ŷ = b₀ + b₁x is the value of the response variable when all predictors equal zero. In many practical contexts, the y-intercept may not have a meaningful interpretation (e.g., if x = 0 is outside the data range or physically impossible).
Z
Z-Score (Standard Score)
The number of standard deviations a value is from the mean
z = (x − μ) / σ. A z-score of +2 means the value is 2 standard deviations above the mean. Z-scores standardize values, allowing comparison across different scales. Used to find probabilities using the standard normal distribution (Z table).
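Standardization makes scores from different scales comparable. A sketch with hypothetical test data:

```python
def z_score(x, mu, sigma):
    """Number of standard deviations x lies from the mean mu."""
    return (x - mu) / sigma

# Two tests on different scales: which score is relatively better?
print(z_score(85, mu=70, sigma=10))     # 1.5 — 1.5 SDs above the mean
print(z_score(600, mu=500, sigma=100))  # 1.0 — only 1 SD above the mean
```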
Z-Test
A hypothesis test using the standard normal distribution
A z-test is used when the population standard deviation (σ) is known and/or the sample size is large (n ≥ 30). It uses the standard normal distribution to calculate the test statistic: z = (x̄ − μ₀) / (σ/√n). For proportions: z = (p̂ − p₀) / √[p₀(1−p₀)/n].
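The test statistic and its two-sided p-value need nothing beyond the standard library, since the normal CDF can be written with `math.erf`. A sketch with hypothetical numbers:

```python
from math import erf, sqrt

def z_test(xbar, mu0, sigma, n):
    """One-sample z statistic and two-sided p-value (normal CDF via erf)."""
    z = (xbar - mu0) / (sigma / sqrt(n))
    phi = 0.5 * (1 + erf(abs(z) / sqrt(2)))  # standard normal CDF at |z|
    return z, 2 * (1 - phi)

# H0: mu = 50; observed sample mean 52 with known sigma = 5 and n = 25
z, p = z_test(xbar=52, mu0=50, sigma=5, n=25)
print(round(z, 2), round(p, 4))  # 2.0 0.0455 — reject H0 at alpha = 0.05
```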