Descriptive Statistics

What is Descriptive Statistics?

Descriptive statistics is a branch of statistics that summarizes and describes the characteristics of a dataset. Unlike inferential statistics (which draws conclusions about a larger population), descriptive statistics only describe the data you actually have — without making predictions or generalizations.

Think of descriptive statistics as a "summary report" for your data. Instead of sharing a spreadsheet with thousands of numbers, you share a few key statistics that capture what the data looks like.

Key Insight: Descriptive statistics are the first step in any data analysis. Before you run tests or build models, you need to understand your data, and descriptive statistics are how you do it.

Types of Descriptive Statistics

There are four main categories of descriptive statistics:

Measures of Frequency

Count, percent, frequency — how often values appear

Central Tendency

Mean, median, mode — the "center" of your data

Variability

Range, variance, std dev — how spread out the data is

Position

Percentiles, quartiles, z-scores — where values fall

Measures of Frequency

Frequency measures describe how often values appear in a dataset. They answer questions like "How many people chose option A?" or "What percentage of customers made a purchase?"

Frequency vs. Relative Frequency

Frequency: The raw count of how often a value occurs
Relative Frequency: The proportion (or percentage) of times a value occurs out of the total
Cumulative Frequency: The running total of frequencies up to a given value

Example: Course Grade Distribution

A class of 40 students received the following grades:

Grade	Frequency	Relative Frequency	Cumulative Relative Frequency
A	8	20%	20%
B	14	35%	55%
C	12	30%	85%
D	4	10%	95%
F	2	5%	100%
Total	40	100%	–

The most common grade was B (35%), and 55% of students received an A or B.
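The frequency table above can be reproduced in a few lines of Python. This is a minimal sketch. `frequency_table` is a hypothetical helper, and because the raw grades are not given, the list of 40 grades is rebuilt from the counts in the table.

```python
from collections import Counter

def frequency_table(values, order):
    """Hypothetical helper: rows of (value, frequency, relative %, cumulative relative %)."""
    counts = Counter(values)
    total = len(values)
    rows = []
    cumulative = 0
    for value in order:
        freq = counts[value]
        cumulative += freq                      # running total of counts
        rows.append((value, freq, 100 * freq / total, 100 * cumulative / total))
    return rows

# Rebuilt from the table's counts (the individual grades are not listed in the text).
grades = ["A"] * 8 + ["B"] * 14 + ["C"] * 12 + ["D"] * 4 + ["F"] * 2

for grade, freq, rel, cum in frequency_table(grades, order="ABCDF"):
    print(f"{grade}\t{freq}\t{rel:.0f}%\t{cum:.0f}%")
```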

Measures of Central Tendency

Measures of central tendency identify the "center" or "typical value" in a dataset. The three main measures are the mean, median, and mode.

1. Mean (Arithmetic Average)

The mean is calculated by summing all values and dividing by the count. It is the most commonly used measure of center.

x̄ = (Σx) / n = (x₁ + x₂ + ... + xₙ) / n

Example: Calculating the Mean

Dataset: 4, 7, 2, 9, 6, 8, 3

Mean = (4 + 7 + 2 + 9 + 6 + 8 + 3) / 7 = 39 / 7 ≈ 5.57

Watch out for outliers! The mean is sensitive to extreme values. A single very high or low value can pull the mean away from the "typical" value. In such cases, consider using the median instead.
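A quick Python check of the mean calculation and of the outlier warning. The extra value 100 is a made-up outlier added only for illustration. The median is printed alongside it for comparison.

```python
import statistics

data = [4, 7, 2, 9, 6, 8, 3]

# x̄ = Σx / n
manual_mean = sum(data) / len(data)
print(round(manual_mean, 2))                               # 5.57
print(abs(statistics.mean(data) - manual_mean) < 1e-12)    # True

# Hypothetical outlier, added only to show how far one value moves the mean.
with_outlier = data + [100]
print(statistics.mean(with_outlier))     # 17.375 (was about 5.57)
print(statistics.median(with_outlier))   # 6.5 (was 6)
```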

2. Median (Middle Value)

The median is the middle value when data is arranged in ascending order. Because it depends only on the order of the values, it is resistant to outliers, which makes it a better choice for skewed data.

For an odd number of values: the median is the middle number
For an even number of values: the median is the average of the two middle numbers

Example: Finding the Median

Odd dataset: 2, 3, 4, 6, 7, 8, 9 → Median = 6

Even dataset: 2, 3, 4, 6, 7, 8 → Median = (4 + 6) / 2 = 5
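The odd/even rule can be written as a small function. This sketch uses a hypothetical `median` helper. Python's `statistics.median` follows the same rule and is used here as a cross-check.

```python
import statistics

def median(values):
    """Middle value of the sorted data; average of the two middle values when n is even."""
    ordered = sorted(values)
    n = len(ordered)
    if n == 0:
        raise ValueError("median of empty data")
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]                          # odd n: the single middle value
    return (ordered[mid - 1] + ordered[mid]) / 2     # even n: mean of the two middle values

print(median([2, 3, 4, 6, 7, 8, 9]))   # 6
print(median([2, 3, 4, 6, 7, 8]))      # 5.0
assert median([2, 3, 4, 6, 7, 8]) == statistics.median([2, 3, 4, 6, 7, 8])
```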

3. Mode (Most Frequent Value)

The mode is the value that appears most frequently in a dataset. It's the only measure of center that can be used with nominal (unordered categorical) data, such as favorite color.

A dataset can be unimodal (one mode), bimodal (two modes), or multimodal (multiple modes)
If no value repeats, the dataset has no mode

Example: Finding the Mode

Dataset: 3, 5, 7, 5, 2, 5, 8, 3

The value 5 appears 3 times (more than any other value), so Mode = 5
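Here is one way to compute the mode(s) in Python while following the conventions above. `modes` is a hypothetical helper built on `collections.Counter`. The standard library's `statistics.multimode` is similar, but when no value repeats it returns every value rather than an empty list. The bimodal and color lists are made-up examples.

```python
from collections import Counter

def modes(values):
    """Hypothetical helper: every most-frequent value, or [] when no value repeats."""
    counts = Counter(values)
    if not counts:
        return []
    top = max(counts.values())
    if top == 1:
        return []                      # no repeats: no mode, per the convention above
    # Counter keeps first-seen order, so ties come out in order of appearance.
    return [value for value, count in counts.items() if count == top]

print(modes([3, 5, 7, 5, 2, 5, 8, 3]))    # [5]
print(modes([1, 1, 2, 2, 3]))             # [1, 2]  (bimodal)
print(modes(["red", "blue", "red"]))      # ['red'] (works for categorical data)
print(modes([4, 7, 2]))                   # []
```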

When to Use Each Measure

Measure	Best For	Affected by Outliers?	Data Type
Mean	Symmetric distributions with no outliers	Yes	Numerical
Median	Skewed distributions or data with outliers	No	Numerical
Mode	Categorical data or finding most common value	No	Any

Measures of Variability

Measures of variability describe how spread out the data values are. Two datasets can have the same mean but very different levels of dispersion.

1. Range

The simplest measure of variability — the difference between the maximum and minimum values.

Range = Maximum Value − Minimum Value

Example

Dataset: 12, 18, 9, 25, 14 → Range = 25 − 9 = 16
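In Python the range is a one-liner. This sketch simply reproduces the example.

```python
data = [12, 18, 9, 25, 14]

# Range = maximum value - minimum value
data_range = max(data) - min(data)
print(data_range)   # 16
```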

2. Variance

Variance measures the average squared deviation from the mean. The squaring prevents positive and negative deviations from canceling out.

σ² = Σ(xᵢ − μ)² / n (Population, μ = population mean) | s² = Σ(xᵢ − x̄)² / (n−1) (Sample, x̄ = sample mean)

3. Standard Deviation

Standard deviation is the square root of variance, bringing the measure back to the original units of measurement. It is the most widely used measure of spread.

σ = √[Σ(xᵢ − μ)² / n] (Population) | s = √[Σ(xᵢ − x̄)² / (n−1)] (Sample)

Step-by-Step: Calculating Standard Deviation

Dataset: 4, 8, 6, 5, 7 (n = 5)

Step 1: Calculate the mean: x̄ = (4+8+6+5+7)/5 = 30/5 = 6

Step 2: Find each deviation from the mean: (4−6)=−2, (8−6)=2, (6−6)=0, (5−6)=−1, (7−6)=1

Step 3: Square each deviation: 4, 4, 0, 1, 1

Step 4: Sum the squared deviations: 4+4+0+1+1 = 10

Step 5: Divide by n−1 (sample): 10/4 = 2.5 (this is the variance)

Step 6: Take the square root: √2.5 = 1.58 (this is the standard deviation)
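The six steps translate directly into Python. This sketch mirrors each step and adds the population versions (divide by n) for comparison. It then cross-checks the results against the standard library's `statistics.variance`, `stdev`, `pvariance`, and `pstdev`.

```python
import math
import statistics

data = [4, 8, 6, 5, 7]
n = len(data)

mean = sum(data) / n                       # Step 1: 6.0
deviations = [x - mean for x in data]      # Step 2: [-2.0, 2.0, 0.0, -1.0, 1.0]
squared = [d ** 2 for d in deviations]     # Step 3: [4.0, 4.0, 0.0, 1.0, 1.0]
ss = sum(squared)                          # Step 4: 10.0
sample_var = ss / (n - 1)                  # Step 5: 2.5 (sample variance)
sample_sd = math.sqrt(sample_var)          # Step 6: about 1.58

# Population versions divide by n instead of n - 1.
pop_var = ss / n                           # 2.0
pop_sd = math.sqrt(pop_var)                # about 1.41

# Cross-check against the standard library.
assert math.isclose(sample_var, statistics.variance(data))
assert math.isclose(sample_sd, statistics.stdev(data))
assert math.isclose(pop_var, statistics.pvariance(data))
assert math.isclose(pop_sd, statistics.pstdev(data))

print(round(sample_var, 2), round(sample_sd, 2))   # 2.5 1.58
```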

Rule of Thumb: About 68% of data falls within 1 standard deviation of the mean, 95% within 2 standard deviations, and 99.7% within 3 standard deviations (for normally distributed data).
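The 68/95/99.7 figures come from the normal distribution itself. This sketch assumes Python 3.8+ for `statistics.NormalDist` and computes the exact share of a normal distribution that lies within k standard deviations of the mean.

```python
from statistics import NormalDist

std_normal = NormalDist(mu=0.0, sigma=1.0)

shares = {}
for k in (1, 2, 3):
    # P(-k < Z < k): share of a normal distribution within k SDs of its mean
    shares[k] = std_normal.cdf(k) - std_normal.cdf(-k)
    print(f"within {k} SD: {shares[k]:.1%}")
```

It prints about 68.3%, 95.4%, and 99.7%, so the rule's "95%" is a rounded version of roughly 95.4%.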

4. Interquartile Range (IQR)

The IQR is the range of the middle 50% of data — the difference between the 75th percentile (Q3) and the 25th percentile (Q1). It is resistant to outliers.

IQR = Q3 − Q1
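Quartiles, and therefore the IQR, can be computed with Python's `statistics.quantiles`. This sketch uses the salary data from the Real-World Examples below (in $ thousands), and `iqr` is a hypothetical helper. Software packages use different interpolation rules, so the exact quartile values depend on the `method` argument.

```python
from statistics import quantiles

salaries = [42, 45, 50, 55, 60, 65, 70, 250]   # $ thousands

def iqr(values, method="exclusive"):
    """Hypothetical helper: Q3 - Q1 from statistics.quantiles (result depends on method)."""
    q1, _q2, q3 = quantiles(values, n=4, method=method)
    return q3 - q1

print(quantiles(salaries, n=4))              # [46.25, 57.5, 68.75]  (Q1, median, Q3)
print(iqr(salaries))                         # 22.5
print(iqr(salaries, method="inclusive"))     # 17.5

# Swap the 250 outlier for a hypothetical 75: the IQR does not move.
print(iqr([42, 45, 50, 55, 60, 65, 70, 75]))   # 22.5
```

The last line shows in practice what it means for the IQR to be "resistant to outliers". The standard deviation of the same two datasets would differ enormously.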

Measures of Position

Measures of position describe the location of specific values within the distribution relative to other values.

Percentiles

A percentile is the value below which a given percentage of observations fall. For example, scoring at the 85th percentile means you scored higher than 85% of the group.
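Percentile definitions vary between textbooks and software. This minimal sketch implements one common convention: the percentage of observations strictly below a value. It uses a hypothetical `percentile_rank` helper and the dataset from the mean example.

```python
def percentile_rank(values, x):
    """Hypothetical helper: percentage of observations strictly below x."""
    below = sum(1 for v in values if v < x)
    return 100 * below / len(values)

data = [4, 7, 2, 9, 6, 8, 3]
print(round(percentile_rank(data, 8), 1))   # 71.4  (5 of the 7 values are below 8)
```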

Quartiles

Quartiles are three cut points that divide the sorted dataset into four parts, each holding about 25% of the values:

Q1 (25th percentile) — 25% of values fall below this point
Q2 (50th percentile) — This is the median
Q3 (75th percentile) — 75% of values fall below this point

Z-Scores

A z-score tells you how many standard deviations a value is from the mean.

z = (x − μ) / σ

Example: Z-Score

If the mean test score is 70 and the standard deviation is 10, a score of 85 has a z-score of:

z = (85 − 70) / 10 = 1.5

This means the score of 85 is 1.5 standard deviations above the mean.
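This sketch writes the z-score formula as a small Python function (`z_score` is a hypothetical name). The score of 55 is an extra illustration showing that values below the mean get negative z-scores. `statistics.NormalDist.zscore`, available in Python 3.9+, gives the same result.

```python
from statistics import NormalDist

def z_score(x, mu, sigma):
    """How many standard deviations x lies above (+) or below (-) the mean mu."""
    return (x - mu) / sigma

print(z_score(85, 70, 10))                       # 1.5
print(z_score(55, 70, 10))                       # -1.5
print(NormalDist(mu=70, sigma=10).zscore(85))    # 1.5
```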

Real-World Examples

Example 1: Salary Analysis

A company wants to understand employee salaries. The dataset (in thousands): 42, 45, 50, 55, 60, 65, 70, 250

Statistic	Value	Interpretation
Mean	$79,625	Pulled up by the $250K outlier
Median	$57,500	Better representation of "typical" salary
Mode	None	All values unique
Std Dev (sample)	$69,506	High due to outlier
Range	$208,000	Large spread

In this case, the median is more informative than the mean due to the $250K outlier.
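This sketch recomputes the salary statistics with Python's `statistics` module. Values are in $ thousands and are scaled to dollars for printing. The standard deviation is the sample version (divide by n − 1), matching the step-by-step calculation earlier.

```python
import statistics

salaries = [42, 45, 50, 55, 60, 65, 70, 250]   # $ thousands

summary = {
    "Mean": statistics.mean(salaries),
    "Median": statistics.median(salaries),
    "Std Dev": statistics.stdev(salaries),      # sample SD, divides by n - 1
    "Range": max(salaries) - min(salaries),
}
# Mode: every salary is unique, so there is no mode to report.

for name, value in summary.items():
    print(f"{name:<8} ${value * 1000:,.0f}")
```

Switching to `statistics.pstdev` (divide by n) gives about $65,017 instead. A report should therefore say which version of the standard deviation it uses.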

Example 2: Customer Survey Ratings

Customers rated a product 1–5. Results: 3, 4, 5, 4, 4, 3, 5, 4, 2, 4

Mean: 3.8 (good but not excellent)

Median: 4 (at least half of customers gave 4 or higher; here, 7 of 10 did)

Mode: 4 (most common rating)
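The same three numbers can be checked with Python's `statistics` module. This sketch also counts how many ratings were 4 or higher.

```python
import statistics

ratings = [3, 4, 5, 4, 4, 3, 5, 4, 2, 4]

print(statistics.mean(ratings))     # 3.8
print(statistics.median(ratings))   # 4.0
print(statistics.mode(ratings))     # 4

share_4_plus = sum(r >= 4 for r in ratings) / len(ratings)
print(share_4_plus)                 # 0.7
```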

Frequently Asked Questions

What is the difference between descriptive and inferential statistics?

Descriptive statistics summarize and describe data you have collected (sample or population). Inferential statistics use that data to make conclusions or predictions about a larger population using probability theory. Descriptive comes first — it tells you what the data looks like before you make any inferences.

When should I use the mean vs. the median?

Use the mean when your data is roughly symmetric and has no extreme outliers. Use the median when your data is skewed (like income data) or contains significant outliers. The median is more "robust" — it's not affected by one or two extreme values the way the mean is.

Why do we use n−1 instead of n for sample standard deviation?

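Dividing by n−1 (instead of n) is called Bessel's correction. It makes the sample variance s² an unbiased estimate of the population variance. The deviations are measured from the sample mean, which is itself estimated from the same data, so they always sum to zero and only n−1 of them are free to vary; this is the lost "degree of freedom". Dividing by n would, on average, underestimate the true population variance. (Taking the square root reintroduces a small bias, so s tends to slightly underestimate σ, but the effect shrinks as n grows.)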
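A small simulation makes the bias visible. This sketch assumes a standard normal population (variance 1), samples of size 5, and an arbitrary fixed seed. It averages both variance estimators over many random samples.

```python
import random
import statistics

rng = random.Random(42)       # arbitrary fixed seed so the run is repeatable
n, trials = 5, 10_000         # population: standard normal, true variance 1.0

biased_total = unbiased_total = 0.0
for _ in range(trials):
    sample = [rng.gauss(0.0, 1.0) for _ in range(n)]
    biased_total += statistics.pvariance(sample)    # divides by n
    unbiased_total += statistics.variance(sample)   # divides by n - 1

biased_avg = biased_total / trials
unbiased_avg = unbiased_total / trials
print(f"divide by n:     {biased_avg:.3f}")    # close to (n - 1) / n = 0.8
print(f"divide by n - 1: {unbiased_avg:.3f}")  # close to 1.0
```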

What does a high standard deviation mean?

A high standard deviation indicates that data points are spread far from the mean — there is high variability. A low standard deviation indicates data points cluster closely around the mean. For example, test scores with a std dev of 20 show much more variability than scores with a std dev of 5.

Can a dataset have more than one mode?

Yes! If two values appear with equal (and maximum) frequency, the dataset is bimodal. If three values tie for most frequent, it's trimodal, and so on. If every value appears only once, the dataset has no mode (some texts also say there is no mode when all values appear equally often).
