Introduction
The normal distribution, also known as the Gaussian distribution, is one of the most important probability distributions in statistics. It models many natural phenomena such as heights, weights, test scores, and errors in measurements. Its distinctive bell-shaped curve makes it easily recognizable.
Definition
A random variable X
is normally distributed with mean μ
and standard deviation σ
if its probability density function (PDF) is:
f(x) = (1 / (σ√(2π))) * e-(x - μ)² / (2σ²)
This is denoted as:
X ~ N(μ, σ²)
Characteristics
- Symmetric about the mean
μ
- Mean, median, and mode are all equal
- Bell-shaped curve
- Total area under the curve = 1
- Follows the empirical rule (68-95-99.7 rule)
Empirical Rule (68-95-99.7)
For a normal distribution:
- ~68% of values fall within 1 standard deviation of the mean
- ~95% fall within 2 standard deviations
- ~99.7% fall within 3 standard deviations
Standard Normal Distribution
When μ = 0
and σ = 1
, the distribution is called the standard normal distribution and is denoted as:
Z ~ N(0, 1)
You can convert any normal variable to a standard normal using the Z-score:
Z = (X - μ) / σ
Applications
- Modeling measurement errors
- Statistical inference (e.g., confidence intervals, hypothesis testing)
- Standardized test scoring
- Quality control in manufacturing
Example
Suppose the heights of adult men are normally distributed with mean μ = 175 cm
and standard deviation σ = 10 cm
. What is the probability that a randomly chosen man is taller than 190 cm?
Z = (190 - 175) / 10 = 1.5
P(X > 190) = P(Z > 1.5) ≈ 0.0668
So there’s about a 6.68% chance a randomly selected man is taller than 190 cm.
Python Code Example
from scipy.stats import norm
Mean and standard deviation
mu = 175 sigma = 10
Probability of height greater than 190 cm
p = 1 - norm.cdf(190, loc=mu, scale=sigma) print("P(X > 190):", p)
Conclusion
The normal distribution is central to statistics due to the Central Limit Theorem, which states that the sum of many independent random variables tends to be normally distributed. Its properties make it a cornerstone in many statistical procedures, from hypothesis testing to regression analysis.