
Normal Distribution: A Complete and Detailed Guide


Introduction

The normal distribution, also known as the Gaussian distribution, is one of the most important probability distributions in statistics. It is a good approximate model for many natural phenomena, such as heights, test scores, and measurement errors. Its distinctive bell-shaped curve makes it easily recognizable.

Definition

A random variable X is normally distributed with mean μ and standard deviation σ if its probability density function (PDF) is:

f(x) = (1 / (σ√(2π))) * exp(-(x - μ)² / (2σ²))

This is denoted as follows (note that the second parameter is the variance σ², not the standard deviation σ):

X ~ N(μ, σ²)
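As a rough illustration, the density formula can be written out term by term and checked against a reference implementation. This is a minimal sketch: the function name normal_pdf and the values of mu and sigma are assumptions chosen for illustration, and the reference is Python's standard-library statistics.NormalDist.

```python
import math
from statistics import NormalDist


def normal_pdf(x, mu, sigma):
    """Density of N(mu, sigma^2) at x, written out from the PDF formula."""
    if sigma <= 0:
        raise ValueError("sigma must be positive")
    coeff = 1.0 / (sigma * math.sqrt(2.0 * math.pi))
    exponent = -((x - mu) ** 2) / (2.0 * sigma ** 2)
    return coeff * math.exp(exponent)


# Illustrative parameters (assumed for this sketch)
mu, sigma = 175.0, 10.0

peak = normal_pdf(mu, mu, sigma)            # density at the mean
reference = NormalDist(mu, sigma).pdf(mu)   # standard-library value
print("hand-written:", peak)
print("NormalDist:  ", reference)
```

The density is highest at x = μ and takes the same value at μ + d and μ - d, which is the symmetry listed under the characteristics.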

Characteristics

  • Symmetric about the mean μ
  • Mean, median, and mode are all equal
  • Bell-shaped curve
  • Total area under the curve = 1
  • Follows the empirical rule (68-95-99.7 rule)

Empirical Rule (68-95-99.7)

For a normal distribution:

  • ~68% of values fall within 1 standard deviation of the mean
  • ~95% fall within 2 standard deviations
  • ~99.7% fall within 3 standard deviations
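These percentages can be checked numerically from the cumulative distribution function (CDF). The sketch below uses the standard normal via Python's statistics.NormalDist; the variable names are my own, and the same coverage holds for any μ and σ.

```python
from statistics import NormalDist

z = NormalDist()  # standard normal: mean 0, standard deviation 1

# Probability of falling within k standard deviations of the mean
coverage = {k: z.cdf(k) - z.cdf(-k) for k in (1, 2, 3)}

for k, p in coverage.items():
    print(f"within {k} standard deviation(s): {p:.4f}")
```

The values come out close to 0.6827, 0.9545, and 0.9973, which the rule rounds to 68%, 95%, and 99.7%.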

Standard Normal Distribution

When μ = 0 and σ = 1, the distribution is called the standard normal distribution and is denoted as:

Z ~ N(0, 1)

You can convert any normal variable to a standard normal using the Z-score:

Z = (X - μ) / σ
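A small sketch of why standardizing is useful: a probability computed on the original scale matches the one computed on the Z scale. The helper z_score and the numbers used (mean 100, standard deviation 15, x = 130) are hypothetical and chosen only for illustration.

```python
from statistics import NormalDist


def z_score(x, mu, sigma):
    """Number of standard deviations x lies above (or below) the mean."""
    return (x - mu) / sigma


# Hypothetical values for illustration only
mu, sigma = 100.0, 15.0
x = 130.0

z = z_score(x, mu, sigma)

# P(X <= x) on the original scale and P(Z <= z) on the standard scale
p_original = NormalDist(mu, sigma).cdf(x)
p_standard = NormalDist().cdf(z)

print("z =", z)
print("P(X <= x) =", p_original)
print("P(Z <= z) =", p_standard)
```

Because the two probabilities agree, a single table (or function) for the standard normal is enough to answer questions about any normal distribution.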

Applications

  • Modeling measurement errors
  • Statistical inference (e.g., confidence intervals, hypothesis testing)
  • Standardized test scoring
  • Quality control in manufacturing

Example

Suppose the heights of adult men are normally distributed with mean μ = 175 cm and standard deviation σ = 10 cm. What is the probability that a randomly chosen man is taller than 190 cm?

Z = (190 - 175) / 10 = 1.5

P(X > 190) = P(Z > 1.5) ≈ 0.0668

So there’s about a 6.68% chance a randomly selected man is taller than 190 cm.

Python Code Example

from scipy.stats import norm

# Mean and standard deviation (in cm)
mu = 175
sigma = 10

# Probability of height greater than 190 cm
p = 1 - norm.cdf(190, loc=mu, scale=sigma)
print("P(X > 190):", p)

Conclusion

The normal distribution is central to statistics largely because of the Central Limit Theorem, which states that the suitably standardized sum (or average) of many independent, identically distributed random variables with finite variance is approximately normally distributed, whatever the shape of the original distribution. Its properties make it a cornerstone of many statistical procedures, from hypothesis testing to regression analysis.
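The Central Limit Theorem can be illustrated with a short simulation. This is only a sketch under assumed settings: sums of 30 Uniform(0, 1) draws, 20,000 repetitions, and a fixed seed, all chosen for illustration. It checks whether the share of sums within one standard deviation of their mean is close to the normal value of about 68%.

```python
import random
from statistics import NormalDist, fmean

random.seed(0)  # fixed seed so repeated runs give the same numbers

n_terms = 30        # uniform draws added together in each sum (assumed)
n_samples = 20_000  # number of simulated sums (assumed)

sums = [
    sum(random.random() for _ in range(n_terms))
    for _ in range(n_samples)
]

# A Uniform(0, 1) variable has mean 1/2 and variance 1/12,
# so a sum of n_terms of them has mean n/2 and variance n/12.
expected_mean = n_terms * 0.5
expected_sd = (n_terms / 12) ** 0.5

# Share of simulated sums within one standard deviation of the mean
within_1sd = sum(abs(s - expected_mean) <= expected_sd for s in sums) / n_samples

# Value predicted by a normal distribution
normal_share = NormalDist().cdf(1) - NormalDist().cdf(-1)

print("simulated mean:", fmean(sums), "expected:", expected_mean)
print("share within 1 sd:", within_1sd, "normal prediction:", normal_share)
```

Although each individual draw is uniform rather than bell-shaped, the sums behave much like a normal variable.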
