
Geometric Random Variable: A Complete and Detailed Guide


Introduction

In probability theory, the geometric random variable is one of the most fundamental discrete random variables. It models the number of Bernoulli trials needed to achieve the first success. This makes it particularly useful in scenarios involving repeated, independent attempts at something with constant success probability—such as flipping a coin until heads appears, or testing a machine until it works.

Definition

A geometric random variable is a type of discrete random variable that represents the number of independent Bernoulli trials required to get the first success. Each trial results in either a success (with probability p) or a failure (with probability 1 - p).

There are two common definitions of the geometric random variable, depending on how you count the trials:

  1. Definition 1: Number of trials until the first success (includes the success itself): X = 1, 2, 3, ...
  2. Definition 2: Number of failures before the first success: Y = 0, 1, 2, ...

For this article, we will focus on Definition 1, which is more widely used.

Probability Mass Function (PMF)

If X is a geometric random variable with success probability p, then:

P(X = x) = (1 - p)^(x - 1) * p, for x = 1, 2, 3, ...

Explanation:

  • (1 - p)^(x - 1): The probability of failing the first x - 1 times
  • p: The probability of succeeding on the x-th trial
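As a quick sanity check, the PMF can be computed directly in a few lines of Python (a minimal sketch; the values p = 0.5 and x = 3 are illustrative):

```python
def geom_pmf(x, p):
    # P(X = x): fail the first x - 1 trials, then succeed on the x-th
    return (1 - p) ** (x - 1) * p

print(geom_pmf(3, 0.5))  # 0.125
```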

Cumulative Distribution Function (CDF)

The cumulative probability that the first success occurs on or before trial x is:

P(X ≤ x) = 1 - (1 - p)^x

This tells us the likelihood of seeing a success within the first x trials.
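The CDF follows because "a success within x trials" is the complement of "x failures in a row." A small check that it matches the summed PMF (a sketch; p = 0.3 and x = 5 are arbitrary values):

```python
def geom_cdf(x, p):
    # P(X <= x) = 1 - P(no success in the first x trials)
    return 1 - (1 - p) ** x

p, x = 0.3, 5
pmf_sum = sum((1 - p) ** (k - 1) * p for k in range(1, x + 1))
print(geom_cdf(x, p), pmf_sum)  # both ~0.83193
```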

Mean and Variance

Let X ~ Geometric(p). Then:

  • Expected value (Mean): E[X] = 1 / p
  • Variance: Var(X) = (1 - p) / p²

This implies that the rarer the success (smaller p), the longer (on average) you’ll wait to see the first success.
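This relationship is easy to verify by simulation: repeatedly run Bernoulli(p) trials until the first success and average the trial counts (a sketch using only the standard library; p = 0.2 and the sample size are arbitrary choices):

```python
import random

random.seed(42)  # fixed seed for reproducibility

def trials_to_first_success(p):
    # Count Bernoulli(p) trials up to and including the first success
    count = 1
    while random.random() >= p:
        count += 1
    return count

p = 0.2
samples = [trials_to_first_success(p) for _ in range(100_000)]
print(sum(samples) / len(samples))  # should be close to 1/p = 5
```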

Memoryless Property

The geometric distribution is the only discrete distribution (on the positive integers) with the memoryless property:

P(X > m + n | X > m) = P(X > n)

This means the probability of the process lasting more than m + n trials, given that it has already lasted m trials without success, is independent of m. Past failures do not affect future probabilities.
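The property is easy to confirm numerically, since P(X > n) = (1 - p)^n (the first n trials must all fail). A quick check (p, m, and n are arbitrary illustration values):

```python
def tail(n, p):
    # P(X > n): no success in the first n trials
    return (1 - p) ** n

p, m, n = 0.3, 4, 2
conditional = tail(m + n, p) / tail(m, p)  # P(X > m + n | X > m)
print(conditional, tail(n, p))  # both equal (1 - p)^n = 0.49
```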

Examples

Example 1: Coin Toss

Suppose you flip a fair coin (p = 0.5) and want to find the probability that the first head appears on the 3rd flip:

P(X = 3) = (1 - 0.5)^2 * 0.5 = 0.125

There is a 12.5% chance that you will get the first head on the third toss.

Example 2: Defective Machine

Imagine a machine produces items, and each item has a 10% chance of being defective (p = 0.1). Let X be the number of items tested until the first defective one is found.

Expected number of items to test: E[X] = 1 / 0.1 = 10
So, on average, the first defective item is the 10th one tested.

Relation to Bernoulli and Binomial Distributions

  • Bernoulli trial: a single trial with success/failure outcome
  • Binomial distribution: counts number of successes in n trials
  • Geometric distribution: counts number of trials until first success

The geometric distribution can be seen as a “waiting time” model for the first success.
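The connection can be made concrete: "no success in the first n trials" is simultaneously a geometric tail event and a Binomial(n, p) count of zero, and both reduce to (1 - p)^n. A sketch using scipy.stats (assuming SciPy is installed; p = 0.3 and n = 5 are arbitrary):

```python
from scipy.stats import binom, geom

p, n = 0.3, 5
geom_tail = 1 - geom.cdf(n, p)   # P(X > n) under the geometric model
binom_zero = binom.pmf(0, n, p)  # P(0 successes in n Bernoulli trials)
print(geom_tail, binom_zero, (1 - p) ** n)  # all ~0.16807
```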

Applications

  • Reliability Engineering: Time until first failure in a system
  • Quality Control: Number of items tested before finding a defect
  • Computer Science: Iterations until a loop condition is met
  • Finance: Modeling rare events like defaults or crashes

Python Code Example

from scipy.stats import geom

# Parameters
p = 0.3  # probability of success

# PMF: probability the first success occurs on the 4th trial
x = 4
prob = geom.pmf(x, p)
print(f"P(X = {x}) = {prob:.4f}")

# Expected value
mean = geom.mean(p)
print(f"Expected value: {mean:.4f}")

# Variance
var = geom.var(p)
print(f"Variance: {var:.2f}")

Conclusion

The geometric random variable is an essential concept in probability, modeling the number of attempts before success in repeated independent trials. It’s particularly useful in reliability studies, simulation modeling, and stochastic processes. Its simplicity, memoryless property, and real-world relevance make it one of the foundational tools in both theoretical and applied statistics.

