Sampling Methods

Sampling methods enable efficient analysis of large populations by studying representative subsets. At MathMultiverse, we explore simple random sampling and stratified sampling, with clear formulas, examples, visualizations, and applications in data science and statistics.

Simple Random Sampling

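Every unit has an equal chance of selection. On any single draw, each of the \(N\) units is chosen with probability \(1/N\); in a sample of \(n\) units drawn without replacement, each unit is included with probability \(n/N\):

\[ P(\text{selected on one draw}) = \frac{1}{N}, \qquad P(\text{included in sample}) = \frac{n}{N} \]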

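Sample mean, and the variance of the sample mean when sampling without replacement from a population with variance \(\sigma^2\):

\[ \bar{x} = \frac{1}{n} \sum_{i=1}^{n} x_i \]

\[ \text{Var}(\bar{x}) = \frac{\sigma^2}{n} \cdot \frac{N - n}{N - 1} \]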

Example: \(N = 1000\), \(n = 100\), \(\sigma = 10\) gives \(\text{Var}(\bar{x}) = \frac{100}{100} \cdot \frac{900}{999} \approx 0.901\).
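The example above can be checked with a few lines of Python. This is a minimal sketch; the function name `srs_variance_of_mean` is an illustrative choice, not something defined by the source.

```python
def srs_variance_of_mean(sigma: float, n: int, N: int) -> float:
    """Variance of the sample mean under simple random sampling
    without replacement, with sigma the population standard deviation."""
    if N < 2 or not 0 < n <= N:
        raise ValueError("need N >= 2 and 0 < n <= N")
    fpc = (N - n) / (N - 1)  # finite population correction
    return sigma ** 2 / n * fpc


print(round(srs_variance_of_mean(sigma=10, n=100, N=1000), 3))  # 0.901
```

When \(n = N\) the correction factor is zero, so a full census has no sampling variance.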

Stratified Sampling

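The population is divided into \(H\) non-overlapping strata of sizes \(N_h\), and each stratum is sampled separately. Under proportional allocation, the sample size in stratum \(h\) is: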

\[ n_h = n \cdot \frac{N_h}{N} \]
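The formula generally produces fractional sample sizes, so some rounding rule is needed in practice. The sketch below uses largest-remainder rounding so that the allocations still sum to \(n\). That rounding rule, the function name, and the stratum sizes 600 and 400 are assumptions for illustration, not taken from the source.

```python
from typing import List, Sequence


def proportional_allocation(n: int, stratum_sizes: Sequence[int]) -> List[int]:
    """Split a total sample of n across strata in proportion to N_h / N."""
    N = sum(stratum_sizes)
    exact = [n * N_h / N for N_h in stratum_sizes]  # n_h before rounding
    alloc = [int(x) for x in exact]                 # round each share down
    leftover = n - sum(alloc)
    # Give the remaining units to the strata with the largest fractional parts.
    order = sorted(range(len(exact)), key=lambda h: exact[h] - alloc[h], reverse=True)
    for h in order[:leftover]:
        alloc[h] += 1
    return alloc


# Hypothetical strata of 600 and 400 units, total sample of 100.
print(proportional_allocation(100, [600, 400]))  # [60, 40]
```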

Optimal (Neyman) allocation:

\[ n_h = n \cdot \frac{N_h \sigma_h}{\sum_{h=1}^{H} N_h \sigma_h} \]
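Neyman allocation shifts sample toward strata that are larger or more variable. A minimal sketch follows; the function name and the input values (strata of 600 and 400 units with standard deviations 10 and 15) are hypothetical. The function returns real-valued allocations, which would still need rounding before use.

```python
from typing import List, Sequence


def neyman_allocation(
    n: int, stratum_sizes: Sequence[int], stratum_sds: Sequence[float]
) -> List[float]:
    """Allocate n in proportion to N_h * sigma_h for each stratum."""
    weights = [N_h * s_h for N_h, s_h in zip(stratum_sizes, stratum_sds)]
    total = sum(weights)
    if total == 0:
        raise ValueError("at least one stratum needs a nonzero N_h * sigma_h")
    return [n * w / total for w in weights]


# Hypothetical inputs: the smaller stratum is more variable, so it gets
# a larger share than under proportional allocation (40 of 100).
alloc = neyman_allocation(100, [600, 400], [10.0, 15.0])
print([round(a, 1) for a in alloc])  # [50.0, 50.0]
```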

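Stratified mean and its variance, where \(\bar{x}_h\) is the sample mean and \(\sigma_h^2\) the population variance in stratum \(h\):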

\[ \bar{x}_{\text{st}} = \sum_{h=1}^{H} \frac{N_h}{N} \bar{x}_h \] \[ \text{Var}(\bar{x}_{\text{st}}) = \sum_{h=1}^{H} \left( \frac{N_h}{N} \right)^2 \frac{\sigma_h^2}{n_h} \cdot \frac{N_h - n_h}{N_h - 1} \]

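Example: \(N = 1000\), \(n = 100\), \(n_1 = 60\), \(n_2 = 40\) gives \(\text{Var}(\bar{x}_{\text{st}}) \approx 1.352\). The result also depends on the stratum sizes \(N_h\) and standard deviations \(\sigma_h\), which are not listed here.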
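The stratified variance formula can be evaluated directly. The stratum sizes and standard deviations behind the figure quoted above are not stated, so the inputs below (\(N_1 = 600\), \(N_2 = 400\), \(\sigma_1 = 10\), \(\sigma_2 = 15\)) are hypothetical values chosen only to illustrate the calculation, and the function name is likewise an assumption.

```python
from typing import Sequence


def stratified_variance_of_mean(
    stratum_sizes: Sequence[int],
    stratum_sds: Sequence[float],
    sample_sizes: Sequence[int],
) -> float:
    """Variance of the stratified mean, sampling without replacement
    within each stratum."""
    N = sum(stratum_sizes)
    total = 0.0
    for N_h, s_h, n_h in zip(stratum_sizes, stratum_sds, sample_sizes):
        weight = N_h / N                # stratum share of the population
        fpc = (N_h - n_h) / (N_h - 1)   # finite population correction
        total += weight ** 2 * s_h ** 2 / n_h * fpc
    return total


# Hypothetical strata; only n_1 = 60 and n_2 = 40 come from the example above.
v = stratified_variance_of_mean([600, 400], [10.0, 15.0], [60, 40])
print(round(v, 3))  # 1.353
```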

Examples

Simple Random Sampling

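Population: 10,000 employees, sample of 500. Variance of the sample proportion (the quoted value corresponds to \(p = 0.4\)):

\[ \text{Var}(p) = \frac{p(1 - p)}{n} \cdot \frac{N - n}{N - 1} \approx 0.000456 \]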
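The variance of a sample proportion under simple random sampling without replacement is \(p(1-p)/n\) times the finite population correction. The source does not state \(p\); the value \(p = 0.4\) used below is an assumption that happens to reproduce the quoted figure (\(p = 0.6\) would too). The function name is illustrative.

```python
def srs_variance_of_proportion(p: float, n: int, N: int) -> float:
    """Variance of a sample proportion under simple random sampling
    without replacement."""
    fpc = (N - n) / (N - 1)  # finite population correction
    return p * (1 - p) / n * fpc


# p = 0.4 is assumed, not given in the text.
print(round(srs_variance_of_proportion(p=0.4, n=500, N=10_000), 6))  # 0.000456
```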

Stratified Sampling

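Proportional allocation of the same 500 units: \(n_1 = 300\), \(n_2 = 200\). Each stratum then appears in the sample in the same proportion as in the population, and the estimate typically has lower variance than under simple random sampling when the strata differ in their means.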

Visualizations

[Chart: Sample Mean Variance Comparison]
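A comparison of this kind can be reproduced numerically with a small simulation. The sketch below is an assumption-laden illustration, not the data behind the chart: it builds a hypothetical population of two strata with different means, then repeatedly draws a simple random sample of 100 and a proportionally allocated stratified sample (60 and 40), and compares the variance of the resulting sample means.

```python
import random
import statistics


def simulate(trials: int = 2000, seed: int = 0):
    """Return (variance of SRS means, variance of stratified means)."""
    rng = random.Random(seed)
    # Hypothetical population: 600 + 400 units with different stratum means.
    stratum_a = [rng.gauss(50, 10) for _ in range(600)]
    stratum_b = [rng.gauss(80, 10) for _ in range(400)]
    population = stratum_a + stratum_b

    srs_means, strat_means = [], []
    for _ in range(trials):
        # Simple random sample of 100 without replacement.
        srs_means.append(statistics.fmean(rng.sample(population, 100)))
        # Proportional stratified sample: 60 from A, 40 from B,
        # combined with the population weights 0.6 and 0.4.
        mean_a = statistics.fmean(rng.sample(stratum_a, 60))
        mean_b = statistics.fmean(rng.sample(stratum_b, 40))
        strat_means.append(0.6 * mean_a + 0.4 * mean_b)
    return statistics.variance(srs_means), statistics.variance(strat_means)


srs_var, strat_var = simulate()
print(f"SRS: {srs_var:.3f}  stratified: {strat_var:.3f}")
```

Because the two stratum means are far apart in this setup, the stratified estimate should show a clearly smaller variance; with strata that have similar means the gap would shrink.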

Applications

  • Election Polling: Margin of error \( \approx 0.031 \) (about 3.1 percentage points) for \( n = 1000 \), at 95% confidence with \( p = 0.5 \).
  • Quality Control: Defect rate variance \( \approx 0.000471 \).
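  • Health Studies: Stratified sampling (for example, by age group or region) gives more precise prevalence estimates.
  • Data Science: Sampling 1% of a 1 TB dataset (about 10 GB) cuts the volume to process by a factor of 100.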
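The election-polling figure in the list above follows from the usual normal-approximation margin of error, \( z \sqrt{p(1-p)/n} \). The sketch below assumes a 95% confidence level (\(z = 1.96\)), the worst case \(p = 0.5\), and a population large enough that the finite population correction can be ignored; none of these are stated in the source, and the function name is illustrative.

```python
import math


def margin_of_error(n: int, p: float = 0.5, z: float = 1.96) -> float:
    """Normal-approximation margin of error for a sample proportion."""
    return z * math.sqrt(p * (1 - p) / n)


print(round(margin_of_error(1000), 3))  # 0.031
```

Since the margin scales with \(1/\sqrt{n}\), halving it requires roughly four times the sample size.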