Sampling Methods

Sampling methods select a subset of a population for analysis, trading some accuracy for efficiency. They are essential in data science when a dataset is too large to examine in full. This article covers simple random sampling and stratified sampling, works through an example, and lists common applications.

Simple Random Sampling

Every unit has an equal chance of selection:

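  • Probability: When drawing \( n \) units without replacement, \( P(\text{selected}) = \frac{n}{N} \), where \( N \) is the population size (this reduces to \( \frac{1}{N} \) for a single draw).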
  • Method: Random number generators.
  • Pros: Unbiased, simple.
  • Cons: May under-represent or entirely miss small subgroups by chance.
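
The method above can be sketched in a few lines of Python. This is a minimal illustration, assuming the population fits in memory as a list; the helper name `simple_random_sample` and the integer unit IDs are hypothetical, and `random.sample` from the standard library does the drawing without replacement.

```python
import random

def simple_random_sample(population, n, seed=None):
    """Draw n distinct units without replacement, each equally likely."""
    rng = random.Random(seed)  # a seeded generator makes the draw reproducible
    return rng.sample(population, n)

# Hypothetical population: unit IDs 0..999
population = list(range(1000))
sample = simple_random_sample(population, 100, seed=42)

print(len(sample))       # number of units drawn
print(len(set(sample)))  # same number: no unit is drawn twice
```

Passing a seed is optional; it is used here only so that the same sample can be regenerated when checking results.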

Stratified Sampling

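Divide the population into non-overlapping strata, then sample from each stratum in proportion to its size:

  • Formula: Sample size per stratum = \( n_h = n \cdot \frac{N_h}{N} \), where \( n \) is the total sample size, \( N_h \) is the size of stratum \( h \), and \( N \) is the population size.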
  • Method: Simple random sampling within each stratum.
  • Pros: Every subgroup is represented in proportion to its size.
  • Cons: Requires knowing each unit's stratum in advance.
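
The allocation formula rarely yields whole numbers in practice, so some rounding rule is needed. The sketch below computes \( n_h \) for each stratum and, as an assumption not stated in the formula itself, hands any leftover units to the strata with the largest fractional remainders. The function name `proportional_allocation` and the stratum labels are hypothetical.

```python
def proportional_allocation(stratum_sizes, n):
    """Split a total sample size n across strata in proportion to their sizes.

    stratum_sizes maps a stratum label to its size N_h.
    Integer arithmetic avoids floating-point rounding surprises.
    """
    total = sum(stratum_sizes.values())  # N
    # Floor of n * N_h / N for each stratum
    alloc = {h: (n * size) // total for h, size in stratum_sizes.items()}
    # Fractional parts, kept as integer remainders
    remainders = {h: (n * size) % total for h, size in stratum_sizes.items()}
    # Assumed rounding rule: give leftover units to the largest remainders
    leftover = n - sum(alloc.values())
    for h in sorted(remainders, key=remainders.get, reverse=True)[:leftover]:
        alloc[h] += 1
    return alloc

print(proportional_allocation({"A": 600, "B": 400}, 100))
print(proportional_allocation({"A": 500, "B": 300, "C": 200}, 7))
```

Other rounding rules are possible; the only property relied on here is that the per-stratum sizes add up to exactly \( n \).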

Practical Example

Population: 1000 students (600 male, 400 female), sample 100:

  • Random: Pick 100 students at random. The split is 60 males and 40 females only on average; a particular draw might contain, say, 65 males and 35 females.
  • Stratified: 60 males (\( 100 \cdot \frac{600}{1000} \)), 40 females (\( 100 \cdot \frac{400}{1000} \)).

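Stratified sampling guarantees that the sample reproduces the population's 60/40 gender split, whereas simple random sampling matches it only on average.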
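
The example can be reproduced with a short simulation. This is a sketch under stated assumptions: the students are represented as hypothetical `(gender, id)` tuples, the seed is arbitrary, and the per-stratum sizes of 60 and 40 are taken directly from the calculation above.

```python
import random
from collections import Counter

rng = random.Random(0)  # arbitrary seed, for reproducibility only

# Hypothetical population: 600 male and 400 female students
students = [("male", i) for i in range(600)] + [("female", i) for i in range(400)]

# Simple random sampling: 100 students from the whole population
srs = rng.sample(students, 100)
srs_counts = Counter(gender for gender, _ in srs)

# Stratified sampling: 60 from the male stratum, 40 from the female stratum
stratified = []
for gender, n_h in {"male": 60, "female": 40}.items():
    stratum = [s for s in students if s[0] == gender]
    stratified.extend(rng.sample(stratum, n_h))
strat_counts = Counter(gender for gender, _ in stratified)

print("simple random:", dict(srs_counts))    # split varies with the seed
print("stratified:   ", dict(strat_counts))  # always 60 male, 40 female
```

Rerunning with different seeds changes the simple random split from draw to draw, while the stratified counts stay fixed by construction.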

Applications

Used in:

  • Surveys: Polling (e.g., election predictions).
  • Quality Control: Testing product batches.
  • Research: Studying diverse populations.

In each case, sampling makes it feasible to draw conclusions about a large population without measuring every unit.