Sampling Methods
Sampling methods select a subset of a population for analysis, trading some accuracy for efficiency. They are essential in data science whenever measuring every unit of a large dataset or population is impractical. This article covers simple random and stratified sampling, works through an example, and surveys their applications.
Simple Random Sampling
Every unit has an equal chance of selection:
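- Probability: when \( n \) units are drawn without replacement from a population of size \( N \), each unit is included with probability \( P(\text{selected}) = \frac{n}{N} \). The chance of being picked on any single draw is \( \frac{1}{N} \).
- Method: Assign each unit an ID and use a random number generator to select \( n \) distinct IDs.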
- Pros: Unbiased and simple to implement.
- Cons: Small subgroups may be under-represented or missed entirely.
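A minimal sketch of simple random sampling using Python's standard `random` module, whose `Random.sample` draws without replacement. The function name `simple_random_sample` and its `seed` parameter are illustrative, not from any particular library. The loop estimates how often one specific unit ends up in a sample of size \( n \); the estimate should approach \( \frac{n}{N} \).

```python
import random

def simple_random_sample(population, n, seed=None):
    """Draw n distinct units; each unit has the same chance of inclusion."""
    rng = random.Random(seed)  # seeded generator so results are reproducible
    return rng.sample(population, n)  # sampling without replacement

# Empirical check: how often is unit 0 included in a sample of size n?
N, n, trials = 50, 10, 20_000
units = list(range(N))
hits = sum(0 in simple_random_sample(units, n, seed=t) for t in range(trials))
print(f"estimated P(unit 0 included) = {hits / trials:.3f}, n/N = {n / N:.3f}")
```

Passing a seed makes a draw repeatable, which helps when a sample must be audited or reproduced later.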
Stratified Sampling
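Divide the population into non-overlapping groups (strata), then sample from each stratum in proportion to its size:
- Formula: \( n_h = n \cdot \frac{N_h}{N} \), where \( n \) is the total sample size, \( N \) is the population size, \( N_h \) is the size of stratum \( h \), and \( n_h \) is the number of units sampled from it.
- Method: Simple random sampling within each stratum.
- Pros: Guarantees every subgroup is represented in proportion to its size.
- Cons: Requires knowing each unit's stratum before sampling.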
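The allocation formula and the within-stratum step can be sketched in Python as follows. The formula above does not say how to handle fractional \( n_h \); this sketch assumes the largest-remainder method (round down, then give leftover units to the strata with the largest fractional parts) so the stratum sizes always sum to \( n \). The function names `proportional_allocation` and `stratified_sample` are hypothetical.

```python
import random

def proportional_allocation(stratum_sizes, n):
    """Compute n_h = n * N_h / N, rounded so the sizes sum to exactly n."""
    N = sum(stratum_sizes.values())
    exact = {h: n * N_h / N for h, N_h in stratum_sizes.items()}
    alloc = {h: int(x) for h, x in exact.items()}  # round each quota down
    shortfall = n - sum(alloc.values())
    # Assumption: hand leftover units to the largest fractional remainders.
    by_remainder = sorted(exact, key=lambda h: exact[h] - alloc[h], reverse=True)
    for h in by_remainder[:shortfall]:
        alloc[h] += 1
    return alloc

def stratified_sample(strata, n, seed=None):
    """strata maps each stratum label to the list of units in that stratum."""
    rng = random.Random(seed)
    alloc = proportional_allocation({h: len(u) for h, u in strata.items()}, n)
    # Simple random sampling without replacement inside each stratum.
    return {h: rng.sample(strata[h], alloc[h]) for h in strata}

print(proportional_allocation({"male": 600, "female": 400}, 100))
```

Other rounding rules exist; whichever is used, the sum of the \( n_h \) should equal the intended \( n \).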
Practical Example
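Population: 1,000 students (600 male, 400 female); sample size 100:
- Simple random: Pick 100 students at random. The male/female split varies from sample to sample (for instance, 70 males and 30 females).
- Stratified: 60 males (\( 100 \cdot \frac{600}{1000} \)), 40 females (\( 100 \cdot \frac{400}{1000} \)).
Stratified sampling guarantees that the sample matches the population's 60:40 gender split.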
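A short simulation of this example, assuming a hypothetical roster in which each student is a `(sex, id)` tuple. It repeats simple random sampling 1,000 times to show how the number of males fluctuates around 60, then draws one stratified sample, whose male count is fixed by construction.

```python
import random

# Hypothetical roster matching the example: 600 male and 400 female students.
students = [("M", i) for i in range(600)] + [("F", i) for i in range(600, 1000)]
rng = random.Random(42)

def males_in_random_sample():
    """Count males in one simple random sample of 100 students."""
    return sum(1 for sex, _ in rng.sample(students, 100) if sex == "M")

random_counts = [males_in_random_sample() for _ in range(1000)]
print("SRS male counts range from", min(random_counts), "to", max(random_counts))

# Stratified: sample 60 males and 40 females separately.
males = [s for s in students if s[0] == "M"]
females = [s for s in students if s[0] == "F"]
stratified = rng.sample(males, 60) + rng.sample(females, 40)
stratified_males = sum(1 for sex, _ in stratified if sex == "M")
print("Stratified male count:", stratified_males)  # always 60
```

The simple random counts average close to 60, so that method is unbiased on average, but any single sample can stray from the population's proportions; the stratified sample cannot.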
Applications
Used in:
- Surveys: Polling (e.g., election predictions).
- Quality Control: Testing product batches.
- Research: Studying diverse populations.
In each case, sampling makes analysis feasible when examining every unit would be too costly or slow.