SQL for Data Analysis
SQL (Structured Query Language) is a powerful tool for querying and analyzing data in relational databases. From business reports to scientific research, SQL enables precise data manipulation. This MathMultiverse guide covers core commands, joins, aggregations, visualizations, and applications.
Basic Commands
Core SQL commands enable data retrieval and filtering.
SELECT
Extracts columns:
SELECT name, revenue FROM customers;
Relational algebra: projection \(\pi_{name, revenue} (Customers)\). Unlike the algebraic projection, SQL keeps duplicate rows unless DISTINCT is specified.
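As a minimal sketch, the query can be tried against an in-memory SQLite database through Python's standard `sqlite3` module. The table layout and the three sample rows are assumptions made for illustration, not data from this guide. The second pair of queries shows the effect of `DISTINCT` on a single-column projection.

```python
import sqlite3

# Hypothetical schema and rows, assumed for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT, revenue INTEGER, region TEXT)"
)
conn.executemany(
    "INSERT INTO customers (name, revenue, region) VALUES (?, ?, ?)",
    [("Ada", 12000, "North"), ("Ben", 8000, "South"), ("Cy", 3000, "North")],
)

# Projection onto two columns: one output row per input row.
projected = conn.execute("SELECT name, revenue FROM customers ORDER BY id").fetchall()

# Projecting onto region alone repeats 'North'; DISTINCT removes the repeat.
regions_all = conn.execute("SELECT region FROM customers ORDER BY region").fetchall()
regions_distinct = conn.execute(
    "SELECT DISTINCT region FROM customers ORDER BY region"
).fetchall()

print(projected)
print(regions_all)
print(regions_distinct)
```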
WHERE
Filters rows:
SELECT name FROM customers WHERE revenue > 10000;
Selection: \(\sigma_{revenue > 10000} (Customers)\).
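The filter can be sketched the same way. The sample rows and the helper name `names_above` are hypothetical; the threshold is passed as a bound parameter rather than pasted into the SQL string. Note that the comparison is strict and that a row with a NULL revenue never satisfies it.

```python
import sqlite3

# Hypothetical sample data, assumed for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (name TEXT, revenue INTEGER)")
conn.executemany(
    "INSERT INTO customers VALUES (?, ?)",
    [("Ada", 12000), ("Ben", 8000), ("Cy", 10000), ("Dee", None)],
)

def names_above(threshold):
    """Return names of customers whose revenue is strictly above threshold."""
    rows = conn.execute(
        "SELECT name FROM customers WHERE revenue > ? ORDER BY name",
        (threshold,),
    )
    return [name for (name,) in rows]

# Strict '>' excludes Cy (exactly 10000); Dee's NULL revenue fails the comparison.
print(names_above(10000))
print(names_above(5000))
```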
ORDER BY
Sorts results:
SELECT name, revenue
FROM customers
WHERE revenue > 5000
ORDER BY revenue DESC;
Complexity: sorting \(n\) result rows takes \(O(n \log n)\) time in general; a suitable index on revenue can let the database skip the sort.
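A small sketch of filter-then-sort, again on assumed sample rows. Two customers share the same revenue, so a second sort key (`name`) is added here to make the order deterministic; that tie-breaker is an addition for illustration, not part of the query above. The result is checked against an equivalent sort done in Python.

```python
import sqlite3

# Hypothetical sample data, assumed for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (name TEXT, revenue INTEGER)")
rows_in = [("Ada", 12000), ("Ben", 8000), ("Cy", 3000), ("Dee", 8000)]
conn.executemany("INSERT INTO customers VALUES (?, ?)", rows_in)

# Filter first, then sort descending; name breaks ties between equal revenues.
ordered = conn.execute(
    "SELECT name, revenue FROM customers "
    "WHERE revenue > 5000 "
    "ORDER BY revenue DESC, name ASC"
).fetchall()

# The same result computed in Python with a comparison sort.
expected = sorted(
    [r for r in rows_in if r[1] > 5000],
    key=lambda r: (-r[1], r[0]),
)

print(ordered)
```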
Joins & Aggregations
INNER JOIN
Returns only the rows that have a match in both tables:
SELECT c.name, o.amount
FROM customers c
INNER JOIN orders o
ON c.id = o.cust_id;
Join: \(Customers \bowtie_{c.id = o.cust_id} Orders\).
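To see which rows an inner join keeps, the sketch below uses assumed sample data in which one customer has no orders and one order points at a customer id that does not exist. Both are dropped by `INNER JOIN`. A `LEFT JOIN` is included only as a contrast; it is not covered elsewhere in this guide.

```python
import sqlite3

# Hypothetical sample data, assumed for illustration only.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE orders (id INTEGER PRIMARY KEY, cust_id INTEGER, amount INTEGER);
    INSERT INTO customers (id, name) VALUES (1, 'Ada'), (2, 'Ben'), (3, 'Cy');
    INSERT INTO orders (cust_id, amount) VALUES (1, 250), (1, 100), (2, 75), (9, 40);
""")

# Inner join: Cy (no orders) and the order for cust_id 9 (no customer) disappear.
inner = conn.execute(
    "SELECT c.name, o.amount "
    "FROM customers c "
    "INNER JOIN orders o ON c.id = o.cust_id "
    "ORDER BY c.name, o.amount"
).fetchall()

# Left join for contrast: every customer is kept, with NULL where no order matches.
left = conn.execute(
    "SELECT c.name, o.amount "
    "FROM customers c "
    "LEFT JOIN orders o ON c.id = o.cust_id "
    "ORDER BY c.name, o.amount"
).fetchall()

print(inner)
print(left)
```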
Aggregations
Summarizes data:
SELECT c.region, SUM(o.amount)
FROM customers c
INNER JOIN orders o
ON c.id = o.cust_id
GROUP BY c.region;
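Average: replacing SUM with AVG in the query above returns the mean of the \(n\) non-NULL amounts in each group, \(\text{AVG}(amount) = \frac{\sum_{i=1}^{n} amount_i}{n}\).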
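The relationship between SUM, COUNT, and AVG can be checked directly. This sketch uses assumed sample rows and computes all three per region in one grouped query, then confirms that each average equals its total divided by its count.

```python
import sqlite3

# Hypothetical sample data, assumed for illustration only.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT, region TEXT);
    CREATE TABLE orders (id INTEGER PRIMARY KEY, cust_id INTEGER, amount INTEGER);
    INSERT INTO customers (id, name, region)
        VALUES (1, 'Ada', 'North'), (2, 'Ben', 'South'), (3, 'Cy', 'North');
    INSERT INTO orders (cust_id, amount)
        VALUES (1, 100), (1, 200), (3, 300), (2, 50);
""")

# One output row per region: total, row count, and mean of the order amounts.
per_region = conn.execute(
    "SELECT c.region, SUM(o.amount), COUNT(o.amount), AVG(o.amount) "
    "FROM customers c "
    "INNER JOIN orders o ON c.id = o.cust_id "
    "GROUP BY c.region "
    "ORDER BY c.region"
).fetchall()

for region, total, n, mean in per_region:
    # AVG is the group total divided by the number of non-NULL amounts.
    print(region, total, n, mean, total / n)
```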
Example Query
Daily totals, counting only individual sales over $100:
SELECT date, SUM(amount) AS daily_total
FROM sales
WHERE amount > 100
GROUP BY date
ORDER BY date;
Total: \(T(d) = \sum_{\text{amount}_i > 100, \text{date}_i = d} \text{amount}_i\).
| Date | Daily Total ($) |
|---|---|
| 2025-03-01 | 7500 |
| 2025-03-02 | 6200 |
| 2025-03-03 | 8900 |
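Because `WHERE` runs before grouping, the example query drops each sale of $100 or less and then sums what remains. A related but different question, "which days total more than $100?", would instead filter the groups with `HAVING`. The sketch below contrasts the two on assumed sample rows; these rows are made up and do not reproduce the figures in the table.

```python
import sqlite3

# Hypothetical sample data, assumed for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (date TEXT, amount INTEGER)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?)",
    [
        ("2025-03-01", 150), ("2025-03-01", 80), ("2025-03-01", 200),
        ("2025-03-02", 60), ("2025-03-02", 70),
        ("2025-03-03", 500),
    ],
)

# WHERE filters individual rows before they are grouped and summed.
row_filtered = conn.execute(
    "SELECT date, SUM(amount) AS daily_total "
    "FROM sales WHERE amount > 100 "
    "GROUP BY date ORDER BY date"
).fetchall()

# HAVING filters whole groups after the sum has been computed.
group_filtered = conn.execute(
    "SELECT date, SUM(amount) AS daily_total "
    "FROM sales "
    "GROUP BY date HAVING SUM(amount) > 100 ORDER BY date"
).fetchall()

print(row_filtered)    # 2025-03-02 has no sale over 100, so it yields no row
print(group_filtered)  # every day totals more than 100, so all three appear
```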
Visualizations
[Figure: Daily Sales Trend]
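As a rough stand-in for a chart, the daily totals from the example table can be drawn as a text bar chart using only the Python standard library. The helper name `text_chart` and the 40-character width are arbitrary choices for this sketch; a plotting library would be the usual choice for a real report.

```python
# Daily totals copied from the example table.
daily_totals = [("2025-03-01", 7500), ("2025-03-02", 6200), ("2025-03-03", 8900)]

def text_chart(rows, width=40):
    """Render one bar per row, scaled so the largest value spans `width` characters."""
    peak = max(total for _, total in rows)
    lines = []
    for day, total in rows:
        bar = "#" * round(total / peak * width)
        lines.append(f"{day} | {bar} {total}")
    return lines

chart = text_chart(daily_totals)
print("\n".join(chart))
```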
Applications
- Sales Reporting: Total sales by region.
- Customer Segmentation: Group customers by age.
- Research: Compute the average temperature for 2025.
- Scalability: Cloud data warehouses such as Snowflake run SQL queries over petabyte-scale datasets.
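As one illustration of the segmentation item, the sketch below buckets customers into age bands with a `CASE` expression and counts each band. The table, the sample rows, and the band boundaries are all assumptions chosen for the example.

```python
import sqlite3

# Hypothetical sample data, assumed for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (name TEXT, age INTEGER)")
conn.executemany(
    "INSERT INTO customers VALUES (?, ?)",
    [("Ada", 24), ("Ben", 37), ("Cy", 41), ("Dee", 68), ("Eve", 29)],
)

# SQLite accepts the alias in GROUP BY; some engines need the CASE repeated there.
segments = conn.execute("""
    SELECT CASE
             WHEN age < 30 THEN 'under 30'
             WHEN age < 60 THEN '30 to 59'
             ELSE '60 and over'
           END AS age_band,
           COUNT(*) AS n_customers
    FROM customers
    GROUP BY age_band
    ORDER BY MIN(age)
""").fetchall()

print(segments)
```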