SQL for Data Analysis

SQL (Structured Query Language) is a powerful tool for querying and analyzing data in relational databases. From business reports to scientific research, SQL enables precise data manipulation. This MathMultiverse guide covers core commands, joins, aggregations, visualizations, and applications.

Basic Commands

Core SQL commands enable data retrieval and filtering.

SELECT

Extracts columns:

SELECT name, revenue FROM customers;

Relational algebra: \(\pi_{name, revenue} (Customers)\). Unlike set-based projection, SELECT keeps duplicate rows unless DISTINCT is specified.
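To make the projection concrete, here is a minimal sketch using Python's built-in sqlite3 module. The sample rows are invented for illustration, and the region column is assumed to live on the customers table as in the aggregation query later in this guide. It compares a plain SELECT with SELECT DISTINCT.

```python
import sqlite3

# In-memory database; the rows below are illustrative, not real data.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT, region TEXT, revenue REAL)"
)
conn.executemany(
    "INSERT INTO customers (name, region, revenue) VALUES (?, ?, ?)",
    [("Acme", "East", 12000), ("Birch", "West", 8000), ("Cedar", "East", 12000)],
)

# A plain SELECT returns one output row per input row, duplicates included.
regions = conn.execute("SELECT region FROM customers").fetchall()

# DISTINCT collapses duplicates, which matches set-based projection.
distinct_regions = conn.execute(
    "SELECT DISTINCT region FROM customers ORDER BY region"
).fetchall()

print(regions)
print(distinct_regions)
```

The first result has three rows and the second has two, because the two East rows collapse into one under DISTINCT.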

WHERE

Filters rows:

SELECT name FROM customers WHERE revenue > 10000;

Selection: \(\sigma_{revenue > 10000} (Customers)\).
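The filter can be tried out with a small sketch in Python's sqlite3 module. The table contents are made up, and the threshold is passed as a bound parameter rather than pasted into the SQL string. Two edge cases are included on purpose: a revenue exactly equal to the threshold, and a NULL revenue.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT, revenue REAL)")
conn.executemany(
    "INSERT INTO customers (name, revenue) VALUES (?, ?)",
    # Illustrative rows: Cedar sits exactly on the threshold, Dune has no revenue recorded.
    [("Acme", 12000), ("Birch", 8000), ("Cedar", 10000), ("Dune", None)],
)

threshold = 10000
rows = conn.execute(
    "SELECT name FROM customers WHERE revenue > ? ORDER BY name",
    (threshold,),
).fetchall()
names = [name for (name,) in rows]

print(names)
```

Only Acme is returned: the comparison is strict, so 10000 does not qualify, and a comparison against NULL is never true, so Dune is filtered out as well.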

ORDER BY

Sorts results:

SELECT name, revenue 
FROM customers 
WHERE revenue > 5000 
ORDER BY revenue DESC;

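Complexity: sorting the \(n\) rows that pass the filter takes \(O(n \log n)\) comparisons, unless an index on revenue already supplies them in order.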
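As a rough illustration of the sort step, the sketch below runs the query in SQLite through Python's sqlite3 module and then asks the engine for its query plan. The rows are invented, and the plan text is specific to SQLite; other databases report their plans differently.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT, revenue REAL)")
conn.executemany(
    "INSERT INTO customers (name, revenue) VALUES (?, ?)",
    [("Acme", 12000), ("Birch", 8000), ("Cedar", 4000), ("Dune", 9500)],
)

query = """
    SELECT name, revenue
    FROM customers
    WHERE revenue > 5000
    ORDER BY revenue DESC
"""
ranked = conn.execute(query).fetchall()

# The last column of each plan row is a human-readable description of the step.
# With no index on revenue, SQLite is expected to report an explicit sorting
# step for the ORDER BY clause.
plan = [row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + query)]

print(ranked)
print(plan)
```

Cedar is removed by the filter before sorting, so only three rows are ordered.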

Joins & Aggregations

INNER JOIN

Returns only the row pairs that satisfy the join condition; rows without a match in the other table are dropped:

SELECT c.name, o.amount 
FROM customers c 
INNER JOIN orders o 
ON c.id = o.cust_id;

Join: \(Customers \bowtie_{c.id = o.cust_id} Orders\).
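The following sketch, again using Python's sqlite3 module with made-up rows, shows what the join returns when one customer has several orders and another has none.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE orders (id INTEGER PRIMARY KEY, cust_id INTEGER, amount REAL);
    -- Illustrative data: Cedar (id 3) has placed no orders.
    INSERT INTO customers (id, name) VALUES (1, 'Acme'), (2, 'Birch'), (3, 'Cedar');
    INSERT INTO orders (cust_id, amount) VALUES (1, 250.0), (1, 100.0), (2, 75.0);
""")

pairs = conn.execute("""
    SELECT c.name, o.amount
    FROM customers c
    INNER JOIN orders o ON c.id = o.cust_id
    ORDER BY c.name, o.amount
""").fetchall()

print(pairs)
```

Acme appears twice, once per order, while Cedar does not appear at all because no order row matches its id.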

Aggregations

Summarize data within each group:

SELECT c.region, SUM(o.amount) 
FROM customers c 
INNER JOIN orders o 
ON c.id = o.cust_id 
GROUP BY c.region;

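Average: \(\text{AVG}(amount) = \frac{\sum_{i=1}^{n} amount_i}{n}\), where \(n\) is the number of non-NULL amounts in the group. The query above uses SUM, which is the numerator alone.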
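To check the relationship between SUM, COUNT, and AVG, the sketch below computes all three per region in one query. It uses Python's sqlite3 module; the customers and orders rows are invented for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT, region TEXT);
    CREATE TABLE orders (id INTEGER PRIMARY KEY, cust_id INTEGER, amount REAL);
    -- Illustrative data only.
    INSERT INTO customers (id, name, region) VALUES
        (1, 'Acme', 'East'), (2, 'Birch', 'West'), (3, 'Cedar', 'East');
    INSERT INTO orders (cust_id, amount) VALUES
        (1, 200.0), (1, 100.0), (3, 300.0), (2, 50.0);
""")

summary = conn.execute("""
    SELECT c.region,
           SUM(o.amount)   AS total,
           COUNT(o.amount) AS n,
           AVG(o.amount)   AS mean
    FROM customers c
    INNER JOIN orders o ON c.id = o.cust_id
    GROUP BY c.region
    ORDER BY c.region
""").fetchall()

for region, total, n, mean in summary:
    # mean should equal total / n for every group
    print(region, total, n, mean, total / n)
```

East combines the orders of Acme and Cedar (three orders totalling 600), so its average is 200.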

Example Query

Daily totals, counting only individual sales over $100:

SELECT date, SUM(amount) AS daily_total 
FROM sales 
WHERE amount > 100 
GROUP BY date 
ORDER BY date;

Total: \(T(d) = \sum_{\text{amount}_i > 100, \text{date}_i = d} \text{amount}_i\).

Date          Daily Total ($)
2025-03-01    7500
2025-03-02    6200
2025-03-03    8900
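The daily totals can be reproduced with a small sketch in Python's sqlite3 module. The individual sales rows below are hypothetical: they were chosen only so that the sums match the three totals in the table, and they include two sales that the filter should exclude.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (date TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO sales (date, amount) VALUES (?, ?)",
    [
        # Hypothetical rows chosen to add up to the totals shown in the table.
        ("2025-03-01", 5000), ("2025-03-01", 2500), ("2025-03-01", 80),
        ("2025-03-02", 6200), ("2025-03-02", 100),
        ("2025-03-03", 4000), ("2025-03-03", 4900),
    ],
)

daily = conn.execute("""
    SELECT date, SUM(amount) AS daily_total
    FROM sales
    WHERE amount > 100
    GROUP BY date
    ORDER BY date
""").fetchall()

for day, total in daily:
    print(day, total)
```

The $80 sale and the $100 sale are dropped by WHERE before grouping, so they never reach SUM. Filtering on the summed value instead would require a HAVING clause.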

Visualizations

[Figure: Daily Sales Trend]

Applications

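  • Sales Reporting: Total sales by region, using SUM with GROUP BY.
  • Customer Segmentation: Customer counts grouped by age band.
  • Research: Average temperature across readings taken in 2025.
  • Scalability: Cloud warehouses such as Snowflake run SQL queries over petabyte-scale data.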
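As one possible illustration of the segmentation use case, the sketch below groups customers into ten-year age bands. The age column and the sample ages are assumptions made for this example. The banding relies on SQLite's integer division, which may behave differently in other databases.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# The age column is hypothetical; it does not appear in the earlier queries.
conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT, age INTEGER)")
conn.executemany(
    "INSERT INTO customers (name, age) VALUES (?, ?)",
    [("A", 23), ("B", 27), ("C", 34), ("D", 41), ("E", 45), ("F", 48)],
)

# Integer division maps each age to the lower bound of its decade (23 -> 20).
segments = conn.execute("""
    SELECT (age / 10) * 10 AS age_band, COUNT(*) AS customers
    FROM customers
    GROUP BY age_band
    ORDER BY age_band
""").fetchall()

for band, count in segments:
    print(f"{band}-{band + 9}: {count}")
```

With these rows the bands come out as two customers in their twenties, one in their thirties, and three in their forties.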