
Projects

Selected work

Three end-to-end projects showcasing streaming, lakehouse, and AI-assisted data engineering. Each repo runs locally without external services. Click any card for a live, interactive walkthrough of the architecture.

BRONZE · raw landings → SILVER · cleansed events → GOLD · business marts

Project 02

Telecom Billing Lakehouse

Medallion (Bronze/Silver/Gold) lakehouse over synthetic call detail record (CDR) data, built with Airflow, MinIO, Iceberg, Great Expectations, and dbt.

  • Apache Airflow
  • Apache Iceberg
  • MinIO / S3
  • Great Expectations
  • dbt
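
To make the Bronze/Silver/Gold flow concrete, here is a minimal pure-Python sketch of the three layers. It is an illustration only, not the project's pipeline: the record fields (`caller`, `duration_s`, `rate_per_min`) and the functions `to_silver` and `to_gold` are hypothetical, and the real stack would do this work with Airflow, Iceberg tables, Great Expectations checks, and dbt models.

```python
from collections import defaultdict
from decimal import Decimal

# Hypothetical bronze records: raw CDR rows kept as landed (all strings, some invalid).
BRONZE = [
    {"caller": "555-0101", "duration_s": "120", "rate_per_min": "0.10"},
    {"caller": "555-0101", "duration_s": "60", "rate_per_min": "0.10"},
    {"caller": "555-0202", "duration_s": "-5", "rate_per_min": "0.10"},  # negative duration
    {"caller": "", "duration_s": "30", "rate_per_min": "0.10"},          # missing caller
    {"caller": "555-0202", "duration_s": "300", "rate_per_min": "0.20"},
]


def to_silver(rows):
    """Silver: cast types and drop rows that fail basic validity checks."""
    silver = []
    for row in rows:
        try:
            duration = int(row["duration_s"])
            rate = Decimal(row["rate_per_min"])
        except (KeyError, ValueError, ArithmeticError):
            continue  # unparseable or incomplete row
        if not row.get("caller") or duration < 0:
            continue  # fails the (assumed) business rules
        silver.append(
            {"caller": row["caller"], "duration_s": duration, "rate_per_min": rate}
        )
    return silver


def to_gold(rows):
    """Gold: total billed amount per caller, a toy business mart."""
    totals = defaultdict(Decimal)
    for row in rows:
        totals[row["caller"]] += row["rate_per_min"] * row["duration_s"] / 60
    return dict(totals)


silver = to_silver(BRONZE)
gold = to_gold(silver)
```

Keeping the bronze rows untouched and deriving silver and gold from them means a cleansing rule can change later without re-ingesting the raw data.
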
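-- before: wrapping ts in a function can block partition pruning
SELECT *
FROM events
WHERE date_format(ts, 'yyyy-MM') = '2026-04';

-- after: a half-open range on the bare column can be pruned
SELECT *
FROM events
WHERE ts >= TIMESTAMP '2026-04-01'
  AND ts <  TIMESTAMP '2026-05-01';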
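
The rewrite above replaces a month-string comparison with a half-open timestamp range. A small sketch of how those bounds could be derived follows; `month_range` and `range_predicate` are hypothetical helpers written for illustration, not functions from the project, and the string formatting is only suitable for trusted inputs.

```python
from datetime import date


def month_range(month: str) -> tuple[str, str]:
    """Return (start, end) ISO dates for a 'YYYY-MM' month; end is exclusive."""
    year, mon = (int(part) for part in month.split("-"))
    start = date(year, mon, 1)
    # December rolls over to January of the following year.
    end = date(year + 1, 1, 1) if mon == 12 else date(year, mon + 1, 1)
    return start.isoformat(), end.isoformat()


def range_predicate(column: str, month: str) -> str:
    """Build the half-open range predicate for one calendar month."""
    start, end = month_range(month)
    return f"{column} >= TIMESTAMP '{start}' AND {column} < TIMESTAMP '{end}'"


predicate = range_predicate("ts", "2026-04")
```

Using an exclusive upper bound avoids having to know the last day of the month or the timestamp precision of the column.
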

Project 03

AI-Assisted SQL Optimizer

CLI that uses Claude to suggest Spark/Snowflake query rewrites and partition strategies. Benchmarked against a 50-query corpus.

  • Python
  • Typer
  • Anthropic SDK
  • sqlglot
  • pytest
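
A suggested rewrite is only useful if it returns the same rows as the original query. The sketch below shows one way such a check could look, using an in-memory SQLite database as a stand-in engine. This is an assumption for illustration: the project targets Spark and Snowflake, so the "before" query here uses SQLite's `strftime` in place of `date_format`, and `same_rows` is a hypothetical helper.

```python
import sqlite3


def same_rows(conn, before_sql, after_sql):
    """True when both queries return the same multiset of rows."""
    before = sorted(conn.execute(before_sql).fetchall())
    after = sorted(conn.execute(after_sql).fetchall())
    return before == after


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (id INTEGER, ts TEXT)")
# Rows chosen to sit on both edges of the April 2026 window.
conn.executemany(
    "INSERT INTO events VALUES (?, ?)",
    [
        (1, "2026-03-31 23:59:59"),
        (2, "2026-04-01 00:00:00"),
        (3, "2026-04-30 23:59:59"),
        (4, "2026-05-01 00:00:00"),
    ],
)

# SQLite-dialect versions of the before/after pair (assumed translation).
BEFORE = "SELECT * FROM events WHERE strftime('%Y-%m', ts) = '2026-04'"
AFTER = "SELECT * FROM events WHERE ts >= '2026-04-01' AND ts < '2026-05-01'"

equivalent = same_rows(conn, BEFORE, AFTER)
```

Boundary rows matter most here: an off-by-one in the range bounds would show up as a mismatch on the first or last instant of the month.
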