Projects

Selected work

Three end-to-end projects showcasing streaming, lakehouse, and AI-assisted data engineering. Each repo is designed to run locally with synthetic data. The AI-assisted optimizer includes local heuristic and mock-response workflows out of the box, with optional Claude integration when an ANTHROPIC_API_KEY is configured. Click any card for a live, interactive walkthrough of the architecture.

Project 01

Real-Time Fraud Signals Pipeline

Kafka -> Spark Structured Streaming -> Delta -> dbt -> Streamlit. Exactly-once processing, anomaly detection, dbt tests.

Streaming ingestion · Data quality · Fraud analytics

  • Apache Kafka
  • Spark Structured Streaming
  • Delta Lake
  • dbt
  • Streamlit
Bronze: raw landings · Silver: cleansed events · Gold: business marts

Project 02

Telecom Billing Lakehouse

Medallion (Bronze/Silver/Gold) lakehouse over synthetic CDR data. Airflow + MinIO + Iceberg + Great Expectations + dbt.

Lakehouse design · Data contracts · Medallion architecture

  • Apache Airflow
  • Apache Iceberg
  • MinIO / S3
  • Great Expectations
  • dbt
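
The Bronze/Silver/Gold flow described for this project can be illustrated with a minimal, dependency-free sketch. It is not the repo's code: the record fields (`call_id`, `subscriber`, `duration_s`, `rate_per_min`), the function names, and the cleansing rules are all assumptions chosen for illustration, and plain Python lists stand in for Iceberg tables.

```python
from collections import defaultdict

# Hypothetical synthetic call detail records (CDRs); all field names are assumptions.
BRONZE = [
    {"call_id": "c1", "subscriber": "A", "duration_s": "120", "rate_per_min": "0.10"},
    {"call_id": "c2", "subscriber": "A", "duration_s": "60", "rate_per_min": "0.10"},
    {"call_id": "c2", "subscriber": "A", "duration_s": "60", "rate_per_min": "0.10"},  # duplicate
    {"call_id": "c3", "subscriber": "B", "duration_s": "-5", "rate_per_min": "0.20"},  # invalid
    {"call_id": "c4", "subscriber": "B", "duration_s": "300", "rate_per_min": "0.20"},
]


def to_silver(bronze):
    """Cleanse raw rows: cast types, drop duplicates and negative durations."""
    seen = set()
    silver = []
    for row in bronze:
        duration = int(row["duration_s"])
        if row["call_id"] in seen or duration < 0:
            continue
        seen.add(row["call_id"])
        silver.append(
            {
                "call_id": row["call_id"],
                "subscriber": row["subscriber"],
                "duration_s": duration,
                "rate_per_min": float(row["rate_per_min"]),
            }
        )
    return silver


def to_gold(silver):
    """Aggregate cleansed calls into a per-subscriber billing total."""
    totals = defaultdict(float)
    for row in silver:
        totals[row["subscriber"]] += row["duration_s"] / 60 * row["rate_per_min"]
    return {subscriber: round(total, 2) for subscriber, total in totals.items()}


if __name__ == "__main__":
    print(to_gold(to_silver(BRONZE)))
```

In a real lakehouse each layer would be a persisted table with its own data contract; the sketch only shows how responsibility is split between the layers.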

Project 03

AI-Assisted SQL Optimizer

CLI that uses Claude to suggest Spark/Snowflake query rewrites and partition strategies. Benchmarked against a 50-query corpus.

LLM tooling · SQL parsing · Benchmark methodology

  • Python
  • Typer
  • Anthropic SDK
  • sqlglot
  • pytest
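
The optimizer is described as working out of the box with local heuristics and switching to Claude only when an ANTHROPIC_API_KEY is configured. One way such a fallback could be wired is sketched below. The function names `choose_backend` and `heuristic_suggestions` and the two rules are hypothetical, not taken from the repo, and the sketch deliberately makes no API calls.

```python
import os


def choose_backend(env=None):
    """Return "claude" when an API key is present, otherwise "heuristic".

    `env` defaults to os.environ; passing a dict makes the choice easy to test.
    """
    env = os.environ if env is None else env
    if env.get("ANTHROPIC_API_KEY", "").strip():
        return "claude"
    return "heuristic"


def heuristic_suggestions(sql):
    """Very rough local checks (assumed rules, string matching only)."""
    suggestions = []
    normalized = " ".join(sql.upper().split())
    if "SELECT *" in normalized:
        suggestions.append("Select only the columns you need instead of SELECT *.")
    if " WHERE " not in f" {normalized} ":
        suggestions.append("Add a WHERE filter so partitions can be pruned.")
    return suggestions


if __name__ == "__main__":
    print(choose_backend())
    for hint in heuristic_suggestions("select * from calls"):
        print("-", hint)
```

Keeping the backend choice in one small function means the CLI behaves the same with or without a key, and tests can run offline. A production tool would likely parse the query (for example with sqlglot) rather than match strings.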