Project 01
Real-Time Fraud Signals Pipeline
Kafka -> Spark Structured Streaming -> Delta -> dbt -> Streamlit. Exactly-once processing, anomaly detection, dbt tests.
- Apache Kafka
- Spark Structured Streaming
- Delta Lake
- dbt
- Streamlit
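One way to picture the exactly-once claim is an idempotent sink: if a batch is replayed after a failure, rows that were already written are skipped. The sketch below is a toy illustration in plain Python, not the project's code. The `Event` and `IdempotentSink` names and the `txns` topic are hypothetical, and a real pipeline would more likely rely on Spark checkpoints and Delta's transactional commits than on a hand-rolled key check.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Event:
    """Hypothetical Kafka record identified by (topic, partition, offset)."""
    topic: str
    partition: int
    offset: int
    amount: float


@dataclass
class IdempotentSink:
    """Toy stand-in for a transactional table: at most one row per source offset."""
    rows: dict = field(default_factory=dict)

    def write_batch(self, events):
        """Write events not seen before; return how many rows were added."""
        written = 0
        for e in events:
            key = (e.topic, e.partition, e.offset)
            if key not in self.rows:  # replayed records are skipped
                self.rows[key] = e
                written += 1
        return written


sink = IdempotentSink()
batch = [Event("txns", 0, i, 10.0 * i) for i in range(3)]

first = sink.write_batch(batch)     # initial delivery
replayed = sink.write_batch(batch)  # simulated retry of the same batch
```

Replaying the batch adds no rows, so downstream aggregates see each event once even though delivery happened twice.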
Projects
Three end-to-end projects showcasing streaming, lakehouse, and AI-assisted data engineering. Each repo runs locally without external services. Click any card for a live, interactive walkthrough of the architecture.
Project 02
Medallion (Bronze/Silver/Gold) lakehouse over synthetic CDR data. Airflow + MinIO + Iceberg + Great Expectations + dbt.
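The Bronze/Silver/Gold layering can be sketched without any of the listed tools. The example below assumes CDR means call detail records and uses hypothetical fields (`call_id`, `caller`, `duration_s`). Bronze keeps rows as ingested, Silver deduplicates and types them, and Gold aggregates for reporting. It is a minimal illustration of the pattern, not the repository's implementation.

```python
from collections import defaultdict

# Bronze: raw records kept exactly as ingested (hypothetical CDR fields).
bronze = [
    {"call_id": "c1", "caller": "A", "duration_s": "60"},
    {"call_id": "c1", "caller": "A", "duration_s": "60"},    # duplicate delivery
    {"call_id": "c2", "caller": "A", "duration_s": "30"},
    {"call_id": "c3", "caller": "B", "duration_s": "oops"},  # malformed value
    {"call_id": "c4", "caller": "B", "duration_s": "45"},
]


def to_silver(rows):
    """Deduplicate on call_id, cast duration to int, drop malformed rows."""
    seen, out = set(), []
    for r in rows:
        if r["call_id"] in seen:
            continue
        try:
            duration = int(r["duration_s"])
        except ValueError:
            continue  # a real pipeline might quarantine the row instead
        seen.add(r["call_id"])
        out.append(
            {"call_id": r["call_id"], "caller": r["caller"], "duration_s": duration}
        )
    return out


def to_gold(rows):
    """Aggregate total call seconds per caller."""
    totals = defaultdict(int)
    for r in rows:
        totals[r["caller"]] += r["duration_s"]
    return dict(totals)


silver = to_silver(bronze)
gold = to_gold(silver)
```

Keeping Bronze untouched means Silver and Gold can be rebuilt from scratch whenever the cleaning rules change.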
-- before
SELECT * FROM events WHERE date_format(ts,'yyyy-MM') = '2026-04';
-- after
SELECT * FROM events WHERE ts >= TIMESTAMP '2026-04-01' AND ts < TIMESTAMP '2026-05-01';
Project 03
CLI that uses Claude to suggest Spark/Snowflake query rewrites and partition strategies. Benchmarked against a 50-query corpus.
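A rewrite like the before/after query on `events` replaces a function applied to the column with a half-open range on the raw timestamp, which generally lets an engine prune partitions or skip files. The sketch below is a plain-Python check, under assumed sample data, that the two predicates select the same rows. `month_bounds` is a hypothetical helper, not part of the CLI.

```python
from datetime import datetime


def month_bounds(year_month):
    """Return the half-open [start, end) datetime range for a 'YYYY-MM' string."""
    year, month = map(int, year_month.split("-"))
    start = datetime(year, month, 1)
    if month == 12:
        end = datetime(year + 1, 1, 1)  # December rolls over to next January
    else:
        end = datetime(year, month + 1, 1)
    return start, end


# Sample timestamps on both sides of the month boundaries (assumed data).
events = [
    datetime(2026, 3, 31, 23, 59, 59),
    datetime(2026, 4, 1),
    datetime(2026, 4, 30, 23, 59, 59),
    datetime(2026, 5, 1),
]

start, end = month_bounds("2026-04")

# "before": format every timestamp, then compare strings.
before = [ts for ts in events if ts.strftime("%Y-%m") == "2026-04"]

# "after": compare the raw timestamp against two constants.
after = [ts for ts in events if start <= ts < end]
```

The exclusive upper bound avoids having to know the last day of the month or the timestamp precision.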