
AWS & Microsoft Certified Data Engineer
Data Engineer specializing in large-scale ETL/ELT pipelines, real-time data infrastructure, and AI/ML systems. Built production platforms processing 100M-1B records with 66-86% performance improvements across insurance, sports analytics, and healthcare domains.
About
Data Engineer with expertise in large-scale data infrastructure, real-time processing, and AI/ML systems. I've migrated 1B+ records to distributed architectures, eliminated 90-minute production bottlenecks, and built real-time pipelines handling 500K+ daily events with 99.9% accuracy.
Track record of rapid impact across insurance, sports analytics, and healthcare: promoted within 6 months at Super Six and received a SPOT Recognition Award for delivering critical data infrastructure under tight deadlines.
Achievements
Won Track 1 with MeetFlow — intelligent task orchestration converting meeting transcripts into capacity-aware ticket assignments using LLM-powered analysis, competing against 100+ teams.
Passed DEA-C01 (May 2026) — validated expertise in data pipeline design, ETL optimization, AWS Glue/EMR/Redshift, and implementing data quality frameworks at scale.
Passed DP-700 (May 2026) — validated expertise in Microsoft Fabric analytics engineering, data warehousing, and cloud data solutions.
Built a medical bill audit pipeline at HackASU that uses Claude Vision + SapBERT to detect overcharges, flagging an average of $1,300+ in billing errors per hospital bill.
Recognized at Super Six Sports Gaming for exceptional delivery and cross-team collaboration on critical product features.
Certifications
Experience
MyEdMaster
Tempe, AZ
EXL Services
Gurugram, India
Super Six Sports Gaming
Gurugram, India
Futurense Technologies
Bangalore, India
Koron Projects Limited
New Delhi, India
Projects
Solo builder — HackASU Claude AI Builder Hackathon
A medical bill audit pipeline that reads your bill, identifies every charge, benchmarks against real CMS Medicare pricing data for your state, detects overcharges and billing violations, and generates a ready-to-send dispute letter. Built to fight the information asymmetry where 49–80% of medical bills contain errors.
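As an illustration only, the benchmarking step described above can be sketched in a few lines of Python. Everything here is an assumption rather than the project's actual code: the `Charge` type, the `flag_overcharges` helper, the `tolerance` multiplier of 1.5, and the placeholder codes and prices are all hypothetical. A real benchmark table would be loaded from CMS Medicare pricing data for the patient's state.

```python
from dataclasses import dataclass


@dataclass
class Charge:
    code: str      # billing code as extracted from the bill (placeholder values below)
    billed: float  # amount billed, in dollars


def flag_overcharges(charges, benchmark, tolerance=1.5):
    """Return (charge, benchmark_price, excess) for every charge billed above
    tolerance x benchmark price. The tolerance is a hypothetical threshold."""
    flagged = []
    for charge in charges:
        price = benchmark.get(charge.code)
        if price is None:
            # No benchmark for this code: skip it rather than guess a price.
            continue
        limit = price * tolerance
        if charge.billed > limit:
            flagged.append((charge, price, round(charge.billed - limit, 2)))
    return flagged


# Made-up example data, not real Medicare prices.
benchmark = {"CODE_A": 100.0, "CODE_B": 90.0}
charges = [Charge("CODE_A", 300.0), Charge("CODE_B", 100.0), Charge("CODE_C", 50.0)]

for charge, price, excess in flag_overcharges(charges, benchmark):
    print(f"{charge.code}: billed {charge.billed}, benchmark {price}, excess {excess}")
```

Charges with no benchmark entry are skipped instead of flagged, which keeps a dispute letter limited to items that can be backed by a reference price.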
Led the LLM pipeline & orchestration layer
Intelligent task orchestration that converts meeting transcripts into actionable tickets. Analyzes via GPT-4o-mini, checks team capacity through Taiga, recommends smart reassignment for overloaded members, and notifies via Slack.
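The capacity check and reassignment idea can be sketched as follows. This is a minimal illustration under stated assumptions, not the MeetFlow implementation: `assign_tickets`, the hours-based capacity model, and the sample names are hypothetical, and the transcript analysis, Taiga lookup, and Slack notification are left out entirely.

```python
def assign_tickets(tickets, capacity, load):
    """Assign each ticket to its suggested owner unless that would exceed the
    owner's capacity; in that case pick the member with the most spare hours.

    tickets:  list of (title, suggested_owner, estimated_hours)
    capacity: dict of member -> hours available this sprint (hypothetical model)
    load:     dict of member -> hours already assigned
    """
    load = dict(load)  # work on a copy so the caller's data is untouched
    assignments = []
    for title, owner, hours in tickets:
        if load.get(owner, 0) + hours > capacity.get(owner, 0):
            # Suggested owner is overloaded: look for members who can fit the ticket.
            candidates = [m for m in capacity if load.get(m, 0) + hours <= capacity[m]]
            if candidates:
                owner = max(candidates, key=lambda m: capacity[m] - load.get(m, 0))
            # If nobody has room, the ticket stays with the suggested owner.
        load[owner] = load.get(owner, 0) + hours
        assignments.append((title, owner))
    return assignments


capacity = {"ana": 10, "ben": 10}
current_load = {"ana": 9, "ben": 2}
tickets = [("Fix login bug", "ana", 3), ("Write release notes", "ben", 2)]

print(assign_tickets(tickets, capacity, current_load))
```

Here the first ticket would push `ana` past her capacity, so it moves to `ben`, who has the most spare hours; the running load is updated after each assignment so later tickets see the new totals.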
A serverless, multistage face recognition system using edge computing. IoT clients send video frames processed through decoupled detection and recognition stages via event-driven architecture — scalable, real-time identification without persistent servers.
Full-stack productivity platform with Google OAuth, Focus Score tracking, daily task management, Pomodoro timer, and motivational micro-challenges.
A developer wellness app with mixable ambient soundscapes, 10+ casual mini-games, and a calming UI — because even builders need a break.
Multi-agent system providing accessible legal guidance for those who can't afford a lawyer. Combines extensive query refinement via a vector DB with specialized agents that collaborate autonomously.
Centralized Feature Store using Apache Iceberg with time-travel, enabling ML teams to train, test, and deploy from a single source of truth.
Migrated 50 legacy SAS workflows to Spark, cutting runtime from 6+ hours to under 50 minutes via partitioning, broadcast joins, and caching.
Get in Touch
Open to Data Engineer, ML Engineer, and Backend Engineer roles focused on large-scale data infrastructure, real-time systems, and AI/ML platforms. Willing to relocate anywhere in the US for the right opportunity.