
Himanshu Pant
Data Engineer specializing in large-scale ETL/ELT pipelines, real-time data infrastructure, and AI/ML systems. Built production platforms processing 100M-1B records with 66-86% performance improvements across insurance, sports analytics, and healthcare domains.
About
Data Engineer with expertise in large-scale data infrastructure, real-time processing, and AI/ML systems. I've migrated 1B+ records to distributed architectures, eliminated 90-minute production bottlenecks, and built real-time pipelines handling 500K+ daily events with 99.9% accuracy.
Track record of rapid impact across insurance, sports analytics, and healthcare: promoted within 6 months at Super Six and received a SPOT Recognition Award for delivering critical data infrastructure under tight deadlines.
Achievements
Won Track 1 with MeetFlow, an intelligent task orchestration system that converts meeting transcripts into capacity-aware ticket assignments using LLM-powered analysis, competing against 100+ teams.
Passed DP-700 (May 2026), validating expertise in Microsoft Fabric analytics engineering, data warehousing, and cloud data solutions.
Built a medical bill audit pipeline at HackASU that uses Claude Vision + SapBERT to detect overcharges, flagging an average of $1,300+ in billing errors per hospital bill.
Recognized at Super Six Sports Gaming for exceptional delivery and cross-team collaboration on critical product features.
Arizona State University, Data Science Minor, 2026. Focus on AI/ML infrastructure, distributed systems, and agentic architectures.
Experience
MyEdMaster
US-based EdTech Company (Virginia)
Remote
EXL
Gurugram, India · Remote
Super Six Sports Gaming
Gurugram, India · On-site
Futurense Technologies
Bangalore, India · Remote
Koron Projects Limited
Gurugram, India · On-site
Projects
Solo builder, HackASU Claude AI Builder Hackathon
A medical bill audit pipeline that reads your bill, identifies every charge, benchmarks against real CMS Medicare pricing data for your state, detects overcharges and billing violations, and generates a ready-to-send dispute letter. Built to fight the information asymmetry where 49-80% of medical bills contain errors.
Led the LLM pipeline & orchestration layer
Intelligent task orchestration that converts meeting transcripts into actionable tickets. Analyzes transcripts via GPT-4o-mini, checks team capacity through Taiga, recommends smart reassignment for overloaded members, and notifies the team via Slack.
A serverless, multistage face recognition system using edge computing. IoT clients send video frames processed through decoupled detection and recognition stages via event-driven architecture, enabling scalable, real-time identification without persistent servers.
Full-stack productivity platform with Google OAuth, Focus Score tracking, daily task management, Pomodoro timer, and motivational micro-challenges.
A developer wellness app with mixable ambient soundscapes, 10+ casual mini-games, and a calming UI, because even builders need a break.
Multi-agent system providing accessible legal guidance for those who can't afford a lawyer. Uses a vector DB for extensive query refinement, with specialized agents that collaborate independently.
Centralized Feature Store using Apache Iceberg with time-travel, enabling ML teams to train, test, and deploy from a single source of truth.
Migrated 50 legacy SAS workflows to Spark, cutting runtime from 6+ hours to under 50 minutes via partitioning, broadcast joins, and caching.
Get in Touch
Open to Data Engineer, ML Engineer, and Backend Engineer roles focused on large-scale data infrastructure, real-time systems, and AI/ML platforms. Willing to relocate anywhere in the US for the right opportunity.