Himanshu Pant

I build things that scale.

Data Engineer specializing in large-scale ETL/ELT pipelines, real-time data infrastructure, and AI/ML systems. Built production platforms processing 100M-1B records with 66-86% performance improvements across insurance, sports analytics, and healthcare domains.

Open to opportunities · Tempe, AZ · himanshupant.dev

Engineer at heart.

Data Engineer with expertise in large-scale data infrastructure, real-time processing, and AI/ML systems. I've migrated 1B+ records to distributed architectures, eliminated 90-minute production bottlenecks, and built real-time pipelines handling 500K+ daily events with 99.9% accuracy.

Track record of rapid impact across insurance, sports analytics, and healthcare: promoted within 6 months at Super Six and received a SPOT Recognition Award for delivering critical data infrastructure under tight deadlines.

๐Ÿฅ Healthcareโšฝ Sports Analytics๐Ÿฆ Insurance & Banking๐Ÿ“Š Retention & Segmentation
100M-1B records processed
M.S. Software Engineering, Data Science Minor (ASU)

Tech I work with

Languages
Python · SQL · JavaScript · PySpark
Data & ML
Apache Spark · Databricks · Airflow · LangChain · Snowflake · Great Expectations
Cloud & Infra
AWS (EMR, S3, Lambda, SQS, ECR) · Azure · Docker · Jenkins · CI/CD
Frontend
React · Next.js · Node.js · HTML/CSS
Databases
PostgreSQL · SQL Server · Oracle · NoSQL · Vector DBs
Tools
Git · Power BI · Streamlit · Jira

Highlights.

๐Ÿ†

DEVHACKS 2026 โ€” 1st Place

Won Track 1 with MeetFlow โ€” intelligent task orchestration converting meeting transcripts into capacity-aware ticket assignments using LLM-powered analysis, competing against 100+ teams.

📜

Microsoft Certified: Fabric Analytics Engineer

Passed DP-700 (May 2026), validating expertise in Microsoft Fabric analytics engineering, data warehousing, and cloud data solutions.

๐Ÿ†

HackASU โ€” FairCharge

Built a medical bill audit pipeline at HackASU that uses Claude Vision + SapBERT to detect overcharges, flagging $1,300+ in average billing errors per hospital bill.

๐Ÿ…

SPOT Award โ€” Exceptional Delivery

Recognized at Super Six Sports Gaming for exceptional delivery and cross-team collaboration on critical product features.

🎓

M.S. Software Engineering

Arizona State University, Data Science Minor, 2026. Focused on AI/ML infrastructure, distributed systems, and agentic architectures.


Where I've worked.

AI Engineer

MyEdMaster

US-based EdTech Company (Virginia)

Jan 2026 – Apr 2026 · Legal Tech / AI

Remote

  • Built a multi-agent RAG system for personalized legal guidance, orchestrated with LangGraph and backed by a Qdrant vector DB, achieving sub-2s query latency across 10K+ legal documents
  • Designed data ingestion and transformation pipelines that convert unstructured legal corpora into structured, queryable vector embeddings enriched with real-time personalization signals via LLM APIs, achieving 95% relevance accuracy
  • Built a FastAPI + Node.js backend integrating vector DB lookups, session state management, and multi-turn conversation history, supporting 100+ concurrent sessions with <200ms response time
Python · LangChain · Vector DB · Multi-Agent Systems · LLM · RAG

Consultant II โ€” Analytics (Data Engineer)

EXL

Jul 2023 – Mar 2024 · Insurance

Gurugram, India · Remote

  • Cut PySpark ETL processing time by 66% (90 min → 30 min) on AWS EMR for insurance reporting stakeholders
  • Built automated data validation with Great Expectations across 5 insurance data sources, catching quality issues before downstream models
  • Orchestrated end-to-end Airflow workflows integrating APIs, databases, and S3 for underwriting ML models
  • Implemented CI/CD pipelines (Jenkins + Bitbucket) to streamline releases and reduce deployment friction
PySpark · AWS EMR · Airflow · Snowflake · Great Expectations · Jenkins

Data Engineer

Super Six Sports Gaming

Aug 2022 – Jul 2023 · Sports Analytics

Gurugram, India · On-site

  • Designed and built the entire data infrastructure from scratch (data models, ingestion pipelines, and processing layers), serving as the foundation for all analytics and ML workloads
  • Developed and deployed a churn prediction model for new users, directly reducing churn and driving measurable improvements in retention and revenue
  • Built end-to-end pipelines (Python, PySpark, SQL) ingesting semi-structured data from APIs, logs, and NoSQL sources at scale
  • Automated data validation, monitoring, and unit testing frameworks, improving quality and reliability across the full pipeline
Python · PySpark · SQL · NoSQL · Machine Learning · Data Modeling

Associate Data Engineer

Futurense Technologies

Oct 2021 – Jul 2022 · Healthcare

Bangalore, India · Remote

  • Led migration of 50 legacy SAS batch workflows to Spark on Azure Databricks for financial services clients
  • Cut runtime from 6 hours to under 50 minutes (86% reduction) via partitioning, caching, and broadcast joins
  • Delivered zero-data-loss transition across all 50 workflows, resolving schema mismatches and validation gaps
  • Built automated data quality checks and transformation pipelines for downstream ML and reporting
Azure Databricks · Apache Spark · PySpark · SAS · SQL · S3

Executive Analyst

Koron Projects Limited

Oct 2018 – Jul 2021 · Construction & Infrastructure

Gurugram, India · On-site

  • Built 15 Power BI dashboards providing real-time visibility across $50M+ in annual construction projects
  • Automated monthly reporting, replacing manual Excel workflows and saving 20 hours/month for finance stakeholders
  • Designed KPI dashboards tracking 50+ active projects, enabling leadership to identify delays and make faster decisions
Power BI · SQL Server · Oracle · Python · Excel

Selected work.


Let's build something.

Open to Data Engineer, ML Engineer, and Backend Engineer roles focused on large-scale data infrastructure, real-time systems, and AI/ML platforms. Willing to relocate anywhere in the US for the right opportunity.
