Himanshu Pant

AWS & Microsoft Certified Data Engineer

I build things that scale.

Data Engineer specializing in large-scale ETL/ELT pipelines, real-time data infrastructure, and AI/ML systems. Built production platforms processing 100M-1B records with 66-86% performance improvements across insurance, sports analytics, and healthcare domains.

Open to opportunities · Tempe, AZ · himanshupant.dev

Engineer at heart.

Data Engineer with expertise in large-scale data infrastructure, real-time processing, and AI/ML systems. I've migrated 1B+ records to distributed architectures, eliminated 90-minute production bottlenecks, and built real-time pipelines handling 500K+ daily events with 99.9% accuracy.

Track record of rapid impact across insurance, sports analytics, and healthcare: promoted within 6 months at Super Six and received a SPOT Recognition Award for delivering critical data infrastructure under tight deadlines.

🏥 Healthcare Sports Analytics · 🏦 Insurance & Banking · 📊 Retention & Segmentation

100M-1B Records Processed
M.S. Software Engineering (AI Specialization), Arizona State University

Tech I work with

Languages
Python, SQL, JavaScript, PySpark
Data & ML
Apache Spark, Databricks, Airflow, LangChain, Snowflake, Great Expectations
Cloud & Infra
AWS (EMR, S3, Lambda, SQS, ECR), Azure, Docker, Jenkins, CI/CD
Frontend
React, Next.js, Node.js, HTML/CSS
Databases
PostgreSQL, SQL Server, Oracle, NoSQL, Vector DBs
Tools
Git, Power BI, Streamlit, Jira

Highlights.

🏆

DEVHACKS 2026 — 1st Place

Won Track 1 with MeetFlow — intelligent task orchestration converting meeting transcripts into capacity-aware ticket assignments using LLM-powered analysis, competing against 100+ teams.

☁️

AWS Certified Data Engineer Associate

Passed DEA-C01 (May 2026) — validated expertise in data pipeline design, ETL optimization, AWS Glue/EMR/Redshift, and implementing data quality frameworks at scale.

📊

Microsoft Certified: Fabric Analytics Engineer

Passed DP-700 (May 2026) — validated expertise in Microsoft Fabric analytics engineering, data warehousing, and cloud data solutions.

🏆

HackASU — FairCharge

Built a medical bill audit pipeline at HackASU that uses Claude Vision + SapBERT to detect overcharges, flagging $1,300+ in average billing errors per hospital bill.

🏅

SPOT Award — Exceptional Delivery

Recognized at Super Six Sports Gaming for exceptional delivery and cross-team collaboration on critical product features.


Industry Credentials.

AWS Certified Data Engineer – Associate

Amazon Web Services · DEA-C01

Validated expertise in designing, building, and maintaining data pipelines using AWS services including Glue, EMR, Redshift, Kinesis, and implementing data quality frameworks at scale.

Microsoft Certified: Fabric Analytics Engineer Associate

Microsoft · DP-700

Certified in Microsoft Fabric analytics engineering, data warehousing, data modeling, and implementing end-to-end analytics solutions on the Azure cloud platform.


Where I've worked.

Data & AI Engineer (Industry Capstone)

MyEdMaster

Jan 2026 – Apr 2026

Tempe, AZ

  • Developed multi-agent stateful RAG system for personalized legal guidance using LangGraph and Qdrant vector DB, achieving sub-2s query latency across 10K+ documents and 95% relevance accuracy
  • Built FastAPI + Node.js backend with Docker supporting 100+ concurrent sessions at sub-200ms response time via optimized retrieval and a 5-node pipeline
  • Authored evaluation framework for the QnA service and resolved critical production NameError in the agentic graph execution layer; shipped 4-layer technical documentation adopted by partner team
Python, LangGraph, Qdrant, FastAPI, Node.js, Docker

Consultant II, Data Analytics & Engineering

EXL Services

Jul 2023 – Mar 2024

Gurugram, India

  • Eliminated a 90-minute production bottleneck by optimizing PySpark ETL on AWS Glue / MWAA processing 100M+ records per batch, cutting runtimes 66% via partition pruning
  • Architected S3 data lake and Snowflake warehouse across 8 datasets for 500K+ daily users on high-frequency insurance claim events, driving 40% BI query performance improvement
  • Built production data quality framework using Great Expectations across 15+ data sources with CloudWatch monitoring, maintaining 100% SLA compliance
  • Automated infrastructure provisioning by deploying Airflow DAGs via Terraform IaC and integrating Jenkins CI/CD with unit and integration testing
PySpark, AWS Glue, MWAA, Snowflake, S3, Great Expectations, CloudWatch, Terraform, Jenkins

Data Engineer

Super Six Sports Gaming

Aug 2022 – Jul 2023

Gurugram, India

  • Built and owned the core data ingestion layer: multi-source batch ETL from 10+ REST APIs and S3, loading 500K+ daily records of sports event data into MongoDB with 99.9% accuracy
  • Reduced user churn 20% by building a production ML retention pipeline (78% accuracy, Scikit-Learn) detecting behavioral anomalies in time-series user activity and triggering automated Python/SQL workflows
  • Engineered ML feature pipelines with SCD Type 1/2 dimensional modeling, accelerating ML experiment cycle time by 35%
Python, SQL, PySpark, MongoDB, S3, REST APIs, Scikit-Learn

Associate Data Engineer

Futurense Technologies

Oct 2021 – Jul 2022

Bangalore, India

  • Spearheaded migration of 1B+ record oncology pipelines from legacy SAS to Apache Spark on Azure Databricks, cutting batch time from 6+ hours to 50 minutes (86% reduction) via broadcast joins
  • Automated bi-weekly HCP targeting reports via Python ETL on AWS Athena, reducing manual effort from 4-5 hours to 15 minutes (95% savings)
  • Delivered 99.5% data accuracy post-migration via comprehensive PySpark + SQL validation frameworks with statistical reconciliation
Apache Spark, Azure Databricks, PySpark, Python, SQL, AWS Athena

Data Analyst

Koron Projects Limited

Oct 2018 – Jul 2021

New Delhi, India

  • Built 15 Power BI dashboards with advanced analytics for executive leadership tracking project costs, timelines, and profitability across $50M+ in annual construction projects
  • Consolidated cost data from SQL Server, Oracle, and MySQL via SQL and stored procedures, removing 20 hours/month of manual effort
Power BI, SQL Server, Oracle, MySQL, SQL

Let's build something.

Open to Data Engineer, ML Engineer, and Backend Engineer roles focused on large-scale data infrastructure, real-time systems, and AI/ML platforms. Willing to relocate anywhere in the US for the right opportunity.