Vedant Achole — Data Engineer

01 — Selected Work

Three projects. Three real decisions.

Every project on my resume has a technical decision I'd defend in an interview. These are the ones that shaped how I think about data systems.

In progress · 2026 · Solo · 10 weeks

BillingShield — a healthcare payment-integrity platform, built end-to-end.

Full medallion pipeline on CMS Medicare data — 10M+ provider-procedure claims flowing through Bronze → Silver → Gold Delta tables on Databricks, a PySpark + dbt transformation layer, an XGBoost fraud classifier with SHAP explanations, served through FastAPI and a Streamlit dashboard.

The decision I'd defend

Splitting train/test at the NPI level, not the row level. Row-level splits put the same provider's claims on both sides of the split, so the model can memorize provider identity, which inflates accuracy by 10+ points. The kind of thing that looks fine in a notebook and breaks in production.
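A minimal sketch of the idea, not the BillingShield code: every row for a given NPI is assigned to the same side of the split by hashing the NPI. The row fields (`npi`, `hcpcs`, `label`), the 20% test fraction, and the sample data are all assumptions made up for illustration.

```python
import hashlib


def npi_bucket(npi: str, test_fraction: float = 0.2) -> str:
    """Deterministically assign a provider (NPI) to 'train' or 'test'."""
    digest = hashlib.sha256(npi.encode("utf-8")).hexdigest()
    # Map the first 8 hex digits to a value in [0, 1).
    position = int(digest[:8], 16) / 0x100000000
    return "test" if position < test_fraction else "train"


def split_by_npi(rows, test_fraction=0.2):
    """Split rows so that all rows sharing an NPI land on the same side."""
    train, test = [], []
    for row in rows:
        target = test if npi_bucket(row["npi"], test_fraction) == "test" else train
        target.append(row)
    return train, test


# Hypothetical data: 50 providers with 4 provider-procedure rows each.
rows = [
    {"npi": str(1000000000 + p), "hcpcs": f"code_{c}", "label": (p + c) % 2}
    for p in range(50)
    for c in range(4)
]

train_rows, test_rows = split_by_npi(rows)
train_npis = {r["npi"] for r in train_rows}
test_npis = {r["npi"] for r in test_rows}

# No provider appears on both sides, so provider identity cannot leak.
assert train_npis.isdisjoint(test_npis)
```

With scikit-learn available, `GroupShuffleSplit` with the NPI passed as `groups` gives the same guarantee. The hash version is shown only because it has no dependencies and stays stable across reruns.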

Databricks · PySpark · dbt · Delta Lake · XGBoost · SHAP · FastAPI · Streamlit · Dagster

Read on GitHub ↗

BillingShield end-to-end architecture diagram

Shipped · 2026 · Solo · 3 weeks

Healthcare claims analytics — AWS medallion on Parquet.

Production ELT on AWS Glue (PySpark) across four normalized claims tables. Joined data, applied window functions for provider rankings, computed derived fields, and delivered six Gold-layer KPIs in Parquet — with sub-second Athena query performance on large-scale claims.
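To make the provider-ranking step concrete, here is a small sketch of a ranking window function. It uses an in-memory SQLite database as a stand-in for the Glue/PySpark job (it needs a SQLite build with window-function support, 3.25 or later). The table name, column names, and figures are invented for illustration and are not taken from the project.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE provider_totals (provider_id TEXT, specialty TEXT, total_paid REAL)"
)
# Hypothetical aggregated claims, one row per provider.
conn.executemany(
    "INSERT INTO provider_totals VALUES (?, ?, ?)",
    [
        ("P1", "cardiology", 1200.0),
        ("P2", "cardiology", 3400.0),
        ("P3", "oncology", 900.0),
        ("P4", "oncology", 2100.0),
        ("P5", "oncology", 2100.0),
    ],
)

# Rank providers by paid amount within each specialty; ties share a rank.
ranked = conn.execute(
    """
    SELECT provider_id,
           specialty,
           total_paid,
           RANK() OVER (PARTITION BY specialty ORDER BY total_paid DESC) AS paid_rank
    FROM provider_totals
    ORDER BY specialty, paid_rank, provider_id
    """
).fetchall()

for row in ranked:
    print(row)
```

The same `RANK() OVER (PARTITION BY ... ORDER BY ...)` shape carries over to Spark SQL and Athena. Only the engine differs in this sketch.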

The quiet win

Automated schema detection via Glue Crawlers cataloged 10 tables. That's the kind of plumbing nobody notices until it isn't there — and it's where most pipelines rot.

AWS Glue · PySpark · S3 · Athena · Parquet · Medallion

Read on GitHub ↗

Case study · 2025 · Solo · 2 weeks

LLM-powered resume matching, demoed to CGI senior leadership.

Built an end-to-end RAG pipeline (dense vector search in FAISS plus GPT-4 re-ranking) to semantically match 3,000 candidate profiles against open roles. Presented a live proof-of-concept to CGI senior leadership, showing a 40% improvement in candidate relevance over keyword-based ATS matching.
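Since the original code is archived, the following is only a toy sketch of the retrieve-then-re-rank shape described above. Brute-force cosine similarity stands in for the FAISS index, and a plain callable stands in for the GPT-4 re-ranker. The vectors, candidate IDs, and function names are all hypothetical.

```python
import math
from typing import Callable, Dict, List, Tuple

Vector = List[float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def retrieve(role_vec: Vector, profiles: Dict[str, Vector], k: int) -> List[Tuple[str, float]]:
    """Stage 1: dense retrieval. Return the k profiles closest to the role."""
    scored = [(pid, cosine(role_vec, vec)) for pid, vec in profiles.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


def match(
    role_vec: Vector,
    profiles: Dict[str, Vector],
    rerank: Callable[[List[str]], List[str]],
    k: int = 2,
) -> List[str]:
    """Stage 2: hand the shortlist to a re-ranker (an LLM call in the real pipeline)."""
    shortlist = [pid for pid, _ in retrieve(role_vec, profiles, k)]
    return rerank(shortlist)


# Toy 2-dimensional embeddings; real ones would come from a sentence encoder.
profiles = {"cand_a": [1.0, 0.0], "cand_b": [0.8, 0.6], "cand_c": [0.0, 1.0]}
role = [1.0, 0.1]

shortlist = [pid for pid, _ in retrieve(role, profiles, k=2)]
# Placeholder re-ranker that just reverses the shortlist.
final = match(role, profiles, rerank=lambda ids: list(reversed(ids)))
```

Keeping the re-ranker behind a callable also makes it easy to log what went in and what came out, which helps when explaining a ranking to a non-technical audience.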

What I kept

The code is archived. What I kept is the lesson that AI in enterprise is a communication problem as much as a technical one. A model that doesn't explain itself is a model nobody adopts.

OpenAI GPT-4 · FAISS · RAG · HuggingFace · Sentence Transformers

Code archived · Case study only

02 — About

The long version.

I grew up in Maharashtra, moved to Pune for a BTech in Artificial Intelligence, then to Boston for an MS in Management at Questrom. Now in Lafayette, Louisiana — consulting at CGI by day, building healthcare data systems by night.

The short version of my career is that I kept following one question: where does the data actually come from, and why does it break? That led from ML research in undergrad to data quality work at KPMG, then to a management degree (because I wanted to understand why companies make the data decisions they do), and now to building data and AI platforms full-time.

I'm looking for a Data Engineering or AI Engineering role where the work is real — healthcare, fintech, anywhere the numbers matter. I care about pipelines that hold up, documentation people can actually read, and communicating technical work to non-technical audiences.

Off the clock: cricket on Saturdays, vlogs on YouTube, eggs most mornings, and ~23 hours a week of deliberate practice toward being genuinely good at this craft.

Education

MS Management

Boston University, Questrom
Director's Honors · 2025

BTech, AI & Data Science

VIIT, Pune
9.03 / 10 · 2024

Experience

CGI — Consultant

Sep 2025 — present

KPMG — Analyst

Feb 2024 — Aug 2024

Certifications

Databricks Fundamentals
AWS + Azure Essentials
HubSpot Digital Marketing

03 — Off-screen

Being a whole person.

Saturdays are for cricket. Evenings sometimes become vlogs. I think being a whole person is part of being a decent engineer.

YouTube · Life by Vedant Achole

Cricket · vlogs · @vedantacholee ↗

"Started by engineers for just their backchodi [banter] and enjoyment."

Into right now

i. F1 race data: building a telemetry-analysis side project to sit alongside BillingShield.
ii. Cricket league in Louisiana. Saturdays are sacred; catch my practice shorts on YouTube.
iii. Push/pull/legs split, 2,200 kcal, eggetarian and protein-obsessed.
iv. Kimball's Data Warehouse Toolkit. Still relevant in 2026.

04 — Contact

Let's talk.

Looking for a Data Engineering or AI Engineering role where I can ship real pipelines and learn from senior engineers. Available July 2026. NYC, Boston, Seattle, or strong remote teams.

Email

vedant4815@gmail.com

Phone

+1 857-832-8355

LinkedIn

/in/vedantachole

GitHub

@VedantVAchole

Résumé

Download PDF ↓

Data & AI engineer
building systems that ship and survive
