Available for full-time roles · 2026

Harshal
Bhambhani

Final-year EEE engineering student at BITS Hyderabad building production AI systems. From NLP pipelines at Standard Chartered to RAG chatbots and multilingual AI, I build things that get deployed and actually used.

90%
Audit time reduced
6+
Projects deployed
9+
LLMs integrated
2
Internships
01 — About

Engineer by training,
builder by instinct

I started in Electrical and Electronics Engineering at BITS Hyderabad, but what genuinely drew me in was the Computing and Intelligence minor I picked up alongside it: machine learning, AI, database systems, and statistics. That curiosity has driven everything since.

My first real-world exposure was a data analyst internship at DG Liger Consulting, where I built my first RAG chatbot and learned firsthand how wide the gap between theory and production is. Then came a Data Scientist role at Standard Chartered GBS: six months of building a production NLP pipeline that real auditors depended on every day.

I don't just build models. I build systems that get deployed, used, and measured. The feedback loop between shipping something and watching it change how people work is what drives me forward.

Outside the code: state-level representative in two sports (basketball and cricket), volleyball gold medalist, and former captain of logistics teams managing 20+ people at large-scale campus events.

Python · LangChain · FastAPI · DistilBERT · SBERT · FAISS · NLLB-200 · Whisper · Qwen 7B · SQL · Power BI · Docker · Kubernetes · AWS S3 · Pandas · Scikit-Learn · PaddleOCR · Dataiku DSS
BITS Pilani, Hyderabad Campus
B.E. Electrical & Electronics Engineering
Minor: Computing & Intelligence (ML, AI, DB Systems)
2022–2026 · CGPA 6.8
MDS Public School
Class XII — CBSE
2021 · 92.2%
St. Paul's Convent School
Class X — CBSE
2019 · 91.2%
Coursework Highlights
Machine Learning · Statistics & Probability for Analytics · Database Systems · Artificial Intelligence · DSA · Operating Systems · OOP · Computer Programming
02 — Experience

Where I've built things

Jul 2025 – Dec 2025
Data Scientist
Standard Chartered Global Business Services · Hyderabad
  • Engineered NLP pipelines on Dataiku DSS to process 100+ page LMA agreements, cutting manual audit time by 90% for compliance teams
  • Fine-tuned DistilBERT with WeightedTrainer (20× class-weight boost), achieving 98.6% recall on rare legal clauses even though 90%+ of the text was irrelevant
  • Fine-tuned SBERT with MultipleNegativesRankingLoss for semantic search, retrieving gold-standard compliance language via cosine similarity
  • Deployed confidence thresholding (minimum correct confidence method), filtering out 95%+ of false positives for audit-grade precision
  • Built AutoTransFlow: multilingual PDF translation using doc-layout-yolo + NLLB-200, preserving original document layout without external APIs
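The confidence thresholding mentioned above can be sketched in plain Python. This is only one plausible reading of "minimum correct confidence": take the lowest confidence the model gave to any correct prediction on a validation set, then drop new predictions that score below it. The function names, labels, and numbers are hypothetical, and the production rule may differ.

```python
from typing import List, Tuple


def min_correct_confidence(validation: List[Tuple[float, bool]]) -> float:
    """Return the lowest confidence among correct validation predictions.

    Each item is (confidence, was_correct). This is an assumed reading of
    the "minimum correct confidence" rule, not the original implementation.
    """
    correct = [conf for conf, ok in validation if ok]
    if not correct:
        raise ValueError("no correct predictions to calibrate against")
    return min(correct)


def filter_predictions(
    preds: List[Tuple[str, float]], threshold: float
) -> List[Tuple[str, float]]:
    """Keep only predictions whose confidence meets the threshold."""
    return [(label, conf) for label, conf in preds if conf >= threshold]


# Hypothetical validation results: (confidence, was_correct)
validation = [(0.97, True), (0.91, True), (0.62, False), (0.88, True), (0.70, False)]
threshold = min_correct_confidence(validation)

# Hypothetical predictions on new text: (clause label, confidence)
new_preds = [("clause_a", 0.93), ("clause_b", 0.65), ("clause_c", 0.88)]
kept = filter_predictions(new_preds, threshold)
```

Calibrating on correct predictions only keeps every clause the model got right on validation, which suits a recall-sensitive audit setting, while still discarding the low-confidence tail.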
Jun 2024 – Jul 2024
Data Analyst Intern
DG Liger Consulting · Gurgaon
  • Built production RAG chatbot: LangChain loaders → RecursiveCharacterTextSplitter → all-MiniLM-L6-v2 embeddings → FAISS → StableLM Zephyr 3B
  • Implemented ConversationalRetrievalChain with ConversationBufferMemory for multi-turn dialogue with proprietary PDF documents
  • Integrated speech and NLP models for analytics automation; scraped and processed 500+ articles via BeautifulSoup
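The retrieval half of the RAG pipeline above can be illustrated with the standard library alone. In this simplified sketch, fixed-size overlapping chunks stand in for RecursiveCharacterTextSplitter, hand-written 2-D vectors stand in for all-MiniLM-L6-v2 embeddings, and a brute-force scan stands in for FAISS. All function names and sample values are hypothetical.

```python
import math
from typing import List, Sequence


def chunk_text(text: str, size: int, overlap: int) -> List[str]:
    """Split text into fixed-size character chunks that overlap."""
    if not 0 <= overlap < size:
        raise ValueError("overlap must be non-negative and smaller than size")
    step = size - overlap
    return [text[i:i + size] for i in range(0, len(text), step)]


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def top_k(query: Sequence[float], docs: List[Sequence[float]], k: int) -> List[int]:
    """Return indices of the k document vectors most similar to the query."""
    ranked = sorted(range(len(docs)), key=lambda i: cosine(query, docs[i]), reverse=True)
    return ranked[:k]


chunks = chunk_text("abcdefghij", size=4, overlap=2)

# Toy vectors standing in for real sentence embeddings.
doc_vectors = [[0.9, 0.1], [0.0, 1.0], [0.7, 0.7]]
best = top_k([1.0, 0.0], doc_vectors, k=2)
```

In a real pipeline the retrieved chunks would then be passed to the language model as context for the answer.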
03 — Projects

Things I've shipped

🏦
LMA Clause Identification Tool
Production NLP pipeline on Dataiku DSS identifying legal clauses in loan agreements. Fine-tuned DistilBERT + SBERT semantic search with confidence thresholding.
DistilBERT · SBERT · Dataiku DSS · NLP
↑ 90% reduction in audit time · 98.6% recall
🌐
AutoTransFlow
Multilingual document translation that preserves the original PDF layout. doc-layout-yolo detects text blocks with bounding boxes; NLLB-200 translates between 200 languages locally.
NLLB-200 · YOLO · Computer Vision · PDF AI
Layout-preserving translation · 200 languages · Zero external API calls
🤖
RAG Chatbot
Production retrieval-augmented generation chatbot. LangChain pipeline with FAISS vector database, all-MiniLM-L6-v2 embeddings, and StableLM Zephyr 3B for grounded answers.
LangChain · FAISS · StableLM · RAG
Multi-turn dialogue · Grounded answers · PDF knowledge base
🏥
Medical Bill OCR & Fraud Detection
End-to-end datathon pipeline: OpenCV preprocessing → PaddleOCR → Qwen 7B extraction → 4 automated fraud checks (IQR, reconciliation, SHA-256 hashing, pattern analysis).
PaddleOCR · Qwen 7B · Fraud Detection · OpenCV
90%+ extraction accuracy · 4 fraud detection checks
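Two of the four checks named above, IQR and SHA-256 hashing, can be sketched with the standard library. This assumes the IQR check flags unusually large or small billed amounts and the hash check catches byte-identical resubmissions; the project's actual rules are not described here, so the function names, the 1.5 multiplier, and the sample data are illustrative.

```python
import hashlib
import statistics
from typing import Dict, List


def iqr_outliers(amounts: List[float], k: float = 1.5) -> List[float]:
    """Return amounts outside [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, _, q3 = statistics.quantiles(amounts, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [a for a in amounts if a < low or a > high]


def duplicate_bills(bills: Dict[str, bytes]) -> List[List[str]]:
    """Group bill IDs whose raw content has the same SHA-256 digest."""
    by_digest: Dict[str, List[str]] = {}
    for bill_id, content in bills.items():
        digest = hashlib.sha256(content).hexdigest()
        by_digest.setdefault(digest, []).append(bill_id)
    return [ids for ids in by_digest.values() if len(ids) > 1]


# Hypothetical billed amounts; one is far above the rest.
flagged = iqr_outliers([120, 135, 128, 140, 131, 2500])

# Hypothetical uploads; bill_003 is a byte-for-byte copy of bill_001.
uploads = {
    "bill_001": b"patient=A;total=120",
    "bill_002": b"patient=B;total=135",
    "bill_003": b"patient=A;total=120",
}
duplicates = duplicate_bills(uploads)
```

Exact hashing only catches identical files. A re-scanned or edited copy of the same bill would need a fuzzier comparison.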
🎙️
Multilingual ASR System
Zero-shot Urdu speech recognition using Whisper, with an audio preprocessing pipeline built on pydub and librosa and an IndicBERT MLM corrector for post-ASR error correction.
Whisper · IndicBERT · pydub · Speech AI
↓ 14% Word Error Rate · Zero-shot Urdu recognition
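For reference, Word Error Rate is the word-level edit distance (substitutions, deletions, and insertions) between the reference transcript and the ASR output, divided by the number of reference words. Below is a minimal sketch of that standard definition. The function name and sample sentences are illustrative and are not taken from the project's evaluation code.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference transcript must not be empty")

    # prev[j] holds the edit distance between the reference words seen
    # so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,         # deletion
                curr[j - 1] + 1,     # insertion
                prev[j - 1] + cost,  # match or substitution
            ))
        prev = curr
    return prev[-1] / len(ref)


# One substitution ("sat" -> "sit") and one deletion ("the") over 6 words.
wer = word_error_rate("the cat sat on the mat", "the cat sit on mat")
```

Because insertions count as errors, WER can exceed 1.0 when the hypothesis is much longer than the reference.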
📈
Market Analysis Using LLMs
Financial news sentiment pipeline. BeautifulSoup scraping → two-pass BART summarisation → RoBERTa sentiment → SMOTE balancing → Flask API with Streamlit dashboard.
BART · RoBERTa · Flask · SMOTE
500+ articles processed · Real-time sentiment API
04 — Contact

Let's connect

I'm actively looking for full-time roles in data science, AI engineering, and GenAI development starting in 2026. If you're building something interesting or want to discuss my work, I'd love to talk.

Quick summary
Role
Data Scientist / AI Engineer
Location
Open to relocation
Available
May 2026
Focus
NLP · GenAI · FinTech