Arnav Prashant Bule

TECH ENTHUSIAST | AI/ML DEVELOPER

Driven tech enthusiast with a knack for problem-solving and a passion for turning ideas into impactful solutions.

  • Passionate about AI/ML, Cloud, and creative tech solutions
  • Blends coding and creativity, from Python and C++ to exploring new AI/ML use cases
  • Loves building tech with impact and mentoring peers
  • Based in Pune, Maharashtra

Skills

Programming Languages:

Python · C++ · JavaScript · TypeScript

Data Engineering:

PySpark · Delta Lake · Databricks · Auto Loader · ETL Pipelines · Medallion Architecture

AI/ML & Data Science:

MLflow · Scikit-learn · Pandas · NumPy · NLP · MLOps · RAG · Prompt Engineering

Web Development:

React · Next.js · Node.js · Express.js · REST APIs · Tailwind CSS · Chrome Extensions

Cloud Computing:

IAM · Kubernetes · VMs · DBs · Databricks Jobs · DABs · CI/CD · Docker

Data & Databases:

SQL · BigQuery · MongoDB · Redis · PostgreSQL · LanceDB

Tools & Platforms:

Google Gemini API · OpenAI APIs · DeepSeek · AWS · Google Cloud Platform · GitHub · MLflow · Jest · OAuth 2.0

Operating Systems:

Linux · Windows

Soft Skills:

Project Management · Communication · Teamwork · Leadership · Collaboration · Problem Solving

Certifications

Databricks Certified Data Engineer Associate
Databricks Certified Data Analyst Associate
Databricks Certified Generative AI Engineer Associate
Machine Learning Specialization — Coursera
Google Project Management Professional Certificate
Google Data Analytics Professional Certificate
Networking Basics — Cisco Foundations
DSA to Development — GeeksforGeeks (ongoing)
Cloud Career Practitioner Certified — AWS & GCP

Experience

Associate Data Scientist Intern

Sept 2025 – Present

V4C.ai

  • Built and maintained end-to-end ETL pipelines using PySpark, Delta Lake, and Auto Loader, implementing Medallion Architecture (Bronze–Silver–Gold) for scalable, production-grade analytics and ML workloads.
  • Orchestrated and automated production pipelines using Databricks Jobs and Databricks Asset Bundles (DABs), with parameterization, scheduling, and environment-aware deployments aligned with CI/CD principles.
  • Integrated MLflow for experiment tracking, model versioning, and reproducibility, supporting MLOps workflows and lifecycle management in cloud-based environments.
  • Optimized data transformations, compute usage, and storage layouts to improve performance, reliability, and cost efficiency, enabling downstream BI reporting and ML-ready datasets.

Gen AI Intern — CTO Office

Sept 2025 – Mar 2026

Persistent Systems

  • Designed and developed a full-stack AI email assistant (Email Digital Twin) using a custom Chrome Extension and Node.js/Express backend to automate hyper-personalized email drafting.
  • Integrated Google Gemini (gemini-2.5-flash) to analyze users' sent emails, dynamically extracting communication patterns to build distinct, context-specific writing personas.
  • Implemented Retrieval-Augmented Generation (RAG) with LanceDB vector database, enabling semantic search over past email threads for context-aware draft generation.
  • Engineered a high-performance batch processing system using parallel fetching (10 concurrent requests), accelerating email analysis and persona generation by 5–7×.

AI Research and Development Intern

Jul 2024 – Dec 2024

IS360 Technologies

  • Developed a machine learning pipeline using Linear Discriminant Analysis (LDA) to classify EEG signals from the auditory oddball paradigm, detecting event-related potentials such as the P300 wave.
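
LDA classification rests on Fisher's linear discriminant: project each epoch onto the direction w = Sw^-1 (m1 - m0), where m0 and m1 are the class means and Sw is the pooled within-class scatter, then threshold the projection. The toy sketch below uses two made-up features per epoch (think of them as mean amplitudes in two post-stimulus windows) and hand-picked numbers. It illustrates the technique only and is not the original pipeline; a real EEG classifier would normally use many more features and a library implementation such as scikit-learn's `LinearDiscriminantAnalysis`.

```python
"""Toy two-class Fisher LDA on 2-D feature vectors (illustration only)."""

Point = tuple[float, float]


def mean(points: list[Point]) -> Point:
    n = len(points)
    return (sum(p[0] for p in points) / n, sum(p[1] for p in points) / n)


def scatter(points: list[Point], m: Point) -> tuple[float, float, float]:
    """Entries (sxx, sxy, syy) of one class's 2x2 scatter matrix."""
    sxx = sum((p[0] - m[0]) ** 2 for p in points)
    syy = sum((p[1] - m[1]) ** 2 for p in points)
    sxy = sum((p[0] - m[0]) * (p[1] - m[1]) for p in points)
    return sxx, sxy, syy


def project(w: Point, p: Point) -> float:
    return w[0] * p[0] + w[1] * p[1]


def fit_lda(class0: list[Point], class1: list[Point]) -> tuple[Point, float]:
    m0, m1 = mean(class0), mean(class1)
    a0, b0, d0 = scatter(class0, m0)
    a1, b1, d1 = scatter(class1, m1)
    # Pooled within-class scatter Sw = [[a, b], [b, d]].
    a, b, d = a0 + a1, b0 + b1, d0 + d1
    det = a * d - b * b
    if det == 0:
        raise ValueError("within-class scatter is singular")
    # w = Sw^-1 (m1 - m0), using the closed-form inverse of a 2x2 matrix.
    dx, dy = m1[0] - m0[0], m1[1] - m0[1]
    w = ((d * dx - b * dy) / det, (a * dy - b * dx) / det)
    # Threshold halfway between the projected class means (equal priors assumed).
    threshold = (project(w, m0) + project(w, m1)) / 2
    return w, threshold


def predict(w: Point, threshold: float, p: Point) -> int:
    """1 = target (P300-like response), 0 = non-target."""
    return 1 if project(w, p) > threshold else 0


# Made-up feature vectors, e.g. mean amplitudes in two post-stimulus windows.
non_target = [(0.5, 0.2), (1.0, 0.4), (0.8, -0.1), (0.2, 0.3)]
target = [(3.0, 2.1), (3.5, 2.4), (2.8, 1.9), (3.2, 2.6)]
w, threshold = fit_lda(non_target, target)
print([predict(w, threshold, p) for p in non_target + target])  # [0, 0, 0, 0, 1, 1, 1, 1]
```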

Vice President

Jul 2022 – Apr 2024

Cloud Computing Club

  • Led a 55‑member team delivering workshops, seminars, and hands‑on cloud‑training sessions.
  • Grew a community of 600+ students, achieving 90% repeat‑engagement intent.
  • Managed logistics, marketing, speakers, and budgets for large‑scale events, ensuring flawless execution.

Management Associate

Jul 2023 – Present

CodeChef Campus Chapter

  • Managed the chapter's annual calendar and budget, aligning eight coding events per semester with academic schedules.
  • Organised and hosted monthly CodeChef challenge mirrors, boosting average participation from 120 to 350 students.
  • Led cross‑functional sub‑teams (marketing, problem‑setting, tech) and produced runbooks that cut future planning time by 40%.
  • Mentored a 10‑member junior committee through weekly stand‑ups and retrospectives, building a sustainable leadership pipeline.

Projects

Mini Task Tracker

2025

Next.js 15 · React 19 · TypeScript · Express · MongoDB · Redis

Built a full-stack task tracker with a Next.js 15 (React 19) frontend and an Express + TypeScript REST API, featuring workspace-based task organization and multiple task views (Board/List/Table/Timeline).

Implemented secure auth: email OTP verification (6-digit), JWT sessions (7-day expiry), bcrypt password hashing, and password reset via email token.
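
The email OTP step can be sketched as follows: generate a 6-digit code from a cryptographically secure source, store only a hash with an expiry, and verify with a constant-time comparison. This is a Python illustration under assumptions, not the project's Express code. The 10-minute expiry and the SHA-256 hashing are hypothetical choices the description does not specify, and JWT issuance and bcrypt password hashing are out of scope here.

```python
import hashlib
import hmac
import secrets
import time
from typing import Optional

OTP_TTL_SECONDS = 10 * 60  # hypothetical expiry; not stated in the project description


def generate_otp() -> str:
    # secrets uses a CSPRNG; zero-padding keeps codes like 42 as "000042".
    return f"{secrets.randbelow(10**6):06d}"


def hash_otp(otp: str) -> str:
    return hashlib.sha256(otp.encode()).hexdigest()


def issue_otp(now: Optional[float] = None) -> tuple[str, dict]:
    """Return (code to email, record to store). The record never holds the plain code."""
    otp = generate_otp()
    issued_at = time.time() if now is None else now
    return otp, {"otp_hash": hash_otp(otp), "expires_at": issued_at + OTP_TTL_SECONDS}


def verify_otp(candidate: str, record: dict, now: Optional[float] = None) -> bool:
    current = time.time() if now is None else now
    if current > record["expires_at"]:
        return False  # expired codes are rejected outright
    # compare_digest runs in constant time, so timing does not reveal partial matches.
    return hmac.compare_digest(hash_otp(candidate), record["otp_hash"])
```

Because a 6-digit code has only a million possible values, hashing it adds little protection against offline guessing; the short expiry and a cap on verification attempts carry most of the security weight.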

Added a Redis caching layer for task listings with a 5-minute TTL, automatic cache invalidation on task mutations, and graceful cache bypass when Redis is unavailable.
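
This caching behaviour is the cache-aside pattern: read from the cache, fall back to the database on a miss, write the result back with a TTL, delete the key on writes, and treat any cache error as a miss. A Python sketch under stated assumptions: `FakeRedis` is a hypothetical in-memory stand-in exposing only `get`, `setex`, and `delete` (named after the common Redis commands), the `tasks:<workspace_id>` key scheme is invented, and the real project uses an Express backend rather than Python.

```python
import json
import time
from typing import Callable, Optional

CACHE_TTL_SECONDS = 5 * 60  # 5-minute TTL, as in the description above


class FakeRedis:
    """Hypothetical in-memory stand-in for a Redis client (GET / SETEX / DEL only).

    Set `available = False` to simulate Redis being down.
    """

    def __init__(self, clock: Callable[[], float] = time.monotonic) -> None:
        self.available = True
        self._clock = clock
        self._store: dict[str, tuple[float, str]] = {}  # key -> (expires_at, value)

    def _ensure_up(self) -> None:
        # A real client raises its own error type (e.g. redis-py's
        # redis.exceptions.ConnectionError), which you would catch instead.
        if not self.available:
            raise ConnectionError("cache unavailable")

    def get(self, key: str) -> Optional[str]:
        self._ensure_up()
        entry = self._store.get(key)
        if entry is None or entry[0] <= self._clock():
            self._store.pop(key, None)  # expired entries behave like misses
            return None
        return entry[1]

    def setex(self, key: str, ttl_seconds: int, value: str) -> None:
        self._ensure_up()
        self._store[key] = (self._clock() + ttl_seconds, value)

    def delete(self, key: str) -> None:
        self._ensure_up()
        self._store.pop(key, None)


def cache_key(workspace_id: str) -> str:
    return f"tasks:{workspace_id}"  # hypothetical key scheme


def list_tasks(cache: FakeRedis, workspace_id: str, load_from_db: Callable[[str], list]) -> list:
    try:
        cached = cache.get(cache_key(workspace_id))
        if cached is not None:
            return json.loads(cached)  # cache hit
    except ConnectionError:
        pass  # graceful bypass: a cache outage is treated as a miss
    tasks = load_from_db(workspace_id)
    try:
        cache.setex(cache_key(workspace_id), CACHE_TTL_SECONDS, json.dumps(tasks))
    except ConnectionError:
        pass  # still serve fresh data even if the write-back fails
    return tasks


def invalidate_tasks(cache: FakeRedis, workspace_id: str) -> None:
    """Call after every task create/update/delete so the next read goes to the DB."""
    try:
        cache.delete(cache_key(workspace_id))
    except ConnectionError:
        # If the cache is down the delete is skipped; a stale entry could then
        # survive until its TTL expires, which the short TTL bounds.
        pass
```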

Built a drag-and-drop Kanban board and persisted task ordering through a batch-update endpoint backed by MongoDB bulkWrite and a position sort field.
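
Persisting drag-and-drop order can be sketched like this: after a drop, the client sends each column's task IDs in display order, the server turns that into one update per task setting a numeric `position` (and the column), and sends them all in a single bulk write; reads then sort on `position`. In this Python sketch the operation documents follow the `{ updateOne: { filter, update } }` shape accepted by the MongoDB Node.js driver's `bulkWrite`. The `status` field name and the request shape are hypothetical, and `apply_ops` is only an in-memory simulation for illustration.

```python
def reorder_ops(columns: dict[str, list[str]]) -> list[dict]:
    """Build bulkWrite-style update operations from the post-drag board layout.

    `columns` maps a (hypothetical) status column to task IDs in display order.
    """
    ops = []
    for status, task_ids in columns.items():
        for position, task_id in enumerate(task_ids):
            ops.append({
                "updateOne": {
                    "filter": {"_id": task_id},
                    "update": {"$set": {"status": status, "position": position}},
                }
            })
    return ops


def apply_ops(tasks: dict[str, dict], ops: list[dict]) -> None:
    """In-memory simulation of the $set updates (stands in for the database)."""
    for op in ops:
        spec = op["updateOne"]
        tasks[spec["filter"]["_id"]].update(spec["update"]["$set"])


def board_column(tasks: dict[str, dict], status: str) -> list[str]:
    """Equivalent of querying one column sorted by position ascending."""
    in_column = [(t["position"], tid) for tid, t in tasks.items() if t["status"] == status]
    return [tid for _, tid in sorted(in_column)]
```

Sending every update in one bulk request costs a single round trip per drop rather than one per moved card. With pymongo, the same operations would be written as `UpdateOne(filter, update)` objects passed to `collection.bulk_write`.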

Email Digital Twin

2025

Chrome Extension · Node.js · Express · Google Gemini · LanceDB · Gmail API · OAuth 2.0

Built an AI-powered Chrome Extension and Node.js backend that learns a user's unique writing style to generate hyper-personalized, context-aware email drafts using Google Gemini and LanceDB.

Integrated the Google Gemini API to analyze sent emails, dynamically extracting communication patterns and building distinct writing personas (professional, casual, etc.).

Implemented RAG using LanceDB vector database for semantic search over past email threads, enabling the AI to understand ongoing conversation context for accurate draft generation.
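
The retrieval half of RAG reduces to nearest-neighbour search over embeddings. LanceDB performs this search inside the database; the pure-Python sketch below only illustrates the idea, ranking stored email embeddings by cosine similarity to a query embedding and folding the top matches into a prompt. The three-dimensional vectors, email snippets, and prompt wording are made up; real embeddings come from an embedding model and have far more dimensions.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # treat zero vectors as unrelated
    return dot / (norm_a * norm_b)


def top_k(query: list[float], docs: list[dict], k: int = 2) -> list[dict]:
    """Brute-force scan; a vector database can replace this with an approximate index."""
    ranked = sorted(docs, key=lambda d: cosine_similarity(query, d["vector"]), reverse=True)
    return ranked[:k]


def build_prompt(new_email: str, retrieved: list[dict]) -> str:
    """Fold retrieved thread snippets into the generation prompt (wording is illustrative)."""
    context = "\n---\n".join(d["text"] for d in retrieved)
    return f"Relevant past emails:\n{context}\n\nDraft a reply to:\n{new_email}"


# Made-up 3-D "embeddings"; real ones come from an embedding model.
past_emails = [
    {"text": "Re: Q3 budget review", "vector": [0.9, 0.1, 0.0]},
    {"text": "Lunch on Friday?", "vector": [0.0, 0.2, 0.9]},
    {"text": "Budget numbers attached", "vector": [0.8, 0.3, 0.1]},
]
query_vector = [1.0, 0.2, 0.0]  # pretend embedding of the incoming email
hits = top_k(query_vector, past_emails, k=2)
print(build_prompt("Can we revisit the budget?", hits))
```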

Integrated the Gmail API through secure OAuth 2.0 flows within a privacy-first architecture that keeps data processing local, and built an injected Gmail UI layer with keyboard shortcuts and a multi-persona editor.

Road Extraction On Satellite Images

Aug 2024 – Present

SIH Project (No. of Group Members – 6)

Developing software for automated road extraction using CNNs on ISRO's Resourcesat images from the Bhoonidhi portal.

Built a GUI for specifying areas of interest and generating georeferenced shapefiles, with email alerts when image comparisons detect road changes.

Optimized for efficient processing of large satellite datasets.

ERP Exerciser

Aug 2024 – Present

Industry Project (No. of Group Members – 3)

Developing ERP Exerciser, a Brain-Computer Interface (BCI) application that leverages Event-Related Potentials (ERPs) to deliver real-time cognitive training, enhancing memory, attention, and executive functions in patients with cognitive impairments.

Prisma

Jan 2023 – Apr 2024

SY Mini Project (No. of Group Members – 3)

Developed an efficient machine-learning application for upscaling and colorizing images using CNNs and DinkNet, with a model size of 260 MB and 89% color accuracy in standardized tests.

Potential use cases include military applications, night-vision enhancement, and restoring old photographs.

Technologies and Frameworks I Work With

A comprehensive toolkit for building modern, scalable applications

Python
C++
React
Node.js
Tailwind CSS
MongoDB
MySQL
TensorFlow
OpenCV
NumPy
AWS
Google Cloud
Kubernetes
Docker
Linux
Flutter

Let's Connect and Collaborate!

As an AI and Machine Learning enthusiast, I'm always excited to explore new ideas, share knowledge, and collaborate on projects that push the boundaries of technology. I’m not looking for a role, but I’d love to connect with others who share the same passion for AI, data science, and the endless possibilities they offer.

Interested or want to get to know me a bit better?

Get in Touch

Prefer email?
You can reach out to me at arnav.bule05@gmail.com

Check Out My Resume

Get to Know More About Me

Click below to view or download my latest resume in PDF format from Google Drive.

View Resume