
Hi 👋, I'm Mrityunjay Pathak


About

Hi, I'm Mrityunjay Pathak, a Data Scientist.

I build and deploy end-to-end data and machine learning systems, turning ideas into working products.

End-to-End Systems I've Built :

⊳ AutoIQ : Used Car Pricing System
→ Built an end-to-end car pricing system using FastAPI and Docker, trained on 2,800+ listings scraped from CARS24.
→ Deployed an interactive HTML/CSS/JS frontend on GitHub Pages that fetches real-time predictions from a REST API.
Tech Stack : Python, Pandas, BeautifulSoup, Selenium, Scikit-learn, FastAPI, Docker, Git

⊳ Dashly : Live Sales Dashboard
→ Designed a live sales analytics dashboard in Power BI connected to a PostgreSQL database with 50,000+ sales records.
→ Automated a daily ETL pipeline using GitHub Actions to keep the dashboard updated with the latest sales data.
Tech Stack : Python, Pandas, SQLAlchemy, PostgreSQL, Power BI, GitHub Actions, Git

⊳ Pickify : Movie Recommender System
→ Developed a content-based movie recommender system using metadata from 5,000+ movies.
→ Integrated the TMDB API to fetch and display movie posters, enhancing the personalized viewing experience.
Tech Stack : Python, Pandas, Scikit-learn, NLTK, Streamlit, TMDB API, Git

I also write about ML, MLOps, and the practical lessons I've learned while building and shipping systems into production.

I'm currently seeking opportunities as a Junior Data Scientist or Junior Machine Learning Engineer.

If you're looking for someone eager to learn, collaborate, and build real-world solutions, I'd love to connect.

Skills

Python
NumPy
Pandas
Matplotlib
Seaborn
Plotly
Scikit-learn
MySQL
Power BI
Excel
FastAPI
Docker
Git
GitHub Actions
Bash

Projects

AutoIQ : Used Car Pricing System

Problem

→ In the used car market, buyers and sellers often struggle to determine a fair price for their vehicles.

→ Incorrect pricing can mean lost revenue when a car is undervalued, or delayed sales when it is overpriced.

→ The goal is to provide accurate and transparent pricing for used cars by analyzing real-world market listings.


Solution

→ Built and deployed an end-to-end machine learning pipeline to predict used car prices using real-world data.

→ Collected and cleaned 2,800+ used car listings from CARS24 using Selenium and BeautifulSoup.

→ Optimized dataset memory usage by 90% through downcasting data types and converting to Parquet format.

→ Trained regression models using Scikit-learn Pipelines to prevent data leakage and ensure reliable evaluation.

→ Deployed the trained machine learning model as a REST API using FastAPI on Render.

→ Built an HTML/CSS/JS frontend hosted on GitHub Pages to interact with the REST API and display predictions.

→ Containerized the entire application using Docker and pushed to Docker Hub for reproducibility.
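
To illustrate the data-leakage point above, here is a minimal sketch of a Scikit-learn Pipeline. It is not the AutoIQ code: the data is synthetic, and the feature names (`age`, `km`), the scaler and the linear model are assumptions chosen for illustration. Because the scaler lives inside the pipeline, it is fitted on the training split only, so no test-set statistics reach the model.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in data (assumption, not the CARS24 listings):
# price falls with car age (years) and kilometres driven, plus noise.
rng = np.random.default_rng(42)
age = rng.uniform(1, 15, size=200)
km = rng.uniform(5_000, 150_000, size=200)
X = np.column_stack([age, km])
y = 900_000 - 40_000 * age - 2.0 * km + rng.normal(0, 10_000, size=200)

# Split first, so the test rows are never seen during fitting.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=42
)

# Preprocessing and model are fitted together, on the training split only.
model = Pipeline([
    ("scale", StandardScaler()),
    ("regress", LinearRegression()),
])
model.fit(X_train, y_train)

# The scaler's statistics come from X_train, not from the full dataset.
scaler_mean = model.named_steps["scale"].mean_
r2_test = model.score(X_test, y_test)  # R² on held-out rows
```

The same pipeline object can be passed to `cross_val_score`, which refits the scaler inside every fold.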


Impact

→ Achieved a 30% lower MAE and a 12% higher R² score compared to the baseline regression model.

→ Reduced prediction error variance by 70%, ensuring more stable and reliable predictions.
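
For readers unfamiliar with these metrics, the sketch below shows how MAE, R² and a relative MAE improvement are computed. The numbers are toy values picked for easy arithmetic, not AutoIQ results, and the helper names are hypothetical.

```python
def mae(y_true, y_pred):
    """Mean absolute error: average size of the prediction errors."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def r2(y_true, y_pred):
    """R²: 1 minus residual sum of squares over total sum of squares."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot


# Toy values (assumption), not real car prices.
y_true = [100, 200, 300, 400]
baseline_pred = [120, 180, 330, 370]
improved_pred = [110, 190, 320, 380]

baseline_mae = mae(y_true, baseline_pred)
improved_mae = mae(y_true, improved_pred)

# Relative improvement: the fraction of baseline error that was removed.
mae_reduction = (baseline_mae - improved_mae) / baseline_mae
r2_gain = r2(y_true, improved_pred) - r2(y_true, baseline_pred)
```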


Result

→ Helps sellers price vehicles closer to true market value, reducing revenue loss from underpricing.

→ Helps buyers make confident purchase decisions by identifying fairly priced listings.

→ Increases revenue by aligning vehicle prices with current market value, reducing underpricing or overpricing risks.


Dashly : Live Sales Dashboard

Problem

→ Quick Buy is a leading superstore operating across the United States.

→ It manages thousands of product transactions daily across multiple regions.

→ The store's operations relied heavily on manual spreadsheets and SQL queries to track business performance.

→ As a result, decision-making was slowed down, making it harder to identify growth opportunities.


Solution

→ Designed a fully automated ETL pipeline using Python, SQLAlchemy and GitHub Actions for daily data updates.

→ Built Python ETL scripts to extract, transform and load 50,000+ sales records into a Neon PostgreSQL database.

→ Simulated ~100 new transactions daily to replicate ongoing business activity and test pipeline reliability.

→ Integrated Power BI with the database, enabling a real-time, auto-refreshing dashboard without manual updates.
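
The extract, transform and load steps can be sketched in a few lines. This is a simplified illustration, not the Dashly pipeline: it uses the standard library's `sqlite3` as a stand-in for SQLAlchemy with Neon PostgreSQL, and the column names and sample rows are assumptions.

```python
import csv
import io
import sqlite3

# Made-up sample input (assumption); the real pipeline reads sales records.
RAW_CSV = """order_id,order_date,region,sales,profit
A-1,2024-01-05,West,120.50,30.10
A-2,2024-01-05,East,,12.00
A-3,2024-01-06,south,80.00,-5.25
A-1,2024-01-05,West,120.50,30.10
"""


def extract(text):
    """Parse raw CSV text into a list of dict rows."""
    return list(csv.DictReader(io.StringIO(text)))


def transform(rows):
    """Drop rows without a sales value, drop duplicate orders, fix types."""
    clean, seen = [], set()
    for row in rows:
        if not row["sales"] or row["order_id"] in seen:
            continue
        seen.add(row["order_id"])
        clean.append((
            row["order_id"],
            row["order_date"],
            row["region"].title(),  # normalize casing, e.g. "south" -> "South"
            float(row["sales"]),
            float(row["profit"]),
        ))
    return clean


def load(conn, rows):
    """Upsert rows so a re-run of the daily job does not duplicate data."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS sales ("
        "order_id TEXT PRIMARY KEY, order_date TEXT, region TEXT, "
        "sales REAL, profit REAL)"
    )
    conn.executemany("INSERT OR REPLACE INTO sales VALUES (?, ?, ?, ?, ?)", rows)
    conn.commit()


conn = sqlite3.connect(":memory:")
load(conn, transform(extract(RAW_CSV)))
load(conn, transform(extract(RAW_CSV)))  # second run simulates the next daily job

row_count = conn.execute("SELECT COUNT(*) FROM sales").fetchone()[0]
regions = [r[0] for r in conn.execute("SELECT region FROM sales ORDER BY order_id")]
```

Keying the table on `order_id` keeps the load idempotent, which matters for a scheduled job that may be retried.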


Impact

→ Cut daily data update time from hours to under a minute (average ~45 sec) using GitHub Actions.

→ Reduced reporting time by 80% through automation, enabling faster tracking of revenue and profit performance.

→ Achieved 100% workflow reliability with zero pipeline failures since deployment (as recorded in GitHub Actions).


Key Insights

→ Standard Class drives ~60% of sales (₹5.1M) and profit (₹897K), making it the most profitable shipping mode.

→ Consumer Segment generates ~50% of revenue (~₹4.26M) and profit (~₹757K), making it the primary customer base.

→ Q4 (Oct-Dec) contributes ~27% of annual revenue, indicating strong seasonal demand, ideal for promotions.

→ Paper, Binders, and Phones are the top-performing sub-categories, together making up ~45% of total revenue.

→ West and East regions lead with ~58% of total sales, while the South, at ~19%, shows strong growth potential.

→ Top 5 States (CA, NY, TX, PA, OH) generate ~54% of total sales, with CA alone contributing ~21% of sales.


Pickify : Movie Recommender System

Problem

→ With the rise of streaming services, users now have access to thousands of movies across platforms.

→ As a result, many viewers spend more time browsing than watching content.

→ This leads to frustration, lower satisfaction, and reduced watch time on the platform.

→ Over time, this impacts both user retention and platform engagement.


Solution

→ Built a content-based movie recommender system trained on 5,000+ movie metadata records.

→ Generated the top 5 similar titles for any selected movie in under 3 seconds using cosine similarity.

→ Integrated the TMDB API to dynamically fetch and display movie posters, improving the user experience.

→ Deployed the system as a Streamlit web app, enabling users to explore personalized movie suggestions.
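
The idea behind content-based recommendation with cosine similarity can be shown with a small standard-library sketch. It is not the Pickify code (which uses Scikit-learn and NLTK): the movie titles and tags are invented, and `vectorize`, `cosine` and `recommend` are hypothetical helper names.

```python
import math
from collections import Counter

# Invented toy catalogue (assumption): title -> space-separated metadata tags.
MOVIES = {
    "Space Quest": "space adventure alien crew",
    "Alien Frontier": "space alien war crew",
    "Love in Paris": "romance paris drama",
    "Paris Nights": "romance paris comedy",
    "Star Crew": "space crew comedy",
}


def vectorize(tags):
    """Bag-of-words vector: tag -> count."""
    return Counter(tags.split())


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


VECTORS = {title: vectorize(tags) for title, tags in MOVIES.items()}


def recommend(title, top_n=5):
    """Return the top_n titles most similar to the given title."""
    query = VECTORS[title]
    scores = [
        (other, cosine(query, vec))
        for other, vec in VECTORS.items()
        if other != title
    ]
    scores.sort(key=lambda pair: pair[1], reverse=True)
    return [name for name, _ in scores[:top_n]]


top = recommend("Space Quest", top_n=2)
```

With a few thousand titles, the similarity scores are usually precomputed once as a matrix so that each lookup is only a sort over one row.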


Impact

→ Reduced browsing time by instantly suggesting the top 5 most similar movies for any selected title.

→ Delivered movie recommendations in under 3 seconds, ensuring a fast and smooth user experience.

→ Improved content engagement by guiding users toward relevant titles instead of manual browsing.

→ Served 100+ users through a deployed web app, turning a notebook model into a live recommendation system.


Certificates

Blogs

Simple Linear Regression
Multiple Linear Regression

Contact