Hi 👋, I'm Mrityunjay Pathak

About


I'm a Data Scientist with a knack for uncovering patterns and trends that drive smarter decisions.

Skills

Python
NumPy
Pandas
Matplotlib
Seaborn
Plotly
Scikit-learn
MySQL
Power BI
FastAPI
Docker
Git

Projects

Car Price Prediction


Problem

⊳ In the used car market, buyers and sellers often struggle to determine a fair price for a vehicle.

⊳ This project aims to provide accurate and transparent pricing for used cars by analyzing real-world data.

⊳ It will help both buyers and sellers make data-driven decisions and ensure fair transactions.


Solution

To address this problem, I built and deployed a complete end-to-end machine learning pipeline:

⊳ Data Collection

→ Scraped a dataset of 2,800+ cars from the web using Selenium and BeautifulSoup.
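A minimal sketch of the parsing half of this step, assuming each listing is rendered as a `div.car-card` element. The markup, class names, and the `parse_listings` / `digits_only` helpers are hypothetical, since the scraped site is not described. In a Selenium workflow the HTML string would typically come from the driver's `page_source` once the page has finished loading.

```python
from bs4 import BeautifulSoup

# Hypothetical markup standing in for one results page; the real site's HTML is not described.
SAMPLE_HTML = """
<div class="car-card">
  <h2 class="title">2017 Hyundai i20</h2>
  <span class="price">540,000</span>
  <span class="km">42,000 km</span>
</div>
<div class="car-card">
  <h2 class="title">2019 Maruti Swift</h2>
  <span class="price">610,000</span>
  <span class="km">18,500 km</span>
</div>
"""


def digits_only(text: str) -> int:
    """Strip thousands separators and units, e.g. '42,000 km' -> 42000."""
    return int("".join(ch for ch in text if ch.isdigit()))


def parse_listings(html: str) -> list[dict]:
    """Extract one record per listing card (hypothetical helper and CSS selectors)."""
    soup = BeautifulSoup(html, "html.parser")
    rows = []
    for card in soup.select("div.car-card"):
        rows.append({
            "title": card.select_one("h2.title").get_text(strip=True),
            "price": digits_only(card.select_one("span.price").get_text()),
            "km_driven": digits_only(card.select_one("span.km").get_text()),
        })
    return rows


if __name__ == "__main__":
    for row in parse_listings(SAMPLE_HTML):
        print(row)
```

Keeping parsing separate from page loading makes the parser testable on saved HTML without launching a browser.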

⊳ Data Optimization

→ Optimized the dataset's memory consumption by downcasting data types.

→ Stored the dataset in Parquet format, which compresses data without losing information.

→ Parquet also offers much faster read/write speeds than CSV.
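A minimal sketch of dtype downcasting with pandas, on synthetic data. The `downcast` helper, its `max_unique_ratio` threshold for turning text columns into `category`, and the column names are assumptions for illustration, not the project's actual code.

```python
import numpy as np
import pandas as pd


def downcast(df: pd.DataFrame, max_unique_ratio: float = 0.5) -> pd.DataFrame:
    """Return a copy of df with smaller dtypes (hypothetical helper)."""
    out = df.copy()
    for col in out.columns:
        series = out[col]
        if pd.api.types.is_integer_dtype(series):
            # Smallest signed integer type that holds every value (int8, then int16, ...).
            out[col] = pd.to_numeric(series, downcast="integer")
        elif pd.api.types.is_float_dtype(series):
            # Ask pandas to downcast toward float32, the smallest float dtype to_numeric targets.
            out[col] = pd.to_numeric(series, downcast="float")
        elif pd.api.types.is_string_dtype(series) and series.nunique() / len(series) <= max_unique_ratio:
            # Low-cardinality text is stored once per category plus small integer codes.
            out[col] = series.astype("category")
    return out


# Synthetic stand-in for the scraped car data (column names are assumptions).
rng = np.random.default_rng(0)
n = 1_000
df = pd.DataFrame({
    "year": rng.integers(2005, 2024, n),
    "km_driven": rng.integers(0, 200_000, n),
    "price": rng.uniform(100_000, 2_000_000, n),
    "fuel": rng.choice(["Petrol", "Diesel", "CNG"], n),
})

small = downcast(df)
before = int(df.memory_usage(deep=True).sum())
after = int(small.memory_usage(deep=True).sum())
print(small.dtypes)
print(f"{before:,} bytes -> {after:,} bytes ({1 - after / before:.0%} smaller)")
```

Writing the result with `DataFrame.to_parquet` (which needs `pyarrow` or `fastparquet` installed) then handles the compressed, columnar on-disk storage.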

⊳ Preprocessing & Modeling

→ Implemented Scikit-learn Pipelines & ColumnTransformer so that preprocessing is fit only on the training data, preventing data leakage.
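A minimal sketch of such a pipeline. It assumes hypothetical numeric (`year`, `km_driven`) and categorical (`fuel`, `transmission`) columns, synthetic data, and a `Ridge` model chosen only for illustration. Because the scaler and one-hot encoder sit inside the `Pipeline`, `cross_val_score` re-fits them on each training fold, so statistics from the validation fold never leak into preprocessing.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.linear_model import Ridge
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

NUMERIC = ["year", "km_driven"]           # hypothetical numeric features
CATEGORICAL = ["fuel", "transmission"]    # hypothetical categorical features


def build_pipeline() -> Pipeline:
    """Preprocessing and model in one estimator, so they are always fit together."""
    preprocess = ColumnTransformer([
        ("num", StandardScaler(), NUMERIC),
        ("cat", OneHotEncoder(handle_unknown="ignore"), CATEGORICAL),
    ])
    return Pipeline([("preprocess", preprocess), ("model", Ridge(alpha=1.0))])


# Synthetic data with a known linear price rule plus noise.
rng = np.random.default_rng(42)
n = 300
X = pd.DataFrame({
    "year": rng.integers(2008, 2024, n),
    "km_driven": rng.integers(5_000, 150_000, n),
    "fuel": rng.choice(["Petrol", "Diesel", "CNG"], n),
    "transmission": rng.choice(["Manual", "Automatic"], n),
})
y = (
    300_000
    + 20_000 * (X["year"] - 2008)
    - 1.5 * X["km_driven"]
    + np.where(X["transmission"] == "Automatic", 80_000, 0)
    + rng.normal(0, 20_000, n)
)

# Each of the 5 folds re-fits the scaler and encoder on its training split only.
scores = cross_val_score(build_pipeline(), X, y, cv=5, scoring="neg_mean_absolute_error")
print(f"CV MAE: {-scores.mean():,.0f}")

# Fit on all data, then score a car whose fuel type never appeared in training.
pipe = build_pipeline().fit(X, y)
new_car = pd.DataFrame({"year": [2020], "km_driven": [30_000], "fuel": ["Electric"], "transmission": ["Manual"]})
print(pipe.predict(new_car))
```

`handle_unknown="ignore"` lets the fitted pipeline score a listing with a category it never saw, encoding that category as all zeros instead of raising an error.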

⊳ API Deployment

→ Deployed the machine learning model as an API using FastAPI, with:

→ /predict endpoint for real-time predictions.

→ /health endpoint for monitoring API status.

→ Input validation & rate limiting for reliability.

⊳ Frontend Integration

→ Designed an HTML/CSS/JS website that sends API requests and displays predictions in a user-friendly way.

⊳ Containerization

→ Created a multi-stage Dockerfile and a .dockerignore file to build an optimized, lightweight Docker image.


Impact

⊳ Built and deployed a complete machine learning pipeline as a FastAPI application.

⊳ Reduced dataset size by 90% by downcasting data types and converting to Parquet format.

⊳ Evaluated multiple regression models with cross-validation to identify the best-performing algorithm.

⊳ Achieved 30% lower MAE and 12% higher R² score compared to the baseline model.

⊳ Improved model stability by 70%, yielding more consistent and reliable predictions.
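A sketch of how such a comparison can be run with `cross_validate`, on synthetic non-linear data from `make_friedman1` rather than the car dataset. Using `LinearRegression` as the baseline, the particular candidate models, and treating the fold-to-fold standard deviation of MAE as a "stability" measure are all assumptions; the baseline and the stability metric behind the figures above are not specified. Relative changes are computed as (baseline − best) / baseline for MAE and its spread, and (best − baseline) / baseline for R².

```python
from sklearn.datasets import make_friedman1
from sklearn.ensemble import GradientBoostingRegressor, RandomForestRegressor
from sklearn.linear_model import LinearRegression, Ridge
from sklearn.model_selection import KFold, cross_validate

# Synthetic non-linear regression data as a stand-in for the car dataset.
X, y = make_friedman1(n_samples=500, noise=1.0, random_state=0)
cv = KFold(n_splits=5, shuffle=True, random_state=0)

models = {
    "linear (baseline)": LinearRegression(),
    "ridge": Ridge(alpha=1.0),
    "random_forest": RandomForestRegressor(n_estimators=200, random_state=0),
    "gradient_boosting": GradientBoostingRegressor(random_state=0),
}

results = {}
for name, model in models.items():
    cv_scores = cross_validate(model, X, y, cv=cv, scoring=("neg_mean_absolute_error", "r2"))
    mae = -cv_scores["test_neg_mean_absolute_error"]
    results[name] = {
        "mae": mae.mean(),
        "mae_std": mae.std(),  # spread across folds, used here as a stability proxy
        "r2": cv_scores["test_r2"].mean(),
    }

base = results["linear (baseline)"]
best_name = min(results, key=lambda name: results[name]["mae"])
best = results[best_name]

mae_reduction = (base["mae"] - best["mae"]) / base["mae"]
r2_gain = (best["r2"] - base["r2"]) / base["r2"]
std_reduction = (base["mae_std"] - best["mae_std"]) / base["mae_std"]
print(f"best={best_name}  MAE -{mae_reduction:.0%}  R² +{r2_gain:.0%}  fold-std -{std_reduction:.0%}")
```

Sharing one shuffled `KFold` object across all candidates means every model is scored on exactly the same splits, which keeps the comparison fair.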


Movie Recommender System


Problem

⊳ With the rise of streaming services, viewers now have access to thousands of movies across platforms.

⊳ As a result, many viewers spend more time browsing than actually watching.

⊳ This can lead to frustration, lower satisfaction and less time spent on the platform, which hurts both the user experience and business performance.


Solution

⊳ A content-based movie recommender system built with clean, modular code and proper version control.

⊳ It analyzes the metadata of 5,000+ movies to recommend the top 5 titles most similar to a movie the user selects.

⊳ The system converts movie metadata into word-count vectors with CountVectorizer and uses cosine similarity to find the most similar movies.

⊳ Beyond functionality, the project focuses on building a clean and scalable solution.
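A minimal sketch of the technique on a toy catalog. Each movie's metadata is assumed to be flattened into a single "tags" string; `CountVectorizer` turns those strings into token-count vectors, and `cosine_similarity` scores every pair of movies. The movie data and the `recommend` helper are made up for illustration.

```python
import numpy as np
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Toy catalog: each value is a made-up "tags" string combining genres, keywords and people.
movies = {
    "Avatar": "action adventure sciencefiction space alien jamescameron",
    "Aliens": "action sciencefiction space alien jamescameron",
    "Titanic": "drama romance ship jamescameron",
    "Interstellar": "adventure sciencefiction space christophernolan",
    "Inception": "action sciencefiction dream christophernolan",
    "The Notebook": "drama romance",
    "Gravity": "sciencefiction space drama",
}
titles = list(movies)

# Bag-of-words counts: one row per movie, one column per distinct tag.
vectors = CountVectorizer().fit_transform(list(movies.values()))
# Pairwise cosine similarity between all movies (1.0 on the diagonal).
similarity = cosine_similarity(vectors)


def recommend(title: str, k: int = 5) -> list[str]:
    """Return the k titles most similar to `title`, excluding the title itself."""
    idx = titles.index(title)
    ranked = np.argsort(similarity[idx])[::-1]
    return [titles[i] for i in ranked if i != idx][:k]


print(recommend("Avatar"))
```

Here `recommend("Avatar")` ranks "Aliens" first, since all five of its tags also appear among Avatar's six. Because the similarity matrix is computed once up front, each recommendation is just a row lookup and a sort.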


Impact

If this system were scaled and integrated with a streaming service, it could:

⊳ Reduce the time users spend choosing what to watch.

⊳ Increase user engagement, watch time and customer satisfaction.

⊳ Help streaming platforms retain users by offering more personalized content.


Netflix Data Analysis


Problem Statement

⊳ To analyze Netflix content data and uncover insights into how the platform's catalog has evolved over time.


Some Key Findings

Cleaned and analyzed a dataset of 8,000+ Netflix movies and TV shows.

⊳ More than 60% of content on Netflix is rated for mature audiences.

→ Suggests that Netflix targets adult viewers to boost engagement and retention.

⊳ More than 25% of movies and TV shows are released on the first day of the month.

→ Suggests a consistent release schedule, likely aligned with subscription cycles.

⊳ More than 40% of the content on Netflix is exclusive to the United States.

→ Shows a strong focus on the U.S. market and on region-specific content availability.

⊳ More than 20% of the content on Netflix falls under the "Drama" genre.

→ Confirms that "Drama" is a key part of Netflix's content library.

⊳ More than 23% of the content on Netflix was released in 2019 alone.

→ Indicates a major content push that year, possibly tied to growth or user acquisition goals.
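A minimal sketch of how shares like these can be computed with pandas, on a five-row made-up sample. The column names are assumed to follow the public Netflix titles dataset, and treating `TV-MA`, `R` and `NC-17` as "mature" ratings is an assumption. A `country` value of exactly "United States" means no other country is listed for that title.

```python
import pandas as pd

# Five made-up rows; column names are assumed to follow the public Netflix titles dataset.
df = pd.DataFrame({
    "title": ["A", "B", "C", "D", "E"],
    "rating": ["TV-MA", "R", "PG-13", "TV-MA", "TV-14"],
    "date_added": [" September 1, 2019", "March 15, 2020", "January 1, 2019", "July 1, 2021", "May 20, 2019"],
    "country": ["United States", "India", "United States", "United States, Canada", "United Kingdom"],
})

# Strip stray whitespace before parsing dates like "September 1, 2019".
df["date_added"] = pd.to_datetime(df["date_added"].str.strip(), format="%B %d, %Y")

MATURE_RATINGS = {"TV-MA", "R", "NC-17"}  # assumed definition of "mature"

# The mean of a boolean column is the fraction of rows where it is True.
mature_share = df["rating"].isin(MATURE_RATINGS).mean()
first_of_month_share = (df["date_added"].dt.day == 1).mean()
us_only_share = (df["country"] == "United States").mean()

print(f"mature: {mature_share:.0%}, added on the 1st: {first_of_month_share:.0%}, US only: {us_only_share:.0%}")
```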


Supermarket Sales Analysis


Problem Statement

⊳ To analyze supermarket sales data and identify key factors for improving profitability and operational efficiency.


Some Key Findings

Analyzed the purchasing patterns of 9,000+ supermarket customers.

⊳ More than 15% of the products sold were Snacks.

→ Shows that snacks are a convenient choice and a major source of revenue.

⊳ More than 32% of sales occurred in the West region.

→ Suggests that the West region performs more strongly than the other regions.

⊳ Health Drinks and Soft Drinks are the most profitable sub-categories within Beverages.

→ Shows that both types of drinks sell well.

⊳ November was the most profitable month, contributing about 15% of total annual profit.

→ Makes it an ideal time for running promotions and special offers.
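A minimal sketch of the aggregations behind findings like these, on made-up rows with hypothetical column names. `value_counts(normalize=True)` counts order lines per category as a proxy for products sold. The region, sub-category and month figures are sums of `sales` or `profit`, with the region and month values divided by the overall total.

```python
import pandas as pd

# Made-up order lines; column names are hypothetical.
sales = pd.DataFrame({
    "order_date": pd.to_datetime(["2017-11-03", "2017-11-20", "2017-06-11", "2017-02-05", "2017-11-28"]),
    "region": ["West", "East", "West", "Central", "South"],
    "category": ["Snacks", "Beverages", "Beverages", "Snacks", "Bakery"],
    "sub_category": ["Chocolates", "Health Drinks", "Soft Drinks", "Cookies", "Cakes"],
    "sales": [1200.0, 800.0, 950.0, 400.0, 650.0],
    "profit": [300.0, 240.0, 190.0, 60.0, 210.0],
})

# Fraction of order lines in each category.
category_share = sales["category"].value_counts(normalize=True)

# Fraction of total revenue earned in each region.
region_share = sales.groupby("region")["sales"].sum() / sales["sales"].sum()

# Beverage sub-categories ranked by total profit.
beverage_profit = (
    sales[sales["category"] == "Beverages"]
    .groupby("sub_category")["profit"].sum()
    .sort_values(ascending=False)
)

# Fraction of the year's profit contributed by each month.
monthly_profit_share = (
    sales.groupby(sales["order_date"].dt.month_name())["profit"].sum() / sales["profit"].sum()
)

print(category_share, region_share, beverage_profit, monthly_profit_share, sep="\n\n")
```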


Certificates

Blogs

Simple Linear Regression
Multiple Linear Regression

Contact