Hi 👋, I'm Mrityunjay Pathak
Python
NumPy
Pandas
Matplotlib
Seaborn
Plotly
Scikit-learn
MySQL
Power BI
Excel
FastAPI
Docker
Git
GitHub
GitHub Actions
Bash
Problem
→ In the used car market, buyers and sellers often struggle to determine a fair price for a vehicle.
→ Underpricing a car loses revenue, while overpricing it delays the sale.
→ The goal is to provide accurate and transparent pricing for used cars by analyzing real-world market
listings.
Solution
→ Built and deployed an end-to-end machine learning pipeline to predict used car prices using real-world
data.
→ Collected and cleaned 2,800+ used car listings from CARS24 using Selenium and BeautifulSoup.
→ Optimized dataset memory usage by 90% through downcasting data types and converting to Parquet format.
→ Trained regression models using Scikit-learn Pipelines to prevent data leakage and ensure reliable
evaluation.
→ Deployed the trained machine learning model as a REST API using FastAPI on Render.
→ Built an HTML/CSS/JS frontend hosted on GitHub Pages to interact with the REST API and display
predictions.
→ Containerized the entire application using Docker and pushed to Docker Hub for reproducibility.
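The leakage point above can be illustrated with a minimal, dependency-free sketch. It is not the project's code: the project uses Scikit-learn Pipelines, while this example only shows the underlying idea that preprocessing statistics are learned from the training split and then reused, unchanged, on the test split. The function names and the odometer readings are hypothetical.

```python
from statistics import mean, pstdev


def fit_scaler(values):
    """Learn scaling statistics from the training values only."""
    mu = mean(values)
    sigma = pstdev(values) or 1.0  # fall back to 1.0 for a constant column
    return mu, sigma


def apply_scaler(values, params):
    """Standardize values with previously fitted statistics."""
    mu, sigma = params
    return [(v - mu) / sigma for v in values]


# Hypothetical odometer readings in km (made-up numbers for illustration).
train_km = [20_000, 40_000, 60_000, 80_000]
test_km = [50_000, 100_000]

params = fit_scaler(train_km)                 # fitted on the training split only
train_scaled = apply_scaler(train_km, params)
test_scaled = apply_scaler(test_km, params)   # test split reuses them, never refits
```

Fitting the scaler on the full dataset before splitting would let test-set statistics influence training. A Scikit-learn Pipeline avoids this by refitting every preprocessing step inside each training fold.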
Impact
→ Achieved a 30% lower MAE and a 12% higher R² score compared to the baseline regression model.
→ Reduced prediction error variance by 70%, ensuring more stable and reliable predictions.
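For readers unfamiliar with the metrics above, here is a small sketch of how MAE, R², and a relative improvement over a baseline can be computed. The prices and predictions are made-up values chosen for easy arithmetic, not the project's results, and the helper names are hypothetical.

```python
def mae(y_true, y_pred):
    """Mean absolute error: average size of the prediction error."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def r2(y_true, y_pred):
    """Coefficient of determination: 1 - residual SS / total SS."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot


def pct_change(baseline, new):
    """Relative change of `new` versus `baseline`, in percent."""
    return (new - baseline) / baseline * 100


# Made-up prices (in lakh) and predictions, for illustration only.
y_true = [3.0, 5.0, 7.0, 9.0]
baseline_pred = [4.0, 4.0, 8.0, 8.0]
tuned_pred = [3.5, 5.5, 6.5, 9.5]

mae_change = pct_change(mae(y_true, baseline_pred), mae(y_true, tuned_pred))
r2_change = pct_change(r2(y_true, baseline_pred), r2(y_true, tuned_pred))
```

With these toy numbers the MAE falls from 1.0 to 0.5 and R² rises from 0.80 to 0.95. The percentages reported above come from the real evaluation, not from this sketch.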
Result
→ Helps sellers price vehicles closer to true market value, reducing revenue loss from underpricing.
→ Helps buyers make confident purchase decisions by identifying fairly priced listings.
→ Increases revenue by aligning vehicle prices with current market value, reducing the risk of both
underpricing and overpricing.
Problem
→ Quick Buy is a leading superstore operating across the United States.
→ It manages thousands of product transactions daily across multiple regions.
→ The store's operations relied heavily on manual spreadsheets and SQL queries to track business
performance.
→ This slowed decision-making and made it harder to identify growth opportunities.
Solution
→ Designed a fully automated ETL pipeline using Python, SQLAlchemy and GitHub Actions for daily data
updates.
→ Built Python ETL scripts to extract, transform and load 50,000+ sales records into a Neon PostgreSQL
database.
→ Simulated ~100 new transactions daily to replicate ongoing business activity and test pipeline
reliability.
→ Integrated Power BI with the database, enabling a real-time, auto-refreshing dashboard without manual
updates.
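The extract, transform, and load steps above can be sketched in a few lines. This is an illustration under stated assumptions, not the project's pipeline: the standard-library `sqlite3` module with an in-memory database stands in for SQLAlchemy and Neon PostgreSQL so the example runs without any setup, and the table name, column names, and sample rows are all hypothetical.

```python
import csv
import io
import sqlite3

# Hypothetical raw export; the second row is missing its sales value.
RAW_CSV = """order_id,order_date,sales,profit
A-1,2024-01-05,100.50,20.10
A-2,2024-01-06,,5.00
A-3,2024-01-06,250.00,-12.25
"""


def extract(text):
    """Read CSV text into a list of dict rows."""
    return list(csv.DictReader(io.StringIO(text)))


def transform(rows):
    """Drop rows without a sales value and cast numeric fields."""
    clean = []
    for row in rows:
        if not row["sales"]:
            continue
        clean.append((row["order_id"], row["order_date"],
                      float(row["sales"]), float(row["profit"])))
    return clean


def load(conn, records):
    """Upsert records so that re-running the job does not duplicate rows."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS sales ("
        "order_id TEXT PRIMARY KEY, order_date TEXT, sales REAL, profit REAL)"
    )
    conn.executemany("INSERT OR REPLACE INTO sales VALUES (?, ?, ?, ?)", records)
    conn.commit()


conn = sqlite3.connect(":memory:")  # assumption: stand-in for the real database
load(conn, transform(extract(RAW_CSV)))
row_count = conn.execute("SELECT COUNT(*) FROM sales").fetchone()[0]
total_sales = conn.execute("SELECT SUM(sales) FROM sales").fetchone()[0]
```

Keying the table on `order_id` and upserting keeps the load step idempotent, which matters for a job that a scheduler runs every day.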
Impact
→ Cut the daily data update time from hours to under a minute (~45 seconds on average) using GitHub Actions.
→ Reduced reporting time by 80% through automation, enabling faster tracking of revenue and profit
performance.
→ Achieved 100% workflow reliability with zero pipeline failures since deployment (as recorded in GitHub
Actions).
Key Insights
→ Standard Class drives ~60% of sales (₹5.1M) and profit (₹897K), making it the most profitable shipping
mode.
→ The Consumer segment generates ~50% of revenue (~₹4.26M) and profit (~₹757K), making it the primary customer base.
→ Q4 (Oct-Dec) contributes ~27% of annual revenue, indicating strong seasonal demand and an ideal window for promotions.
→ Paper, Binders, and Phones are the top-performing sub-categories, together making up ~45% of total
revenue.
→ West and East regions lead with ~58% of total sales, while the South with ~19% shows strong growth
potential.
→ The top 5 states (CA, NY, TX, PA, OH) generate ~54% of total sales, with CA alone contributing ~21%.
Problem
→ With the rise of streaming services, users now have access to thousands of movies across platforms.
→ As a result, many viewers spend more time browsing than watching content.
→ This leads to frustration, lower satisfaction, and reduced watch time on the platform.
→ Over time, this impacts both user retention and platform engagement.
Solution
→ Built a content-based movie recommender system trained on 5,000+ movie metadata records.
→ Generated the top 5 similar titles for any selected movie in under 3 seconds using cosine similarity.
→ Integrated the TMDB API to dynamically fetch and display movie posters, improving the user experience.
→ Deployed the system as a Streamlit web app, enabling users to explore personalized movie suggestions.
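As a rough illustration of the cosine-similarity step above, the sketch below ranks titles by the similarity of their tag vectors. It assumes a simple bag-of-words representation, which the project description does not specify, and the movie titles, tags, and function names are made up for the example.

```python
import math
from collections import Counter

# Hypothetical catalogue: title -> space-separated metadata tags.
MOVIES = {
    "Space Quest": "space adventure alien crew",
    "Galaxy Run": "space alien chase adventure",
    "Love in Paris": "romance paris love drama",
    "Court Drama": "lawyer trial drama",
}


def vectorize(text):
    """Turn a tag string into a bag-of-words count vector."""
    return Counter(text.split())


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0


def recommend(title, k=5):
    """Return the k titles most similar to `title`, best match first."""
    target = vectorize(MOVIES[title])
    scores = [(other, cosine(target, vectorize(tags)))
              for other, tags in MOVIES.items() if other != title]
    scores.sort(key=lambda pair: pair[1], reverse=True)
    return [name for name, _ in scores[:k]]


top = recommend("Space Quest", k=2)
```

Here "Galaxy Run" ranks first because it shares three of its four tags with "Space Quest". With thousands of titles, precomputing the similarity matrix once keeps each lookup fast.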
Impact
→ Reduced browsing time by instantly suggesting the top 5 most similar movies for any selected title.
→ Delivered movie recommendations in under 3 seconds, ensuring a fast and smooth user experience.
→ Improved content engagement by guiding users toward relevant titles instead of manual browsing.
→ Served 100+ users through a deployed web app, turning a notebook model into a live recommendation system.