About

Hi 👋, I'm Mrityunjay Pathak

I'm a Data Scientist with a knack for uncovering patterns and trends that drive smarter decisions.

Skills

Python

NumPy

Pandas

Matplotlib

Seaborn

Plotly

Scikit-learn

MySQL

Power BI

Excel

Streamlit

Git

Projects

Movie Recommender System

GitHub Application

Problem

⊳ With the rise of streaming services, viewers now have access to thousands of movies across platforms.

⊳ As a result, many viewers spend more time browsing than actually watching.

⊳ This problem can lead to frustration, lower satisfaction and less time spent on the platform.

⊳ This, in turn, can hurt both the user experience and business performance.

Solution

⊳ A content-based movie recommender system built with clean, modular code and proper version control.

⊳ It analyzes the metadata of 5000+ movies to recommend the top 5 similar titles for a user-selected movie.

⊳ The system uses techniques like count vectorization and cosine similarity to find similar movies.

⊳ The project focuses not only on functionality but also on building a clean and scalable solution.
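
The idea behind the recommender can be sketched in a few lines of plain Python. This is a minimal illustration, not the project's actual code: the movie titles and tags are made up, the helper names (`count_vector`, `cosine_similarity`, `recommend`) are hypothetical, and the standard library stands in for the vectorizer and similarity utilities a real implementation would likely take from a library such as scikit-learn.

```python
import math
from collections import Counter

# Hypothetical toy metadata (title -> space-separated tags).
# The real project works on the metadata of 5000+ movies.
MOVIES = {
    "Space Quest": "space adventure alien crew spaceship",
    "Galaxy Raiders": "space alien battle spaceship war",
    "Love in Paris": "romance paris love couple",
    "Paris Nights": "romance paris drama love",
    "Alien Dawn": "alien invasion space war",
    "Kitchen Wars": "cooking competition chef drama",
}


def count_vector(text):
    """Turn a tag string into a bag-of-words count vector."""
    return Counter(text.lower().split())


def cosine_similarity(a, b):
    """Cosine of the angle between two sparse count vectors."""
    dot = sum(a[term] * b[term] for term in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def recommend(title, movies=MOVIES, top_n=5):
    """Return the top_n titles most similar to the selected title."""
    if title not in movies:
        raise KeyError(f"Unknown title: {title}")
    vectors = {name: count_vector(tags) for name, tags in movies.items()}
    target = vectors[title]
    scores = [
        (name, cosine_similarity(target, vec))
        for name, vec in vectors.items()
        if name != title  # never recommend the selected movie itself
    ]
    # Highest similarity first; ties keep their original order.
    scores.sort(key=lambda pair: pair[1], reverse=True)
    return [name for name, _ in scores[:top_n]]


if __name__ == "__main__":
    print(recommend("Space Quest"))
```

Comparing the selected movie against every other title on each request is fine for a toy example. At the scale of thousands of movies, one reasonable design is to precompute the full similarity matrix once and only look up rows at request time.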

Impact

If this system were scaled and integrated with a streaming service, it could:

⊳ Reduce the time users spend choosing what to watch.

⊳ Increase user engagement, watch time and customer satisfaction.

⊳ Help streaming platforms retain users by offering better personalized content.

Netflix Data Analysis

GitHub Notebook

Problem Statement

⊳ To analyze Netflix content data and uncover valuable insights into how the platform has evolved over time.

Some Key Findings

⊳ Cleaned and analyzed a dataset of 8000+ Netflix Movies and TV Shows.

⊳ More than 60% of content on Netflix is rated for mature audiences.
→ Suggests that Netflix targets adult viewers to boost engagement and retention.

⊳ More than 25% of Movies and TV Shows were released on the 1st day of the month.
→ Shows a consistent release schedule, likely to align with subscription cycles.

⊳ More than 40% of the content on Netflix is exclusive to the United States.
→ Shows a strong focus on the U.S. market and content availability by location.

⊳ More than 20% of the content on Netflix falls under the "Drama" genre.
→ Confirms that "Drama" is a key part of Netflix's content library.

⊳ More than 23% of the content on Netflix was released in 2019 alone.
→ Indicates a major content push that year, possibly tied to growth or user acquisition goals.

Supermarket Sales Analysis

GitHub Notebook

Problem Statement

⊳ To analyze supermarket sales data and identify key factors for improving profitability and operational efficiency.

Some Key Findings

⊳ Analyzed the purchasing patterns of 9000+ supermarket customers.

⊳ More than 15% of the products sold were Snacks.
→ Shows that Snacks are a convenient choice and a big source of revenue.

⊳ More than 32% of the sales occurred in the supermarket's West region.
→ Suggests that the West region performs strongly compared to the others.

⊳ Health drinks and soft drinks are the most profitable categories in Beverages.
→ Shows that both types of drinks sell well.

⊳ November was the most profitable month, contributing about 15% of total annual profit.
→ Makes it an ideal time for running promotions and special offers.

Certificates

Click Here

Blogs

Simple Linear Regression

Multiple Linear Regression