Exploring Laptop Price Trends

📅 2024
💻 1000 Laptop Records
🤖 KNN Regression
Python, Pandas, Matplotlib, Seaborn, Scikit-learn, Machine Learning, Regression, EDA

🎯 Project Overview

This project analyzes a dataset of laptops to understand trends and patterns in laptop pricing. It examines how brand, processor speed, RAM size, storage capacity, screen size, and weight influence price.

The project combines thorough exploratory data analysis with KNN regression and hyperparameter tuning to predict laptop prices and identify which hardware specifications are the dominant pricing drivers.

📊 Dataset Description

The dataset comprises 1000 rows, one per laptop, and 7 columns, each describing one attribute.

No. | Column | Description
1 | Brand | Manufacturer or brand name (Asus, Acer, Lenovo, HP, Dell)
2 | Processor_Speed | Clock speed of the processor in GHz
3 | RAM_Size | Amount of RAM installed in GB
4 | Storage_Capacity | Total storage capacity in GB
5 | Screen_Size | Diagonal display size in inches
6 | Weight | Physical weight of the laptop in kilograms
7 | Price | Retail price of the laptop

Data Quality

📈 Exploratory Data Analysis

4.1 | Individual Variables Analysis

Brand Wordcloud

The five brands are nearly evenly distributed: Dell leads slightly at 21%, while HP and Lenovo tie for the smallest share at 19%.

Brand Distribution

Bar chart confirming the near-equal spread of all five brands across the 1000 laptop records.
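Brand shares like the ones shown in these charts can be computed with a normalized value count. The following is a minimal sketch on a small made-up `Brand` column, not the project's 1000 records:

```python
import pandas as pd

# Hypothetical stand-in for the dataset's Brand column (toy values only).
brands = pd.Series(
    ["Dell", "Dell", "Dell", "HP", "HP", "Asus", "Asus", "Acer", "Acer", "Lenovo"],
    name="Brand",
)

# normalize=True returns each brand's share of rows instead of a raw count.
brand_share = brands.value_counts(normalize=True).mul(100).round(1)
print(brand_share)
```

Passing the resulting series to a bar plot would give a chart of the same kind as the one described above.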

Histogram — All Features

Processor speed shows a relatively flat spread across 1.5–4.0 GHz, while RAM and storage cluster at discrete tiers.

KDE — All Features

KDE curves reveal that RAM size and storage capacity have clear multimodal patterns, confirming that values come in fixed tiers rather than as a continuous range.

4.2 | Outlier Identification

Boxplots — All Features

Box plots show that processor speed, screen size, weight, and price have no outliers, while RAM and storage show visible spread consistent with their discrete tier structure.
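Box plots conventionally flag points lying more than 1.5 times the interquartile range beyond the quartiles. Assuming that default rule was used here, the same check can be sketched numerically. The helper name `iqr_outliers` and the sample weights are hypothetical:

```python
import pandas as pd

def iqr_outliers(values: pd.Series, k: float = 1.5) -> pd.Series:
    """Return the entries lying outside [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, q3 = values.quantile(0.25), values.quantile(0.75)
    iqr = q3 - q1
    mask = (values < q1 - k * iqr) | (values > q3 + k * iqr)
    return values[mask]

# Hypothetical weights in kg; the 9.0 entry is planted to trigger the rule.
weights = pd.Series([2.0, 2.5, 3.0, 3.5, 4.0, 4.5, 9.0])
flagged = iqr_outliers(weights)
print(flagged.tolist())
```

An empty result for a column corresponds to a box plot with no points drawn beyond the whiskers.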

4.3 | Pairs of Variables Insights

Scatter vs Price

Scatter plots of each feature against price reveal that prices fall into three distinct tiers (~10k, ~18k, ~32k), and that none of the plotted continuous features produces a smooth price gradient.

Box and Scatter

Box and scatter plots paired per feature confirm these discrete price bands hold consistently regardless of processor speed, screen size, or weight.

4.4 | Multiple Variables Examination

Correlation Heatmap

Storage capacity has a near-perfect correlation with price (1.00 when rounded to two decimals), making it the dominant pricing factor. All other features have correlations close to zero.
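The correlations behind such a heatmap are Pearson coefficients, which pandas computes with `DataFrame.corr()`. The sketch below uses synthetic data in which price is deliberately built from storage, so it illustrates the pattern rather than reproducing the project's numbers. The price formula and noise level are assumptions:

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
n = 300

# Synthetic stand-in data (assumed, not the project's records): price is built
# from storage plus small noise, while processor speed is drawn independently.
df = pd.DataFrame({
    "Processor_Speed": rng.uniform(1.5, 4.0, n),
    "Storage_Capacity": rng.choice([256, 512, 1000], n),
})
df["Price"] = 30 * df["Storage_Capacity"] + rng.normal(0, 200, n)

# Pearson correlation of every feature with Price.
corr_with_price = df.corr()["Price"].drop("Price")
print(corr_with_price.round(2))
```

Passing `df.corr()` to `seaborn.heatmap` with `annot=True` would render the full matrix as an annotated heatmap.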

Scatter with Regression Lines

Storage capacity traces a clean linear relationship with price, while all other features show horizontal bands with flat regression slopes.

Pairplot — All Features by Brand

The pairplot confirms that brand has no meaningful impact on how features distribute or relate to each other.

3D Scatter — Price, Processor Speed, RAM Size

A 3D scatter visualizes the discrete pricing tiers in three dimensions. The points form horizontal layers, each spanning the full range of processor speed and RAM, which confirms that neither feature alone determines the price tier.

Relplot — RAM vs Price by Storage Capacity

Price tier is entirely determined by storage capacity (256 GB → ~10k, 512 GB → ~18k, 1000 GB → ~32k), regardless of RAM or brand.
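One way to check a claim like this numerically is to group prices by storage capacity and compare the ranges. The rows below are hypothetical and only mimic the tier structure described above:

```python
import pandas as pd

# Hypothetical rows for illustration only (not taken from the dataset).
laptops = pd.DataFrame({
    "Storage_Capacity": [256, 256, 512, 512, 1000, 1000],
    "RAM_Size": [4, 32, 8, 16, 4, 32],
    "Price": [9800, 10100, 17900, 18200, 31800, 32100],
})

# Mean, minimum, and maximum price within each storage tier.
tier_summary = laptops.groupby("Storage_Capacity")["Price"].agg(["mean", "min", "max"])
print(tier_summary)
```

If each tier's maximum stays below the next tier's minimum, storage capacity alone separates the price bands.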

4.5 | Hypothesis Testing with ANOVA

ANOVA was used to test whether mean prices differ significantly across brands. The test found no statistically significant difference, so brand does not appear to be a meaningful predictor of price in this dataset.
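The source does not say which ANOVA implementation was used. As one possibility, a one-way ANOVA can be run with `scipy.stats.f_oneway`, passing one array of prices per brand. The brand names and prices below are synthetic and drawn from a single distribution, so the null hypothesis of equal means holds by construction:

```python
import numpy as np
from scipy.stats import f_oneway

rng = np.random.default_rng(42)

# Synthetic prices for three hypothetical brands, all drawn from the same
# distribution, so the null hypothesis (equal means) is true by construction.
prices_by_brand = {
    brand: rng.normal(loc=20000, scale=9000, size=200)
    for brand in ["BrandA", "BrandB", "BrandC"]
}

# One-way ANOVA: each positional argument is one group's sample.
f_stat, p_value = f_oneway(*prices_by_brand.values())
print(f"F = {f_stat:.3f}, p = {p_value:.3f}")
```

A p-value above the chosen significance level (commonly 0.05) means the test fails to reject equal mean prices across brands.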

🤖 Model Development & Evaluation

5.1 | Data Normalization

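Feature values are normalized prior to model training. Because KNN predicts from distances between points, unscaled features with large numeric ranges (such as storage in GB) would otherwise dominate features with small ranges (such as processor speed in GHz).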
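The source does not state which normalization method was used. As one possibility, here is a minimal sketch with scikit-learn's `StandardScaler` on made-up rows, fitting on the training split only so that test-set statistics do not leak into the model:

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Hypothetical rows: [Processor_Speed (GHz), Storage_Capacity (GB)].
X_train = np.array([[1.5, 256.0], [2.5, 512.0], [4.0, 1000.0], [3.0, 512.0]])
X_test = np.array([[2.0, 256.0]])

scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)  # learn mean/std on training rows only
X_test_scaled = scaler.transform(X_test)        # reuse those statistics on test rows

print(X_train_scaled.mean(axis=0).round(6), X_train_scaled.std(axis=0).round(6))
```

`MinMaxScaler` would be an equally plausible choice. Either way, the goal is to put all features on a comparable scale.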

5.2 | Feature Encoding

The Brand categorical column is encoded for use in regression models.
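The encoding scheme is not specified in the source. One common option is one-hot encoding, sketched here with `pandas.get_dummies` on hypothetical rows:

```python
import pandas as pd

# Hypothetical rows for illustration.
laptops = pd.DataFrame({
    "Brand": ["Dell", "HP", "Asus", "Dell"],
    "RAM_Size": [8, 16, 4, 32],
})

# One indicator column per brand; numeric columns pass through unchanged.
encoded = pd.get_dummies(laptops, columns=["Brand"], dtype=int)
print(encoded.columns.tolist())
```

One-hot columns avoid implying an order among brands, which matters for a distance-based model such as KNN. An integer label encoding would place some brands artificially closer together than others.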

5.3 | Model Training

KNN regression is trained across a range of K values to identify the optimal hyperparameter.

5.4 | Model Evaluation

R² scores are computed for both training and test sets at each value of K.
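The training and evaluation loop described in 5.3 and 5.4 might look like the sketch below. It runs on synthetic data, and the range of K (1 to 15), the 80/20 split, and the scaler are all assumptions rather than the project's actual settings:

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsRegressor
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
n = 400

# Synthetic stand-in data: price depends on storage tier plus small noise.
storage = rng.choice([256.0, 512.0, 1000.0], n)
speed = rng.uniform(1.5, 4.0, n)
X = np.column_stack([speed, storage])
y = 30 * storage + rng.normal(0, 200, n)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

scores = {}
for k in range(1, 16):
    model = make_pipeline(StandardScaler(), KNeighborsRegressor(n_neighbors=k))
    model.fit(X_train, y_train)
    # Pipeline.score delegates to the regressor, which reports R².
    scores[k] = (model.score(X_train, y_train), model.score(X_test, y_test))

# Pick the K with the highest test-set R².
best_k = max(scores, key=lambda k: scores[k][1])
print(best_k, scores[best_k])
```

Selecting K on the test set, as done here for brevity, gives an optimistic estimate. Cross-validation on the training split (for example with `GridSearchCV`) is the more careful option.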

5.5 | Hyperparameter Tuning

R² Score vs K — KNN Regression

The R² score peaks around K = 3–5 and stays near-perfect, above 0.9995, across most values of K.

5.6 | Best Model Result

KNN regression achieved an R² score exceeding 0.9995. This is consistent with the EDA finding that storage capacity alone is nearly sufficient to predict price with very high accuracy.

5.7 | Interactive Model Testing

An interactive widget allows users to input laptop specifications and receive a predicted price from the trained model in real time.

🎉 Key Insights