🎯 Project Overview
This project focuses on analyzing a dataset of laptops to understand trends and patterns in laptop pricing. By examining attributes such as brand, processor speed, RAM size, storage capacity, screen size, and weight, the analysis aims to uncover insights into how these features influence laptop prices.
The project combines thorough exploratory data analysis with KNN regression and hyperparameter tuning to predict laptop prices and identify which hardware specifications are the dominant pricing drivers.
📊 Dataset Description
The dataset comprises 1000 rows and 7 columns. Each row is one laptop, and each column is one attribute of that laptop.
| # | Column | Description |
|---|---|---|
| 1 | Brand | Manufacturer or brand name (Asus, Acer, Lenovo, HP, Dell) |
| 2 | Processor_Speed | Clock speed of the processor in GHz |
| 3 | RAM_Size | Amount of RAM installed in GB |
| 4 | Storage_Capacity | Total storage capacity in GB |
| 5 | Screen_Size | Diagonal display size in inches |
| 6 | Weight | Physical weight of the laptop in kilograms |
| 7 | Price | Retail price of the laptop |
Data Quality
- Missing Values: The dataset contains no missing values.
- Duplicates: The dataset contains no duplicate rows.
- RangeIndex: The dataset includes 1000 entries.
- Data Types: 4 float columns, 2 integer columns, and 1 object column.
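The checks above can be reproduced with a few pandas calls. The sketch below is illustrative only: it builds a five-row stand-in frame with the same column names (the values are invented, not taken from the real dataset) and then counts missing values, duplicate rows, and columns per dtype.

```python
import pandas as pd

# Tiny synthetic stand-in for the laptop dataset; values are made up
# purely to exercise the checks and are NOT from the real data.
df = pd.DataFrame({
    "Brand": ["Asus", "Acer", "Lenovo", "HP", "Dell"],
    "Processor_Speed": [3.8, 2.9, 3.2, 1.7, 2.5],
    "RAM_Size": [16, 4, 8, 32, 16],
    "Storage_Capacity": [512, 1000, 256, 256, 512],
    "Screen_Size": [11.2, 13.3, 14.0, 15.6, 17.0],
    "Weight": [2.6, 3.2, 2.0, 4.6, 4.3],
    "Price": [17400.0, 31600.0, 9300.0, 9500.0, 17800.0],
})

missing_per_column = df.isna().sum()          # NaN count for each column
duplicate_rows = int(df.duplicated().sum())   # rows identical to an earlier row
dtype_counts = df.dtypes.astype(str).value_counts().to_dict()

print(missing_per_column)
print("duplicate rows:", duplicate_rows)
print("columns per dtype:", dtype_counts)
```

On the real data, `df.info()` reports the same dtype breakdown together with the RangeIndex length.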
📈 Exploratory Data Analysis
4.1 | Individual Variables Analysis
Brand Wordcloud
The five brands are nearly evenly distributed: Dell leads slightly with 21% of the records, while HP and Lenovo tie for the smallest share at 19% each.
Brand Distribution
A bar chart confirms the near-equal spread of all five brands across the 1000 laptop records.
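Brand shares like the ones quoted above come from a normalized value count. This is a minimal sketch on a ten-item toy series (the counts are hypothetical and do not match the real dataset).

```python
import pandas as pd

# Hypothetical toy sample of the Brand column, not the real distribution.
brand = pd.Series(
    ["Dell", "Dell", "Dell", "Asus", "Asus", "Acer", "Acer", "HP", "HP", "Lenovo"],
    name="Brand",
)

# normalize=True returns fractions instead of raw counts.
brand_share = brand.value_counts(normalize=True)
print((brand_share * 100).round(1))
```

Calling `brand_share.plot(kind="bar")` would draw the matching bar chart when matplotlib is installed.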
Histogram — All Features
Processor speed shows a relatively flat spread across 1.5–4.0 GHz, while RAM and storage cluster at discrete tiers.
KDE — All Features
KDE curves reveal that RAM size and storage capacity have clear multimodal patterns, confirming that values come in fixed tiers rather than as a continuous range.
4.2 | Outlier Identification
Boxplots — All Features
Box plots show that processor speed, screen size, weight, and price have no outliers, while RAM and storage show visible spread consistent with their discrete tier structure.
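A box plot conventionally flags points lying more than 1.5 × IQR beyond the quartiles, so the "no outliers" reading can be checked numerically. The helper below is a sketch of that rule; the function name `iqr_outliers` and both sample series are hypothetical.

```python
import pandas as pd

def iqr_outliers(values: pd.Series, whisker: float = 1.5) -> pd.Series:
    """Return the entries lying outside [Q1 - w*IQR, Q3 + w*IQR]."""
    q1 = values.quantile(0.25)
    q3 = values.quantile(0.75)
    iqr = q3 - q1
    lower = q1 - whisker * iqr
    upper = q3 + whisker * iqr
    return values[(values < lower) | (values > upper)]

# Hypothetical weights in kg: one clean sample, one with an extreme value.
clean_weights = pd.Series([2.0, 2.5, 3.0, 3.5, 4.0, 4.5])
with_extreme = pd.Series([2.0, 2.5, 3.0, 3.5, 4.0, 12.0])

print(len(iqr_outliers(clean_weights)))      # number of flagged points
print(iqr_outliers(with_extreme).tolist())   # flagged values
```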
4.3 | Pairs of Variables Insights
Scatter vs Price
Scatter plots of each feature against price reveal that prices fall into three distinct tiers (~10k, ~18k, ~32k), and that no single continuous feature produces a smooth price gradient.
Box and Scatter
Box and scatter plots paired per feature confirm these discrete price bands hold consistently regardless of processor speed, screen size, or weight.
4.4 | Multiple Variables Examination
Correlation Heatmap
Storage capacity has a near-perfect correlation with price (1.00), making it the dominant pricing factor. All other features have correlations close to zero.
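The heatmap values are Pearson correlations, which pandas computes with `DataFrame.corr()`. The sketch below uses synthetic data built on stated assumptions: price is generated as a linear function of storage plus small noise (coefficients chosen to land near the tiers reported here), and the RAM levels 4/8/16/32 GB are assumed rather than taken from the source.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
n = 500

# Synthetic stand-in: storage tiers drive price, other features are random.
storage = rng.choice([256, 512, 1000], size=n)
df = pd.DataFrame({
    "Processor_Speed": rng.uniform(1.5, 4.0, size=n),
    "RAM_Size": rng.choice([4, 8, 16, 32], size=n),   # assumed RAM levels
    "Storage_Capacity": storage,
    "Price": 30.0 * storage + 2000.0 + rng.normal(0.0, 200.0, size=n),
})

# Pearson correlation of every feature with Price, strongest first.
price_corr = df.corr()["Price"].drop("Price").sort_values(ascending=False)
print(price_corr.round(2))
```

Passing `df.corr()` to a plotting call such as seaborn's `heatmap` would give the visual form.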
Scatter with Regression Lines
Storage capacity traces a clean linear relationship with price, while all other features show horizontal bands with flat regression slopes.
Pairplot — All Features by Brand
The pairplot confirms that brand has no meaningful impact on how features distribute or relate to each other.
3D Scatter — Price, Processor Speed, RAM Size
A 3D scatter visualizes the discrete pricing tiers in three dimensions — the horizontal layering confirms that neither processor speed nor RAM alone determines price tier.
Relplot — RAM vs Price by Storage Capacity
Price tier is entirely determined by storage capacity (256 GB → ~10k, 512 GB → ~18k, 1000 GB → ~32k), regardless of RAM or brand.
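The storage-to-price mapping can be tabulated with a group-by. The six rows below are hypothetical values placed near the reported tiers, shown only to illustrate the aggregation.

```python
import pandas as pd

# Hypothetical rows: two laptops per storage tier with different RAM.
df = pd.DataFrame({
    "Storage_Capacity": [256, 256, 512, 512, 1000, 1000],
    "RAM_Size": [8, 32, 4, 16, 8, 32],
    "Price": [9900.0, 10100.0, 17900.0, 18100.0, 31900.0, 32100.0],
})

# Mean, minimum and maximum price within each storage tier.
tier_summary = df.groupby("Storage_Capacity")["Price"].agg(["mean", "min", "max"])
print(tier_summary)
```

A narrow min-to-max range inside each tier, compared with the gaps between tiers, is what the relplot shows graphically.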
4.5 | Hypothesis Testing with ANOVA
ANOVA was used to test whether mean prices differ significantly across brands. Results confirm that brand is not a statistically significant predictor of price in this dataset.
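A one-way ANOVA of this kind can be run with `scipy.stats.f_oneway`, which takes one array of prices per group. The sketch assumes SciPy is available and uses hypothetical prices in which every brand covers all three tiers, so the brand means barely differ.

```python
from scipy.stats import f_oneway

# Hypothetical prices per brand; each brand spans the three price tiers.
prices_by_brand = {
    "Asus":   [9900, 10100, 17900, 18100, 31900, 32100],
    "Acer":   [10000, 10200, 18000, 18200, 32000, 32200],
    "Lenovo": [9800, 10000, 17800, 18000, 31800, 32000],
    "HP":     [9950, 10050, 17950, 18050, 31950, 32050],
    "Dell":   [10050, 10150, 18050, 18150, 32050, 32150],
}

# H0: all brands share the same mean price.
f_stat, p_value = f_oneway(*prices_by_brand.values())

alpha = 0.05
print(f"F = {f_stat:.4f}, p = {p_value:.4f}")
print("reject H0" if p_value < alpha else "fail to reject H0")
```

Here the within-brand variation (driven by the storage tiers) dwarfs the differences between brand means, so the F statistic stays close to zero.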
🤖 Model Development & Evaluation
5.1 | Data Normalization
Feature values are normalized prior to model training.
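The text does not say which normalization scheme is used. As one common option, the sketch below assumes min-max scaling with scikit-learn's `MinMaxScaler`, fitted on the training split only so that test rows do not influence the scaling; the feature rows are hypothetical.

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

# Hypothetical rows: [Processor_Speed (GHz), RAM_Size (GB), Storage_Capacity (GB)]
X_train = np.array([
    [1.5, 4.0, 256.0],
    [2.5, 8.0, 512.0],
    [4.0, 32.0, 1000.0],
])
X_test = np.array([[3.0, 16.0, 512.0]])

scaler = MinMaxScaler()                       # maps each training column to [0, 1]
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)      # reuses the training min and max

print(X_train_scaled)
print(X_test_scaled)
```

Scaling matters for KNN because distances are otherwise dominated by the feature with the largest numeric range, here storage in GB.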
5.2 | Feature Encoding
The Brand categorical column is encoded for use in regression models.
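The encoding method is not specified here. One-hot encoding is a reasonable assumption for a distance-based model, since it avoids imposing an artificial order on the brands; the sketch below uses `pandas.get_dummies` on a hypothetical six-row frame.

```python
import pandas as pd

# Hypothetical rows; only Brand needs encoding.
df = pd.DataFrame({
    "Brand": ["Asus", "Acer", "Lenovo", "HP", "Dell", "Dell"],
    "RAM_Size": [8, 16, 4, 32, 8, 16],
})

# One 0/1 indicator column per brand; the original Brand column is dropped.
encoded = pd.get_dummies(df, columns=["Brand"], dtype=int)
print(encoded)
```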
5.3 | Model Training
KNN regression is trained across a range of K values to identify the optimal hyperparameter.
5.4 | Model Evaluation
R² scores are computed for both training and test sets at each value of K.
5.5 | Hyperparameter Tuning
R² Score vs K — KNN Regression
R² peaks at around K = 3 to 5 and remains above 0.9995 for most other values of K.
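A K sweep of this kind might look like the sketch below. It runs on synthetic data under stated assumptions (price generated linearly from storage plus noise, assumed RAM levels, an 80/20 split, K from 1 to 15, min-max scaling), so the scores it prints are not the project's results.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsRegressor
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import MinMaxScaler

rng = np.random.default_rng(42)
n = 600

# Synthetic stand-in for the laptop data (not the real dataset).
storage = rng.choice([256, 512, 1000], size=n)
X = pd.DataFrame({
    "Processor_Speed": rng.uniform(1.5, 4.0, size=n),
    "RAM_Size": rng.choice([4, 8, 16, 32], size=n),   # assumed RAM levels
    "Storage_Capacity": storage,
})
y = 30.0 * storage + 2000.0 + rng.normal(0.0, 100.0, size=n)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

scores = []
for k in range(1, 16):
    # Scaling lives inside the pipeline so it is fitted on training data only.
    model = make_pipeline(MinMaxScaler(), KNeighborsRegressor(n_neighbors=k))
    model.fit(X_train, y_train)
    scores.append({
        "k": k,
        "train_r2": model.score(X_train, y_train),   # score() returns R^2
        "test_r2": model.score(X_test, y_test),
    })

results = pd.DataFrame(scores).set_index("k")
best_k = int(results["test_r2"].idxmax())
print(results.round(4))
print("best K by test R2:", best_k)
```

Plotting `results` against K gives the R² versus K curve; comparing the train and test columns shows where small K overfits.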
5.6 | Best Model Result
KNN regression achieved an R² score exceeding 0.9995. Combined with the near-perfect correlation between storage capacity and price, this suggests that storage capacity alone is nearly sufficient to predict price with very high accuracy.
5.7 | Interactive Model Testing
An interactive widget allows users to input laptop specifications and receive a predicted price from the trained model in real time.
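The widget ultimately needs a function that maps user inputs to a prediction. The sketch below shows one possible shape for that function; `predict_price`, the two-feature toy model, and the training rows are all hypothetical, and a notebook widget library such as ipywidgets could call the function on each input change.

```python
import pandas as pd
from sklearn.neighbors import KNeighborsRegressor

# Hypothetical toy training set: two laptops per storage tier.
train = pd.DataFrame({
    "RAM_Size": [8, 16, 8, 16, 8, 16],
    "Storage_Capacity": [256, 256, 512, 512, 1000, 1000],
    "Price": [9900.0, 10100.0, 17900.0, 18100.0, 31900.0, 32100.0],
})
FEATURES = ["RAM_Size", "Storage_Capacity"]

model = KNeighborsRegressor(n_neighbors=2)
model.fit(train[FEATURES], train["Price"])

def predict_price(ram_gb: int, storage_gb: int) -> float:
    """Predict a price for one laptop from the two toy features."""
    row = pd.DataFrame([[ram_gb, storage_gb]], columns=FEATURES)
    return float(model.predict(row)[0])

print(predict_price(ram_gb=8, storage_gb=512))
```

Building the input row with the training column names keeps scikit-learn's feature-name check satisfied at prediction time.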
🎉 Key Insights
- Storage capacity is the dominant pricing driver: it correlates with price at ~1.00, while all other features have near-zero correlations.
- Prices fall into three discrete tiers: ~10k for 256 GB, ~18k for 512 GB, and ~32k for 1000 GB of storage, regardless of any other specification.
- Brand has no significant impact on price: ANOVA finds no statistically significant price difference across the five brands (Asus, Acer, Lenovo, HP, Dell).
- RAM and storage show multimodal KDE patterns, confirming they come in fixed tiers rather than as a continuous range. The storage tiers in turn explain the discrete pricing structure.
- KNN regression achieved an R² score above 0.9995, making it a near-perfect predictor of laptop price given the storage-dominated pricing structure.
- Processor speed, screen size, and weight show no outliers and no meaningful correlation with price in this dataset.