Jahid Hasan

I'm a Data Scientist

About Me

Hi, I'm Jahid Hasan, a Data Scientist and ML Engineer with 6+ years of professional experience building machine learning models, scalable data pipelines, and production-ready ML systems. I am a Kaggle Grandmaster, which places me in the top 0.1% of data scientists globally; I reached that level in just 6 months by publishing over 20 highly ranked notebooks covering credit risk, healthcare, and clean energy. I am currently completing my MS in Data Science at Eastern University with a 4.00 GPA and expect to graduate in May 2026.

At Ludwig Pfeiffer, I built automated reporting pipelines that cut generation time by 60%, deployed production-grade REST APIs for ML models using Django REST Framework, and designed computer vision systems for automated pipeline inspection. I also built database management systems handling 100K+ records in MySQL and PostgreSQL and implemented Microsoft Power Apps and Power Automate solutions that improved process efficiency by 40%.

At Tappware Solutions, I built e-governance data pipelines processing 500K+ records daily at 99.9% accuracy and deployed personalized recommendation models achieving 85%+ accuracy for a government e-learning platform. I worked with large-scale datasets, applied NLP and word embedding techniques, and built scalable REST APIs to serve model predictions in real time using Django REST Framework and PostgreSQL.

At Qtec Solutions, I developed and fine-tuned sentiment analysis models achieving 82% accuracy for marketing strategy optimization, built automated web scraping pipelines using Python and BeautifulSoup, and designed data warehousing solutions using dimensional modeling and ETL processes. I applied classification, regression, clustering, and ensemble methods to generate business insights from large-scale datasets.

Beyond my professional work, I published peer-reviewed research at CLEF 2025 on cross-lingual subjectivity detection using multilingual transformers. I have published 13 articles on Medium about Python and data visualization, I maintain an open-source ggplot2 theme for R on CRAN with 1,200+ downloads, and I am writing Data Science Mastery: From Fundamentals to Professional Practice, a 45-chapter book covering statistics, probability, machine learning, and real-world applications.

Current Role: Data Science Contributor
Company: Kaggle
Education: MS in Data Science
Kaggle Status: Grandmaster
Location: Maryland, US

Personal Interests

Software Development

Machine Learning

Database Design and Architecture

Large Language Models

Statistical Analysis

Data Science

Problem-Solving Techniques

Generative AI

Research Interests

I'm interested in both research and development. I'm currently doing my graduate thesis work on Big Data Mining, Digital Image Processing, and Artificial Intelligence. I've also listed some topics that are beyond my current expertise; I hope to work on these in the future.

Artificial Neural Network (ANN)

Recurrent Neural Network (RNN)
Convolutional Neural Network (CNN)
Neural Network Optimization
Long Short-Term Memory (LSTM) Network

Computer Vision & Digital Image Processing

Facial and Emotion Recognition
Blob Detection

Digital Signal Processing & Cognitive Science

EEG & EMG Analysis
Speech Recognition
Medical Imaging
Computer Graphics

Natural Language Processing

Native Natural Language Processing Toolkit
Text-based Emotion Analysis
News Analysis using NLP

Deep Learning

Deep Learning using Theano, TensorFlow, and Torch

Big Data Mining and Cloud Computing

Distributed Data Processing
Scalable Machine Learning Algorithms
Cloud-based Analytics

Internet of Things (IoT)

IoT Security and Privacy
Smart Systems and Automation
Sensor Networks and Edge Computing

Education

Eastern University

MS in Data Science

January 2024 - Present (In Progress)
Relevant Coursework
Introduction to Statistical Modeling, Data Analytics in R, Data Manipulation, Applied Machine Learning, Natural Language Processing

Southeast University

B.Sc. in Computer Science & Engineering

September 2013 - December 2017 (Completed)
Relevant Coursework
Database Design, Artificial Intelligence, Statistical Methods & Probability, Image Processing, Data Mining

Publications

SmolLab SEU at CheckThat! 2025: How well do multilingual transformers transfer across news domains for cross-lingual subjectivity detection

September 26, 2025 | CLEF - Conference and Labs of the Evaluation Forum

Research on multilingual transformer models and their cross-domain transfer capabilities for detecting subjectivity in news articles across different languages. This work explores the effectiveness of modern NLP architectures in handling cross-lingual subjectivity detection tasks.

Natural Language Processing, Multilingual Transformers, Cross-lingual Analysis

Professional Experience

Ludwig Pfeiffer Hoch- und Tiefbau GmbH & Co. KG

Data Scientist
January 2021 - December 2023
  • Developed and maintained robust database management systems handling 100K+ records using MySQL and PostgreSQL, ensuring data integrity, optimized query performance, and high availability across production environments.
  • Created and maintained automated reporting pipelines using Python (Pandas), SQL, and Power BI, reducing report generation time by 60% and significantly improving operational visibility for business stakeholders.
  • Built and deployed production-grade REST APIs using Django REST Framework, serving ML model predictions with low-latency inference and integrating seamlessly with frontend and enterprise systems.
  • Implemented Microsoft Power Apps and Power Automate solutions to streamline and automate business workflows across departments, improving overall process efficiency by 40%.
  • Designed and implemented computer vision and ML models for automated pipeline inspection, reducing manual inspection overhead and improving defect detection accuracy across construction site operations.
  • Developed and deployed Google AppSheet applications for construction area management, store inventory tracking, invoice processing, and workflow automation across field operations.
  • Maintained business relationships with Dhaka WASA and Bangladesh Water Development Board, translating technical findings into actionable reports for non-technical stakeholders.
Skills
Data Science & Analytics
Python, Pandas, Machine Learning, Deep Learning
Data Visualization
Tableau, Power BI, Excel, Google Sheets
Web Development & Backend
Node.js, Express.js, Django REST Framework, GraphQL
Database Management
MongoDB, PostgreSQL
System & Cloud
Linux, DigitalOcean

Tappware Solutions Limited

Software Engineer
September 2019 - December 2020
  • Designed scalable ETL data pipelines for e-governance platforms, processing 500K+ records daily with 99.9% accuracy, ensuring reliable and uninterrupted data flow across government and enterprise systems.
  • Built and deployed scalable machine learning web APIs using Django REST Framework and PostgreSQL, supporting real-time inference for government and enterprise applications across multiple production environments.
  • Built personalized recommendation models using TensorFlow and Scikit-learn, achieving 85%+ accuracy in predicting learner preferences and improving e-learning engagement across government platforms.
  • Conducted advanced data analysis on Japanese patient datasets using statistical modeling and NLP techniques, extracting meaningful patterns to support data-driven clinical decisions and medical research.
  • Proposed, investigated, and deployed new analytic capabilities for e-governance solutions, selecting appropriate algorithms and analytics tools for large-scale government data processing requirements.
  • Worked extensively with large-scale datasets using big data techniques, word embeddings, and advanced querying to extract insights from complex, unstructured government and enterprise data sources.
Skills
Programming & Tools
Python, Flask, Linux
Data Science & Analytics
NumPy, Pandas, TensorFlow, Scikit-learn
Data Visualization
Seaborn, Matplotlib
Cloud & Database
PostgreSQL, SQLite, Heroku

Qtec Solution Limited

Junior Software Engineer
January 2018 - August 2019
  • Applied a broad range of machine learning techniques including classification, regression, clustering, and ensemble methods to generate actionable business insights from large-scale datasets across multiple industries.
  • Developed and fine-tuned sentiment analysis models for marketing strategy optimization, achieving up to 82% accuracy and delivering actionable insights to support campaign decision-making and customer engagement.
  • Built automated web scraping solutions and end-to-end data preprocessing pipelines using Python and BeautifulSoup to collect, clean, and analyze market trends and customer behavior at scale.
  • Designed and implemented data warehousing solutions using dimensional modeling and ETL processes, enabling efficient storage, retrieval, and reporting across large-scale business datasets for strategic analysis.
  • Performed predictive analytics using statistical modeling and data mining techniques to develop targeted marketing strategies and improve customer segmentation accuracy.
  • Analyzed and processed complex datasets using MySQL and Excel, delivering structured reports and visualizations to support business intelligence and executive decision-making.
Skills
Data Science
Machine Learning, Deep Learning, Sentiment Analysis
Big Data
Hadoop, Apache Spark, ETL
Web & APIs
FastAPI, Scrapy, BeautifulSoup, Selenium

Grameen Intel Social Business Limited

Software QA Intern
August 2017 - December 2017
  • Designed and executed comprehensive SQA test plans, test cases, and test scripts to ensure software quality and reliability across multiple application modules and development cycles.
  • Tracked, documented, and managed bugs and defects using Jira and TestLink, collaborating closely with developers to verify fixes and ensure timely resolution before production releases.
  • Performed thorough cross-platform testing across Windows, Linux, and macOS environments, ensuring consistent application behavior and performance across all supported operating systems.
  • Collaborated with cross-functional development teams to identify edge cases, validate software requirements, and contribute to continuous quality improvement throughout the development lifecycle.
Skills
SQA & Testing
Software Testing, Test Planning, Bug Tracking, Mobile Testing
Tools
Jira, TestLink

Projects

Netflix Movie Analysis
Music Sales Analysis
AI Survey Analysis
EV Charging Station Analysis
Smartphone Data Insights
Global AI Salary Dive
Dhaka Urban Population
Quality of Life Index 2024
Walmart Sales Analysis
Bangladesh Road Accidents
Student Information System
Diabetes Risk Analysis
Loan Approval Analysis (Loan Approval Patterns)
Iris Classification Analysis (Iris: Modeling, Prediction)
Laptop Price Analysis (Exploring Laptop Price)
LoanEase - Loan Prediction App

DASH VIEW

Region Wise Sales Dashboard

Interactive Sales Analytics Dashboard

Real-time visualization of regional sales performance with interactive filtering and drill-down capabilities.

Power BI, Excel, Business Intelligence

DYNAMIC REPORTS

Customer Insights and Trends Analysis

Behavioral patterns, segmentation, and trend discovery using advanced data visualization techniques.

Python, Seaborn

Customer Profitability and Marketing Analysis

Identifying high-value customers and evaluating marketing effectiveness through data-driven analysis.

R Markdown

Exploratory Data Analysis of Netflix Movies

A Hands-On Approach in R

R

Canada Immigration Insights

Visualizing Key Trends and Data

Python

SCRIPT VISION

Pandas, PostgreSQL, SQLAlchemy, Pgcli

PostgreSQL, Create table, insert values

Watch these interactive terminal recordings to see real-time demonstrations of database operations and Python scripting.

View More on Asciinema

Open Source Packages

Online Certifications

Machine Learning
Applied Machine Learning in Python
Natural Language Processing in TensorFlow
Python 101 for Data Science
Python Core
SQL for Data Science
Data Science Math Skills
Learn The Linux Command Line: Basic Commands
Data Science
Automate the Boring Stuff with Python Programming
Intro to Programming
Introduction to Data Science
Crash Course on Python
Convolutional Neural Networks in TensorFlow
Python for Beginners: Complete Python Programming
Machine Learning
Python (Basic)
Introduction to Data Analytics

Technical Skills

Programming Languages

Python, R, SQL, Java, C++, MATLAB, JavaScript

Database

MySQL, PostgreSQL, MongoDB, Oracle, SQLite, SQLAlchemy

Data Science Libraries & Tools

Pandas, NumPy, SciPy, Statsmodels

Machine Learning & Deep Learning Frameworks

Scikit-learn, XGBoost, LightGBM, CatBoost, TensorFlow, PyTorch, Keras, OpenCV

Natural Language Processing (NLP)

NLTK, SpaCy, Hugging Face Transformers, Word Embeddings

Data Visualization & BI

Matplotlib, Seaborn, Plotly, ggplot2, Power BI, Tableau, Streamlit

Statistical Analysis & Mathematics

Hypothesis Testing, A/B Testing, Regression Analysis, Time Series, Experimental Design, Linear Algebra

Big Data & Cloud

Apache Spark, AWS (S3, EC2), Docker, Kubernetes, ETL, Data Warehousing

Web Development

HTML5, CSS3, JavaScript, Bootstrap, Django, Flask, FastAPI, Node.js, Express.js, Django REST, BeautifulSoup

DevOps & Deployment

Docker, Heroku, Netlify, CI/CD, Feature Engineering, Model Deployment, MLOps

Tools & Platforms

Git, GitHub, Jupyter, Linux, VS Code, Postman, Swagger

Microsoft Power Platform

Power BI, Power Apps, Power Automate, SharePoint

Methodologies

Agile, Scrum, CI/CD, Feature Engineering, Model Deployment, MLOps

Scripting & Automation

Bash, Scrapy, Selenium

IDEs & Text Editors

Neovim, Jupyter, VS Code

Design & Prototyping

Figma, GIMP

Markup Languages & Documentation

Markdown, LaTeX, JSON

Security & Operating Systems

Ubuntu, Kali Linux, Linux

Other Skills

LibreOffice, JWT, Pgcli, bpython

Blog

Technical Blog

Explore my technical blog featuring tutorials, data science insights, machine learning projects, and software engineering best practices.

Visit Blog

Get In Touch

Learn With Me

Explore my tutorials, technical content, and educational resources across multiple platforms.

Send Me a Message