Canada Immigration Insights

%load_ext pretty_jupyter

Libraries


Load required libraries and set up the environment

# Import necessary packages
import warnings  # Used to suppress warnings in the notebook output
warnings.filterwarnings("ignore")  # Ignore any warnings that might be shown during execution

import pandas as pd  # Import pandas for data manipulation and analysis
import numpy as np   # NumPy for numerical operations and array handling
import matplotlib.pyplot as plt  # Import matplotlib for plotting graphs and visualizations

# Load the Rosé Pine style for Matplotlib
plt.style.use('./matplot-design/rosepine.mplstyle')  # Apply the custom style from the specified file path
# This ensures that all subsequent plots will follow the design aesthetics of the 'rosepine' style

Data Loading and Inspection


Error Handling for Loading a Dataset Using Pandas read_csv

# Error Handling When Loading Dataset with Pandas read_csv

try:
    # Attempt to read the dataset
    df = pd.read_csv('../data/canada_immigrant.csv')
    print("Dataset loaded successfully.")
    
except FileNotFoundError:
    # Handle FileNotFoundError if the file does not exist
    print("Error: File not found. Please check the file path.")

except Exception as e:
    # Handle other exceptions
    print(f"An error occurred while loading the dataset: {e}")

# Report that the loading step has finished (printed whether or not loading succeeded)
print("Process finished.")
Dataset loaded successfully.
Process finished.
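
If loading fails, the cell above only prints a message, so later cells would stop on an undefined df. The following is a minimal sketch of one alternative, not the notebook's own method: load_dataset is a hypothetical helper name, and the in-memory CSV is made-up data standing in for the real file.

```python
import io
import pandas as pd

def load_dataset(source):
    """Hypothetical helper: read a CSV and re-raise on failure so later cells do not run without data."""
    try:
        return pd.read_csv(source)
    except FileNotFoundError:
        print("Error: File not found. Please check the file path.")
        raise

# Made-up in-memory CSV standing in for '../data/canada_immigrant.csv'
demo_csv = io.StringIO('Category,2015 Total\nExample A,"1,825"\nExample B,--\n')
demo_df = load_dataset(demo_csv)

# Both columns are read as text because of the '--' placeholder
print(demo_df.dtypes)
```

Re-raising keeps the friendly message but still stops execution at the point of failure.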

Explore the Data

This step inspects the first five rows, the last five rows, and a random sample of five rows from the dataset.

Data view

First Five Rows

df.head(5)
Canada - Admissions of Permanent Residents by Province/Territory of Intended Destination and Immigration Category, January 2015 - June 2024 Province/Territory and Immigration Category 2015 Q1 Jan 2015 Q1 Feb 2015 Q1 Mar 2015 Q1 Total 2015 Q2 Apr 2015 Q2 May 2015 Q2 Jun 2015 Q2 Total 2015 Q3 Jul ... 2023 Total 2024 Q1 Jan 2024 Q1 Feb 2024 Q1 Mar 2024 Q1 Total 2024 Q2 Apr 2024 Q2 May 2024 Q2 Jun 2024 Q2 Total 2024 Total
0 Agri-Food Pilot 0 0 0 0 0 0 0 0 0 ... 0 0 -- 0 -- 0 0 0 0 --
1 Atlantic Immigration Pilot Programs 0 0 0 0 0 0 0 0 0 ... 125 -- 0 5 10 -- 0 -- -- 15
2 Atlantic Immigration Programs 0 0 0 0 0 0 0 0 0 ... 700 90 70 50 210 105 95 110 315 525
3 Canadian Experience -- 0 -- 5 -- -- -- -- -- ... 110 20 -- -- 25 5 -- -- 10 40
4 Caregiver 0 -- 0 -- 0 -- 5 5 0 ... -- -- 0 0 -- -- 0 0 -- --

5 rows × 163 columns

Last Five Rows

df.tail(5)
Canada - Admissions of Permanent Residents by Province/Territory of Intended Destination and Immigration Category, January 2015 - June 2024 Province/Territory and Immigration Category 2015 Q1 Jan 2015 Q1 Feb 2015 Q1 Mar 2015 Q1 Total 2015 Q2 Apr 2015 Q2 May 2015 Q2 Jun 2015 Q2 Total 2015 Q3 Jul ... 2023 Total 2024 Q1 Jan 2024 Q1 Feb 2024 Q1 Mar 2024 Q1 Total 2024 Q2 Apr 2024 Q2 May 2024 Q2 Jun 2024 Q2 Total 2024 Total
366 Resettled Refugee & Protected Person in Canada... 0 0 -- -- 0 0 0 0 0 ... 10 -- 0 0 -- 0 -- 0 -- --
367 All Other Immigration - Total 0 0 0 0 0 0 0 0 0 ... -- 0 0 0 0 0 0 0 0 0
368 Nunavut - Total -- 5 5 10 0 -- 10 10 -- ... 55 -- -- -- 10 10 5 -- 20 25
369 Province/Territory not stated - Total 5 -- 0 10 0 -- 5 10 10 ... 30 0 -- 0 -- 5 -- 5 15 20
370 Total 12,910 16,440 21,770 51,125 21,165 23,895 27,025 72,090 27,770 ... 471,815 47,760 39,105 34,870 121,730 42,595 46,835 44,540 133,970 255,705

5 rows × 163 columns

Sample Five Rows

df.sample(5)
Canada - Admissions of Permanent Residents by Province/Territory of Intended Destination and Immigration Category, January 2015 - June 2024 Province/Territory and Immigration Category 2015 Q1 Jan 2015 Q1 Feb 2015 Q1 Mar 2015 Q1 Total 2015 Q2 Apr 2015 Q2 May 2015 Q2 Jun 2015 Q2 Total 2015 Q3 Jul ... 2023 Total 2024 Q1 Jan 2024 Q1 Feb 2024 Q1 Mar 2024 Q1 Total 2024 Q2 Apr 2024 Q2 May 2024 Q2 Jun 2024 Q2 Total 2024 Total
93 Atlantic Immigration Programs 0 0 0 0 0 0 0 0 0 ... 915 420 250 200 870 330 390 235 955 1,825
244 Worker Program 780 1,245 1,370 3,395 1,545 1,695 2,140 5,375 2,400 ... 6,910 815 765 690 2,270 985 950 890 2,825 5,095
252 Temporary Resident to Permanent Resident Pathway 0 0 0 0 0 0 0 0 0 ... 2,760 185 70 70 330 40 50 20 110 435
365 Protected Person in Canada 0 0 -- -- 0 0 0 0 0 ... -- -- 0 0 -- 0 -- 0 -- --
45 Economic - Total 35 55 70 160 70 100 85 255 115 ... 3,280 500 410 340 1,250 470 330 285 1,090 2,340

5 rows × 163 columns
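
df.sample returns different rows on each run. A minimal sketch on a small made-up frame (not the real dataset) shows how the random_state parameter makes the sample repeatable; the seed value 42 is arbitrary.

```python
import pandas as pd

# Made-up frame used only to illustrate sampling
demo = pd.DataFrame({'Category': list('ABCDEFGH'), 'Value': range(8)})

# The same seed selects the same rows every time
first = demo.sample(3, random_state=42)
second = demo.sample(3, random_state=42)

print(first.index.tolist() == second.index.tolist())
```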

Data Cleaning and Transformation


Data Cleaning and Transformation for Time Series Analysis: Renaming, Filtering, and Handling Missing Values

# Data Cleaning: Renaming Columns and Removing Irrelevant Rows

# Rename the column to a more accessible name for easy access
data_clean = df.rename(columns={
    'Canada - Admissions of Permanent Residents by Province/Territory of Intended Destination and Immigration Category, January 2015 - June 2024 Province/Territory and Immigration Category': 'Immigration_Category'
})

# Remove the grand-total row. Subtotal rows such as 'Economic - Total' or 'Nunavut - Total' are kept.
# .copy() makes data_clean independent of df, so the assignments below do not act on a view
data_clean = data_clean[data_clean['Immigration_Category'] != 'Total'].copy()

# Data Transformation for Time Series Analysis:
# Convert every column after the first (the categorical label) to a numeric type.
# Thousands separators are removed first, because pd.to_numeric cannot parse strings such as '12,910'.
# Anything still non-numeric, including the '--' placeholder, becomes NaN and is then filled with 0 for simplicity
for col in data_clean.columns[1:]:  # Skipping the first column which is categorical
    without_commas = data_clean[col].astype(str).str.replace(',', '', regex=False)
    data_clean[col] = pd.to_numeric(without_commas, errors='coerce').fillna(0)
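
The table stores its numbers as text, some with thousands separators (for example '12,910') and some as the '--' placeholder. A minimal sketch on made-up values, not the real dataset, shows how pd.to_numeric with errors='coerce' treats each case and why removing the commas first matters.

```python
import pandas as pd

# Made-up values in the same string format as the table above
raw = pd.Series(['12,910', '--', '0', '525'])

# Without removing the thousands separator, '12,910' cannot be parsed and is coerced to NaN, then 0
naive = pd.to_numeric(raw, errors='coerce').fillna(0)

# Removing the separator first preserves the value; '--' still becomes NaN and then 0
stripped = raw.str.replace(',', '', regex=False)
parsed = pd.to_numeric(stripped, errors='coerce').fillna(0)

print(naive.tolist())
print(parsed.tolist())
```

In the naive version the largest value silently turns into 0, which would distort every total computed afterwards.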

Summarizing Immigration


Summarizing Admissions of Permanent Residents by Quarter: Aggregation and Cleaning for Visualization

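# Step 1: Collect the columns whose name contains 'Total'.
# This matches quarter totals (e.g. '2015 Q1 Total') and annual totals (e.g. '2015 Total')
quarter_columns = [col for col in data_clean.columns if 'Total' in col]

# Step 2: Select the columns of interest: Immigration Category and the total columns.
# .copy() avoids modifying a view of data_clean in the steps below
data_quarter_totals = data_clean[['Immigration_Category'] + quarter_columns].copy()

# Step 3: Convert the total columns to numeric, coercing any leftover non-numeric values (such as '--') to NaN
data_quarter_totals[quarter_columns] = data_quarter_totals[quarter_columns].apply(pd.to_numeric, errors='coerce')

# Step 4: Fill any NaN values with 0
data_quarter_totals[quarter_columns] = data_quarter_totals[quarter_columns].fillna(0)

# Step 5: Keep only the province/territory-level total rows.
# The table also holds category rows and '<Category> - Total' subtotals for each province,
# so summing every row would count the same admissions more than once
province_total_rows = [
    'Ontario - Total', 'British Columbia - Total', 'Quebec - Total', 'Alberta - Total', 'Manitoba - Total',
    'Saskatchewan - Total', 'Nova Scotia - Total', 'New Brunswick - Total', 'Newfoundland and Labrador - Total',
    'Prince Edward Island - Total', 'Yukon - Total', 'Northwest Territories - Total', 'Nunavut - Total',
    'Province/Territory not stated - Total'
]
is_province_total = data_quarter_totals['Immigration_Category'].isin(province_total_rows)

# Step 6: Sum the province/territory totals to get national admissions for each period
quarter_totals = data_quarter_totals.loc[is_province_total, quarter_columns].sum()

# Step 7: Clean the index labels by removing ' Total' ('2015 Q1 Total' -> '2015 Q1', '2015 Total' -> '2015')
quarter_totals.index = quarter_totals.index.str.replace(' Total', '', regex=False)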
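
Row labels such as 'Economic - Total' and 'Nunavut - Total' suggest that the table mixes detail rows with subtotal rows. Assuming that structure, a column sum over every row counts the same admissions more than once. The sketch below uses a made-up four-row table ('Business' and 'Province X' are invented labels) to show the difference.

```python
import pandas as pd

# Made-up table: two detail rows, their category subtotal, and the province total
toy = pd.DataFrame({
    'Immigration_Category': ['Worker Program', 'Business', 'Economic - Total', 'Province X - Total'],
    '2015 Q1 Total': [30, 10, 40, 40],
})

# Summing every row adds details, subtotal and province total together
sum_all_rows = toy['2015 Q1 Total'].sum()

# Summing only the province-level row gives the actual number of admissions
is_province = toy['Immigration_Category'] == 'Province X - Total'
province_only = toy.loc[is_province, '2015 Q1 Total'].sum()

print(sum_all_rows, province_only)
```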

Data Visualization of Immigration


Figure 1: Total Admissions of Permanent Residents to Canada (2015 - 2024)

# Keep only the quarterly entries (labels such as '2015 Q1'); annual totals (labels such as '2015')
# are excluded so that the line has one point per quarter
quarterly_totals = quarter_totals[quarter_totals.index.str.contains('Q')]

# Quarter labels (year and quarter) for the x-axis
quarter_labels = quarterly_totals.index

# Create a plot to visualize the total admissions over time (from 2015 to 2024)
plt.figure(figsize=(12, 6))  # Set the figure size
plt.plot(quarter_labels, quarterly_totals.values, marker='o')  # Plot the time series with markers
plt.title('Total Admissions of Permanent Residents to Canada (2015 - 2024)', fontsize=14)  # Set the plot title
plt.xlabel('Quarter', fontsize=12)  # Label for the x-axis
plt.ylabel('Total Admissions', fontsize=12)  # Label for the y-axis
plt.grid(True)  # Display a grid for better readability
plt.xticks(rotation=90)  # Rotate the x-axis labels for better readability
plt.tight_layout()  # Adjust layout for better spacing
plt.show()  # Display the plot

This figure shows the total admissions of permanent residents to Canada from January 2015 to June 2024, with one data point per quarter. The x-axis labels give the year and quarter (e.g., "2015 Q1", "2015 Q2"), and the y-axis shows total admissions.
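
Labels such as "2015 Q1" are plain text, so matplotlib spaces them evenly and cannot format them as dates. A minimal sketch, using made-up values, of converting such labels into a pandas quarterly PeriodIndex and then into timestamps for a real time axis:

```python
import pandas as pd

labels = ['2015 Q1', '2015 Q2', '2015 Q3', '2015 Q4', '2016 Q1']  # Same format as the x-axis labels
values = [10, 20, 30, 40, 50]  # Made-up numbers

# '2015 Q1' -> '2015Q1', which pandas can parse as a quarterly period
periods = pd.PeriodIndex([label.replace(' ', '') for label in labels], freq='Q')
series = pd.Series(values, index=periods)

# Quarter-start timestamps can be passed to plt.plot as a date axis
series_ts = series.to_timestamp()
print(series_ts.index[0], series_ts.index[-1])
```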

Figure 2: Quarterly and Annual Total Admissions of Permanent Residents to Canada (2015 - 2024)

# Extract relevant years from the quarter_totals index
years = [label.split()[0] for label in quarter_totals.index]  # Extract the year from each label
quarters = [label.split()[1] if len(label.split()) > 1 else 'Annual' for label in quarter_totals.index]  # Extract quarters or label as 'Annual' for the yearly total

# Create a DataFrame for better visualization
quarter_totals_df = pd.DataFrame({
    'Year': years,
    'Quarter': quarters,
    'Total Admissions': quarter_totals.values
})

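# Pivot so that each year becomes a column and each row is a quarter (or the annual total)
pivot_data = quarter_totals_df.pivot(index='Quarter', columns='Year', values='Total Admissions')

# Put the quarters in calendar order with 'Annual' last; alphabetical order would place 'Annual' first
pivot_data = pivot_data.reindex(['Q1', 'Q2', 'Q3', 'Q4', 'Annual'])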

# Plot the time series data for each year with different colors
plt.figure(figsize=(14, 8))
for year in pivot_data.columns:
    plt.plot(pivot_data.index, pivot_data[year], marker='o', label=year)  # Plot each year as a line

# Set plot details
plt.title('Quarterly and Annual Total Admissions of Permanent Residents to Canada (2015 - 2024)', fontsize=16)
plt.xlabel('Quarter', fontsize=12)
plt.ylabel('Total Admissions', fontsize=12)
plt.grid(True)
plt.xticks(rotation=45)  # Rotate the x-axis labels for readability
plt.legend(title='Year', bbox_to_anchor=(1.05, 1), loc='upper left')  # Add legend outside the plot for clarity
plt.tight_layout()  # Adjust layout to ensure the plot fits within the figure

# Show the plot
plt.show()

This line plot shows total admissions of permanent residents to Canada for each year from 2015 to 2024. The x-axis shows the quarters (Q1 to Q4) together with the annual total, and the y-axis shows total admissions. Each year is drawn as a separate colored line, so the quarterly pattern can be compared across years. The data for 2024 covers January to June only.
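
The reshaping behind this figure turns a long table (one row per year and quarter) into a wide one (one column per year). A minimal sketch with made-up numbers, including an explicit reindex to control the row order:

```python
import pandas as pd

# Made-up long-format data: one row per (year, quarter) pair
long_df = pd.DataFrame({
    'Year': ['2015', '2015', '2015', '2016', '2016', '2016'],
    'Quarter': ['Q1', 'Q2', 'Annual', 'Q1', 'Q2', 'Annual'],
    'Total Admissions': [10, 20, 30, 15, 25, 40],
})

# Wide format: rows are quarters, columns are years
wide = long_df.pivot(index='Quarter', columns='Year', values='Total Admissions')

# Fix the row order explicitly instead of relying on the order pivot returns
ordered = wide.reindex(['Q1', 'Q2', 'Annual'])
print(ordered)
```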

Figure 3: Total Admissions of Permanent Residents to Canada by Year (2015 - 2024)

# Reuse the per-period totals computed earlier and keep the annual entries only.
# Quarter labels contain 'Q' (e.g. '2015 Q1'); annual labels do not (e.g. '2015')
data_year_totals = quarter_totals[~quarter_totals.index.str.contains('Q')]

# Make sure the labels are bare years, in case ' Total' is still present
data_year_totals.index = data_year_totals.index.str.replace(' Total', '', regex=False)

# Plotting the bar plot
plt.figure(figsize=(10, 6))
plt.bar(data_year_totals.index, data_year_totals.values, color='teal')
plt.title('Total Admissions of Permanent Residents by Year')
plt.xlabel('Year')
plt.ylabel('Total Admissions')
plt.xticks(rotation=45)
plt.tight_layout()
plt.show()

This bar plot displays the total admissions of permanent residents to Canada for each year from 2015 to 2024. The x-axis shows the year and the y-axis shows the total number of admissions. The 2024 bar covers January to June only, so it is not directly comparable with the full years.
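
Column names in this dataset contain 'Total' for both quarter totals ('2015 Q1 Total') and annual totals ('2015 Total'), so a substring test alone does not isolate the yearly columns. A minimal sketch of selecting only the annual ones with a regular expression, using a short hand-written list of column names in the dataset's format:

```python
import re

# A few column names in the same format as the dataset's header
columns = ['2015 Q1 Jan', '2015 Q1 Total', '2015 Q2 Total', '2015 Total', '2016 Q1 Total', '2016 Total']

# Substring test: matches quarter totals and annual totals alike
all_totals = [c for c in columns if 'Total' in c]

# Full-match test: exactly four digits, a space, then 'Total'
annual_only = [c for c in columns if re.fullmatch(r'\d{4} Total', c)]

print(all_totals)
print(annual_only)
```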

Figure 4: Immigration Growth Trends: Year-over-Year Comparison by Category

# Extract the annual total columns ('YYYY Total'); quarter totals such as '2015 Q1 Total' are excluded,
# otherwise consecutive columns would compare quarters rather than years
yearly_columns = [col for col in data_clean.columns if 'Total' in col and 'Q' not in col]

# Prepare a DataFrame with total admissions for each immigration category per year
immigration_by_category = data_clean[['Immigration_Category'] + yearly_columns]

# To calculate growth, compare the values year by year and compute the percentage change
def calculate_growth(df, columns):
    """Return the percentage growth between each pair of consecutive columns."""
    # Create an empty DataFrame to store growth rates
    growth_rates = pd.DataFrame(index=df.index)

    # Loop through each pair of consecutive years and calculate growth
    for i in range(1, len(columns)):
        current_year = columns[i]
        previous_year = columns[i-1]
        growth_column = f"Growth {current_year.split()[0]} vs {previous_year.split()[0]}"
        # Growth from a base of 0 is undefined, so treat a 0 base as NaN instead of dividing by zero
        previous = df[previous_year].replace(0, np.nan)
        growth_rates[growth_column] = (df[current_year] - previous) / previous * 100

    return growth_rates

# Applying the function to calculate growth between consecutive years
growth_data = calculate_growth(immigration_by_category, yearly_columns)

# Select the first 5 rows and the first 5 growth columns for illustration.
# Rows are taken by position because category names repeat across provinces
category_labels = data_clean['Immigration_Category'].iloc[:5].tolist()

# Label the rows with their category names so the legend is readable
growth_subset = growth_data.iloc[:5, :5].set_axis(category_labels, axis=0)

# Now plot the growth for these categories
growth_subset.T.plot(kind='bar', figsize=(12, 6), width=0.8)

# Customize the plot with titles and labels
plt.title('Growth in Immigration by Category (Selected Categories)')
plt.xlabel('Year Comparisons')
plt.ylabel('Growth (%)')
plt.xticks(rotation=45)
plt.legend(title='Immigration Category', bbox_to_anchor=(1.05, 1), loc='upper left')
plt.grid(True)
plt.tight_layout()
plt.show()

The bars show the percentage growth in admissions between consecutive years for selected categories. Only the first five categories in the table are shown for simplicity; the full dataset can be plotted in the same way to analyze trends over time.
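
Percentage growth is (current - previous) / previous * 100, which is undefined when the previous value is 0. A minimal sketch on made-up numbers shows the calculation with zero bases turned into NaN; the category names and values are illustrative only.

```python
import numpy as np
import pandas as pd

# Made-up annual totals for two categories
annual = pd.DataFrame(
    {'2015 Total': [100, 0], '2016 Total': [150, 10], '2017 Total': [120, 20]},
    index=['Category A', 'Category B'],
)

current = annual.iloc[:, 1:]                         # 2016, 2017
previous = annual.iloc[:, :-1].replace(0, np.nan)    # 2015, 2016 with 0 treated as missing

# Element-wise growth between each year and the year before it
growth_values = (current.to_numpy() - previous.to_numpy()) / previous.to_numpy() * 100
growth_names = [f"{c.split()[0]} vs {p.split()[0]}" for c, p in zip(current.columns, previous.columns)]
growth = pd.DataFrame(growth_values, index=annual.index, columns=growth_names)

print(growth)
```

Category B has no defined growth for 2016 vs 2015 because its 2015 value is 0, so that cell is NaN rather than infinity.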

Figure 5: Total Admissions of Permanent Residents by Province/Territory (2015 - 2024)

# Row labels for the province/territory totals, in the exact format 'Province name - Total'
province_keywords_formatted = [
    'Ontario - Total', 'British Columbia - Total', 'Quebec - Total', 'Alberta - Total', 'Manitoba - Total', 
    'Saskatchewan - Total', 'Nova Scotia - Total', 'New Brunswick - Total', 'Newfoundland and Labrador - Total', 
    'Prince Edward Island - Total', 'Yukon - Total', 'Northwest Territories - Total', 'Nunavut - Total'
]

# Annual total columns to add up for each province/territory
year_keywords_formatted = [
    '2015 Total', '2016 Total', '2017 Total', '2018 Total', '2019 Total', '2020 Total', 
    '2021 Total', '2022 Total', '2023 Total', '2024 Total', 
]

# Keep the rows whose 'Immigration_Category' exactly matches a province label,
# together with the category label and the annual total columns
province_data_formatted = data_clean.loc[
    data_clean['Immigration_Category'].isin(province_keywords_formatted),
    ['Immigration_Category'] + year_keywords_formatted
]

# Summing the total admissions for each province across all years
province_totals_formatted = province_data_formatted.set_index('Immigration_Category').sum(axis=1)

# Plotting total admissions by province using the exact format
plt.figure(figsize=(12, 6))
province_totals_formatted.sort_values(ascending=False).plot(kind='bar')

plt.title('Total Admissions of Permanent Residents by Province/Territory (2015-2024)')
plt.xlabel('Province/Territory')
plt.ylabel('Total Admissions')
plt.xticks(rotation=45)
plt.grid(True)
plt.tight_layout()
plt.show()
  • province_data_formatted keeps only the rows whose Immigration_Category exactly matches one of the province/territory labels.
  • province_totals_formatted then sums the annual totals for each province/territory across all years.
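
The row filter described above can be sketched on a small made-up table. The values are invented; only the label format follows the dataset.

```python
import pandas as pd

# Made-up rows mixing category rows with province totals
toy = pd.DataFrame({
    'Immigration_Category': ['Worker Program', 'Ontario - Total', 'Economic - Total', 'Nunavut - Total'],
    '2015 Total': [5, 100, 5, 10],
    '2016 Total': [10, 120, 10, 15],
})

province_labels = ['Ontario - Total', 'Nunavut - Total']

# Boolean mask on the label column: True only for exact province labels
mask = toy['Immigration_Category'].isin(province_labels)

# Sum the year columns row by row for the selected provinces
province_totals = toy.loc[mask].set_index('Immigration_Category').sum(axis=1)
print(province_totals)
```

Applying isin to the label column, rather than to the whole frame, is what restricts the result to the matching rows.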

A work by Jahid Hasan