SDG Affordable and clean energy: Goal 7 - Population unable to keep home adequately warm (%)

by Miguel Pinto
Number of replies: 2

Introduction

This report focuses on predicting one indicator from SDG 7 (Affordable and Clean Energy) using three machine learning (ML) techniques. Specifically, the selected indicator is the "Percentage of Population Unable to Adequately Heat Their Homes."

Data Collection

Data were collected from PORDATA for the following macroeconomic indicators, each of which could influence the inability to heat homes adequately:

  • Average equivalent income by household type (Euro);
  • Final energy consumption per capita by households;
  • Housing cost overburden rate.

Machine Learning Techniques

The following three machine learning techniques were employed to predict the percentage of the population unable to adequately heat their homes:

  1. Linear Regression
    • Chosen for its simplicity and interpretability in understanding linear relationships between variables.
  2. Random Forest
    • Selected for its ability to capture non-linear relationships and interactions among variables.
  3. Neural Networks
    • Used to model complex, non-linear relationships, though they require careful tuning and are less interpretable.
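
To illustrate the interpretability point made for Linear Regression, here is a minimal sketch of a one-feature ordinary least squares fit using only the Python standard library. The data points and the helper names `fit_line` and `mse` are hypothetical and chosen for illustration; they are not the PORDATA values or the code used in this report.

```python
from statistics import mean

def fit_line(x, y):
    """Ordinary least squares fit of y ~ intercept + slope * x."""
    x_bar, y_bar = mean(x), mean(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope

def mse(y_true, y_pred):
    """Mean squared error between observed and predicted values."""
    return mean((t - p) ** 2 for t, p in zip(y_true, y_pred))

# Hypothetical toy data (NOT from PORDATA):
# housing cost overburden rate (%) vs. share unable to heat their home (%).
overburden = [4.0, 6.0, 8.0, 10.0, 12.0]
unable_to_heat = [5.0, 9.0, 13.0, 17.0, 21.0]

intercept, slope = fit_line(overburden, unable_to_heat)
predictions = [intercept + slope * x for x in overburden]
print(f"slope={slope:.2f}, intercept={intercept:.2f}, "
      f"MSE={mse(unable_to_heat, predictions):.2f}")
```

The slope can be read directly as the change in the predicted indicator per one-point change in the feature, which is the kind of interpretation that Random Forest and Neural Networks do not offer as directly.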

Data Preprocessing

Data preprocessing involved handling missing values, normalizing numerical features, and encoding categorical variables if applicable. The dataset was split into training and testing sets in a 70:30 ratio for model evaluation.
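
The preprocessing steps above could be sketched as follows. This is a sketch under stated assumptions, not the pipeline actually used: the report does not say how missing values were handled or which normalization was applied, so mean imputation and z-score scaling are assumed here, and the income values and helper names (`split_70_30`, `fit_stats`, `transform`) are hypothetical.

```python
import random
from statistics import mean, pstdev

def split_70_30(rows, seed=0):
    """Shuffle rows reproducibly and split them 70:30 into train and test."""
    shuffled = rows[:]
    random.Random(seed).shuffle(shuffled)
    cut = int(round(0.7 * len(shuffled)))
    return shuffled[:cut], shuffled[cut:]

def fit_stats(train_col):
    """Mean and standard deviation of the observed (non-missing) training values."""
    observed = [v for v in train_col if v is not None]
    mu, sigma = mean(observed), pstdev(observed)
    return mu, (sigma if sigma > 0 else 1.0)

def transform(col, mu, sigma):
    """Impute missing values with the training mean, then z-score the column."""
    return [((mu if v is None else v) - mu) / sigma for v in col]

# Hypothetical income values in Euro with one missing entry (NOT PORDATA data).
income = [9500.0, None, 12100.0, 18400.0, 21000.0,
          15250.0, 8700.0, 19900.0, 11300.0, 16800.0]

train, test = split_70_30(income, seed=42)
mu, sigma = fit_stats(train)
train_z = transform(train, mu, sigma)
test_z = transform(test, mu, sigma)
print(len(train), len(test))
```

Computing the imputation and scaling statistics on the training portion only is one common precaution against information from the test set leaking into the model; whether the report did this is not stated.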

Exploratory Data Analysis (EDA)

EDA provided the following insights:

  • Average equivalent income by household type significantly influences the ability to heat homes adequately.
  • Final energy consumption per capita varies between countries, reflecting different energy efficiency patterns and access to resources.
  • Housing cost overburden rate indicates financial pressures affecting investments in energy efficiency.

Machine Learning Models

  1. Linear Regression

    • Results:
      • Mean Squared Error (MSE): 46.23
    • Important Factors: Determined by coefficients assigned to each variable.
  2. Random Forest

    • Results:
      • Mean Squared Error (MSE): 45.46
      • Feature Importance (raw scores as reported; the scale is not stated):
        • Average Equivalent Income: 1689.84
        • Final Energy Consumption: 1797.06
        • Housing Cost Overburden Rate: 1170.68
  3. Neural Networks

    • Results without Normalization:
      • Mean Squared Error (MSE): 158.48
    • Results with Normalization:
      • Mean Squared Error (MSE): 48.45
    • Results after Extended Tuning:
      • Mean Squared Error (MSE): 54.86
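
The Random Forest importance scores listed above are raw values, which makes them hard to compare at a glance. A minimal sketch of turning them into relative shares is shown below; it only rescales the reported numbers and assumes that the three scores are on a common scale and that higher means more influential.

```python
# Raw Random Forest importance scores as reported in this post.
raw_importance = {
    "Average Equivalent Income": 1689.84,
    "Final Energy Consumption": 1797.06,
    "Housing Cost Overburden Rate": 1170.68,
}

# Rescale so the shares sum to 1.
total = sum(raw_importance.values())
shares = {name: score / total for name, score in raw_importance.items()}

# Print from most to least important.
for name, share in sorted(shares.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name}: {share:.1%}")
```

Under these assumptions, final energy consumption accounts for roughly 39% of the total, average equivalent income for roughly 36%, and the housing cost overburden rate for roughly 25%.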

Discussion

Differences Between Models

  • Linear Regression: Useful for identifying straightforward linear relationships but may overlook complexities.
  • Random Forest: Effective in capturing non-linear interactions but may be less straightforward to interpret.
  • Neural Networks: Capable of modeling complex relationships but require careful parameter tuning and lack direct interpretability.

Key Factors Affecting Inability to Heat Homes Adequately

  • Average Equivalent Income: Critical for targeting infrastructure investments in heating solutions.
  • Final Energy Consumption per Capita: Reflects energy efficiency and access to resources.
  • Housing Cost Overburden Rate: Indicates financial constraints impacting energy efficiency investments.

Conclusion

Based on the results, the Random Forest model performed best at predicting the percentage of the population unable to adequately heat their homes, with the lowest Mean Squared Error (MSE) of 45.46. The margin over Linear Regression (MSE 46.23) is small, and Linear Regression remains useful for its simplicity and interpretability. Neural Networks showed potential once the inputs were normalized (MSE 48.45), but they did not outperform the other models in this specific context.
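
To put the reported errors side by side, the sketch below ranks the models by MSE and converts each value to a root mean squared error (RMSE). It assumes the MSE values were computed on the indicator in its original percentage units, in which case the RMSE can be read in percentage points; the report does not state this explicitly.

```python
import math

# MSE values as reported in this post.
mse = {
    "Linear Regression": 46.23,
    "Random Forest": 45.46,
    "Neural Network (with normalization)": 48.45,
    "Neural Network (extended tuning)": 54.86,
    "Neural Network (without normalization)": 158.48,
}

best = min(mse, key=mse.get)

# List models from lowest to highest error, with the gap to the best model.
for name, value in sorted(mse.items(), key=lambda kv: kv[1]):
    rmse = math.sqrt(value)
    gap = (value - mse[best]) / mse[best]
    print(f"{name}: MSE={value:.2f}, RMSE={rmse:.2f}, +{gap:.1%} vs best")
```

Seen this way, the gap between Random Forest and Linear Regression is under 2% of the MSE, so the ranking of these two models may not hold under a different train/test split.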

Recommendations

  1. Improve Heating Infrastructure: Focus on areas with lower average equivalent income to ensure adequate heating.
  2. Implement Energy Conservation Measures: Target areas with high housing cost overburden rates to reduce per capita energy consumption.
  3. Develop Sustainable Energy Policies: Consider regional capacities and financial constraints to enhance energy management and efficiency.

References

  • PORDATA - European statistical database
  • United Nations Sustainable Development Goals (SDGs)

This report compares the three models and their results, with the aim of informing future policy and investment decisions to improve sustainable energy access.


In reply to 'Miguel Pinto'

Re: SDG Affordable and clean energy: Goal 7 - Population unable to keep home adequately warm (%)

by João Filipe Moreira Veríssimo
Your report on predicting the "Percentage of Population Unable to Adequately Heat Their Homes" is clear and well-structured, particularly in its rationale for selecting machine learning techniques. However, specifying the method for handling missing values and the years of data collection would enhance clarity and completeness.