SDG Selected: SDG 7 - Affordable and Clean Energy

de Castro Ângela -

Indicator Selected: Share of Renewable Energy in Gross Final Energy Consumption (percentage)

Data Collection

Data Source:
The data were collected from PORDATA, a statistical database covering European countries. The dataset includes indicators on energy production, economic conditions, and environmental policy across several years.

Methodology

Data Preprocessing:

  1. Data Cleaning:

    • Handling missing values.
    • Removing duplicates.
    • Correcting data types.
  2. Feature Selection:
    Identifying relevant features that could influence the share of renewable energy in gross final energy consumption, including:

    • Economic indicators (GDP per capita, energy prices).
    • Energy indicators (investment in renewable energy, energy consumption).
    • Environmental policies (subsidies for renewable energy, carbon tax).
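
The cleaning and feature-selection steps above might look like this with pandas. This is a minimal sketch: the column names, the median imputation, and the `preprocess` helper are assumptions for illustration, since the post does not list the actual PORDATA fields or say how missing values were handled.

```python
import pandas as pd

# Hypothetical column names; the post does not list the PORDATA fields used.
TARGET = "renewable_share_pct"
FEATURES = [
    "gdp_per_capita", "energy_price",            # economic indicators
    "renewable_investment", "energy_consumption",  # energy indicators
    "renewable_subsidy", "carbon_tax",           # environmental policies
]


def preprocess(df: pd.DataFrame) -> pd.DataFrame:
    """Clean a country-year table and keep only the modelling columns."""
    cols = FEATURES + [TARGET]
    df = df.copy()
    # Correct data types: values that cannot be parsed as numbers become NaN.
    df[cols] = df[cols].apply(pd.to_numeric, errors="coerce")
    # Remove duplicate rows.
    df = df.drop_duplicates()
    # Rows without a target value cannot be used for training.
    df = df.dropna(subset=[TARGET])
    # Handle missing feature values with the column median (one simple choice).
    df[FEATURES] = df[FEATURES].fillna(df[FEATURES].median())
    return df[cols].reset_index(drop=True)
```

Calling `preprocess(raw_df)` on a raw table returns a numeric frame with no duplicates and no missing values, ready to be split into features and target.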

Machine Learning Models:
Three machine learning techniques were selected to predict the share of renewable energy in gross final energy consumption:

  1. Linear Regression
  2. Decision Tree Regression
  3. Random Forest Regression
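
As a rough illustration, the three models can be fitted with scikit-learn as follows. The data here is synthetic, not the PORDATA dataset, and the 80/20 split, the random seeds, and `n_estimators=100` are assumptions, because the post does not state the settings that were used.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor

# Synthetic stand-in data (NOT the PORDATA dataset): 200 rows, 4 features.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 4))
y = 20 + 3 * X[:, 0] + 2 * X[:, 1] - 1.5 * X[:, 2] + rng.normal(scale=1.0, size=200)

# Hold out 20% of the rows for evaluation (assumed split ratio).
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

models = {
    "Linear Regression": LinearRegression(),
    "Decision Tree Regression": DecisionTreeRegressor(random_state=0),
    "Random Forest Regression": RandomForestRegressor(n_estimators=100, random_state=0),
}

# Fit each model on the training rows and predict the held-out rows.
predictions = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    predictions[name] = model.predict(X_test)
```

Keeping the models in one dictionary means all three are trained and evaluated on exactly the same split, which makes their scores comparable.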

Evaluation Metrics:
The models were evaluated using the following metrics:

  • Mean Absolute Error (MAE)
  • Mean Squared Error (MSE)
  • R-squared (R²)
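
For reference, the three metrics can be written out in a few lines of plain Python. The numbers below are made up purely to show the arithmetic and are not taken from the study; in practice `mean_absolute_error`, `mean_squared_error`, and `r2_score` from `sklearn.metrics` compute the same quantities.

```python
def mae(y_true, y_pred):
    """Mean Absolute Error: average size of the errors."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def mse(y_true, y_pred):
    """Mean Squared Error: average squared error, penalises large misses."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def r2(y_true, y_pred):
    """R-squared: 1 minus residual variation over total variation."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot


# Illustrative values only (renewable share in %), not results from the post.
y_true = [10.0, 20.0, 30.0, 40.0]
y_pred = [12.0, 18.0, 33.0, 39.0]

print(mae(y_true, y_pred))  # 2.0
print(mse(y_true, y_pred))  # 4.5
print(r2(y_true, y_pred))   # approximately 0.964
```

MAE and MSE are in the units of the indicator (percentage points and squared percentage points), so lower is better, while R² closer to 1 means more of the variation is explained.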

Results

Model 1: Linear Regression
Linear regression is a basic predictive model that assumes a linear relationship between the independent variables and the target variable.

  • Performance:

    • MAE: 2.30
    • MSE: 5.40
    • R²: 0.68
  • Important Factors:

    • GDP per capita: Positive correlation
    • Investment in renewable energy: Positive correlation
    • Energy prices: Negative correlation

Model 2: Decision Tree Regression
A decision tree is a non-linear model that recursively splits the data into subsets based on feature values and predicts the target variable from the samples in each subset.

  • Performance:

    • MAE: 1.90
    • MSE: 4.80
    • R²: 0.72
  • Important Factors:

    • Investment in renewable energy: Positive correlation
    • Energy consumption: Negative correlation
    • Environmental policies: Positive correlation

Model 3: Random Forest Regression
Random Forest is an ensemble learning method that trains many decision trees and outputs the mean of their individual predictions.

  • Performance:

    • MAE: 1.65
    • MSE: 4.20
    • R²: 0.78
  • Important Factors:

    • GDP per capita: Positive correlation
    • Investment in renewable energy: Positive correlation
    • Environmental policies: Positive correlation
    • Energy prices: Negative correlation
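
One caveat when reading factor lists for tree-based models: scikit-learn's impurity-based `feature_importances_` give a magnitude only, never a sign. The post does not say how the direction was obtained, so the sketch below shows one possible approach, assumed here, that pairs each importance with the feature's correlation with the target. The data is synthetic and the feature names are placeholders.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

# Synthetic data with placeholder feature names (not the PORDATA dataset).
names = ["gdp_per_capita", "renewable_investment", "environmental_policy", "energy_price"]
rng = np.random.default_rng(0)
X = rng.normal(size=(300, 4))
y = (2.0 * X[:, 0] + 3.0 * X[:, 1] + 1.0 * X[:, 2] - 2.0 * X[:, 3]
     + rng.normal(scale=0.5, size=300))

forest = RandomForestRegressor(n_estimators=200, random_state=0).fit(X, y)

for i, name in enumerate(names):
    importance = forest.feature_importances_[i]  # magnitude only, always >= 0
    direction = np.corrcoef(X[:, i], y)[0, 1]    # sign of the linear association
    print(f"{name:22s} importance={importance:.2f} correlation={direction:+.2f}")
```

A simple correlation only captures linear, one-variable-at-a-time direction, so for strongly non-linear effects a partial dependence plot would be a more faithful way to read the sign.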

Discussion

Differences Between Machine Learning Models:

  • Linear Regression: Easy to interpret, but it assumes linear effects and may oversimplify the data; it had the weakest scores (MAE 2.30, MSE 5.40, R² 0.68).
  • Decision Tree Regression: Captures non-linear relationships but can overfit the training data (MAE 1.90, MSE 4.80, R² 0.72).
  • Random Forest Regression: Averages many trees, which reduces overfitting relative to a single tree while still capturing non-linear patterns and interactions; it had the best scores of the three (MAE 1.65, MSE 4.20, R² 0.78).
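
The overfitting point can be checked by comparing each model's R² on the training rows with its R² on held-out rows. The sketch below does this on synthetic, noisy data (not the study data), with default tree settings as an assumption. A fully grown tree typically scores close to 1.0 on the training set and noticeably lower on the test set.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor

# Synthetic non-linear data with noise (NOT the PORDATA dataset).
rng = np.random.default_rng(1)
X = rng.uniform(-3, 3, size=(300, 3))
y = np.sin(X[:, 0]) + 0.5 * X[:, 1] ** 2 + rng.normal(scale=0.5, size=300)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=1)

candidates = [
    ("tree", DecisionTreeRegressor(random_state=1)),
    ("forest", RandomForestRegressor(n_estimators=200, random_state=1)),
]

scores = {}
for name, model in candidates:
    model.fit(X_tr, y_tr)
    # .score() returns R² for regressors: (train R², test R²).
    scores[name] = (model.score(X_tr, y_tr), model.score(X_te, y_te))
    print(name, scores[name])
```

A large gap between the two numbers for a model is the usual sign that it has memorised noise in the training rows.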

Most Important Factors Affecting the Indicator:

  • GDP per capita: Higher GDP per capita often correlates with a higher share of renewable energy due to increased financial resources for investment.
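  • Investment in renewable energy: Higher investment adds renewable capacity and is associated with a larger renewable share in all three models.
  • Energy prices: The linear regression and random forest models associate higher energy prices with a lower renewable share (negative correlation). This runs against the common expectation that higher prices for conventional energy encourage a shift to renewables, so the direction of this effect should be interpreted with caution.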
  • Environmental policies: Effective policies and subsidies support the growth of renewable energy.

Implications

Understanding these factors can guide policymakers in targeting interventions and investments to increase the share of renewable energy. Effective policies could include boosting investment in renewable energy technologies, stabilizing energy prices, and enhancing environmental regulations and subsidies.

Conclusion

This study shows that machine learning techniques can predict an SDG indicator from macroeconomic and policy-related country data. Random Forest performed best of the three models at predicting the share of renewable energy in gross final energy consumption (R² 0.78, against 0.72 for the decision tree and 0.68 for linear regression). The most important factors for this indicator were GDP per capita, investment in renewable energy, energy prices, and environmental policies. These findings can help inform policy decisions that support SDG 7 - Affordable and Clean Energy.

References

  1. PORDATA - Database for European statistics
  2. United Nations Sustainable Development Goals (SDGs)
  3. Scikit-Learn: Machine Learning in Python
  4. Python Documentation