Introduction
The SDGs, or Sustainable Development Goals, are a set of global objectives adopted by the United Nations in 2015. They address social, economic, and environmental challenges the world faces today, with the aim of a more sustainable future by 2030. There are 17 goals in total, covering issues such as poverty, inequality, climate change, environmental degradation, peace, and justice. Each goal has specific targets, providing a framework for governments, businesses, and civil society to work towards sustainable development on a global scale. This report focuses on SDG 8, "Decent Work and Economic Growth", which aims to promote sustained, inclusive, and sustainable economic growth, full and productive employment, and decent work for all.
Data Collection
For this report, we collected data from PORDATA on the inflation rate, GDP per capita, and investment share, covering the period from 2000 to 2022.
Methodology
Data Processing
Data Cleaning: The first step was to clean the data by removing missing or inconsistent values, so that the models are trained and evaluated only on complete, comparable observations.
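A minimal sketch of this cleaning step, written in Python for illustration (the report's own processing was done in R and is not shown). It assumes each indicator arrives as a mapping from year to value and that a year is kept only if every indicator has a finite numeric value for it. The function name `clean_by_year` and the sample values are hypothetical, not PORDATA figures.

```python
import math

def clean_by_year(*series):
    """Merge several year -> value series, keeping only complete years.

    A year is dropped if it is absent from any series or if any of its
    values is missing (None), non-numeric, or not finite (NaN/inf).
    Returns a list of (year, value_1, value_2, ...) tuples sorted by year.
    """
    common_years = set(series[0]).intersection(*series[1:])
    rows = []
    for year in sorted(common_years):
        values = [s[year] for s in series]
        if all(isinstance(v, (int, float)) and math.isfinite(v) for v in values):
            rows.append((year, *values))
    return rows

# Hypothetical toy values, for illustration only.
inflation = {2000: 2.8, 2001: 4.4, 2002: None, 2003: 3.3}
gdp_per_capita = {2000: 100.0, 2001: 102.0, 2002: 103.0, 2003: float("nan")}
investment_share = {2000: 27.0, 2001: 26.5, 2002: 25.0}

print(clean_by_year(inflation, gdp_per_capita, investment_share))
# [(2000, 2.8, 100.0, 27.0), (2001, 4.4, 102.0, 26.5)]
```

In R, `complete.cases()` or `na.omit()` perform the equivalent row filtering on a data frame.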
Feature Selection: Feature selection is the process of choosing the features (indicators) in the data that are most relevant to the SDG indicator being analyzed. It reduces the dimensionality of the data and can improve the performance of machine learning models.
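The report does not say which selection method was used. One simple option is a filter that ranks candidate indicators by the absolute Pearson correlation with the target and keeps those above a threshold. The sketch below assumes that approach. The 0.3 threshold, the function names and the toy data are illustrative choices only.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        return 0.0  # a constant column carries no linear information
    return cov / (sx * sy)

def select_features(features, target, threshold=0.3):
    """Keep features whose |r| with the target is at least `threshold`,
    ordered from strongest to weakest association."""
    scores = {name: abs(pearson(col, target)) for name, col in features.items()}
    kept = [name for name, score in scores.items() if score >= threshold]
    return sorted(kept, key=lambda name: scores[name], reverse=True)

# Hypothetical toy data: "a" rises with the target, "b" falls, "c" is unrelated.
target = [1, 2, 3, 4, 5]
features = {"a": [2, 4, 6, 8, 10], "b": [5, 5, 3, 2, 1], "c": [1, -1, 1, -1, 1]}
print(select_features(features, target))  # ['a', 'b']
```

Correlation only captures linear association with one variable at a time, so a filter like this can miss features whose effect only a nonlinear model such as the Random Forest would pick up.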
Machine Learning Models
Three machine learning models, Linear Regression, Random Forest Regression, and Support Vector Regression (SVR), were implemented in R.
Linear Regression: Linear regression is a simple and widely used model that predicts the target variable as a linear combination of the input features plus an intercept, with the coefficients estimated by least squares.
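Ordinary least squares, the estimator behind R's `lm()`, chooses the coefficients that minimize the sum of squared residuals. This amounts to solving the normal equations (A^T A) beta = A^T y, where A is the feature matrix with a column of ones added for the intercept. The Python sketch below solves them directly with Gauss-Jordan elimination. It assumes a small dataset with linearly independent features. The function names are hypothetical and the data is a noise-free toy example.

```python
def fit_linear_regression(X, y):
    """Ordinary least squares via the normal equations (A^T A) beta = A^T y,
    where A is X with a leading column of ones for the intercept.
    Returns [intercept, coef_1, ..., coef_p]. Assumes A^T A is invertible."""
    A = [[1.0, *row] for row in X]
    n, p = len(A), len(A[0])
    # Augmented matrix [A^T A | A^T y].
    M = [
        [sum(A[k][i] * A[k][j] for k in range(n)) for j in range(p)]
        + [sum(A[k][i] * y[k] for k in range(n))]
        for i in range(p)
    ]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(p):
        pivot = max(range(col, p), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(p):
            if r != col:
                factor = M[r][col] / M[col][col]
                M[r] = [a - factor * b for a, b in zip(M[r], M[col])]
    return [M[i][p] / M[i][i] for i in range(p)]

def predict_linear(beta, X):
    """Intercept plus the weighted sum of the features, for each row."""
    return [beta[0] + sum(b * x for b, x in zip(beta[1:], row)) for row in X]

# Toy data generated exactly from y = 1 + 2*x1 + 3*x2.
X = [[0, 0], [1, 0], [0, 1], [1, 1], [2, 1]]
y = [1, 3, 4, 6, 8]
beta = fit_linear_regression(X, y)
print([round(b, 6) for b in beta])  # [1.0, 2.0, 3.0]
```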
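Random Forest Regression: Random forest regression is an ensemble method that trains many decision trees, each on a bootstrap sample of the data and considering only a random subset of the features at each split, and averages their predictions. Averaging over many partly decorrelated trees typically makes the model more accurate and robust than a single tree.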
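The mechanics of a random forest can be sketched in plain Python. Each tree is grown on a bootstrap sample, each split considers only `mtry` randomly chosen features, and the forest predicts the average of its trees. This is a simplified illustration, not the implementation used in the report, which does not name its R package. R packages such as randomForest expose the same ideas through their `ntree` and `mtry` arguments. The function names, tree depth and tree count here are illustrative assumptions.

```python
import random

def _mean(values):
    return sum(values) / len(values)

def _sse(values):
    """Sum of squared deviations from the mean (the impurity of a node)."""
    m = _mean(values)
    return sum((v - m) ** 2 for v in values)

def build_tree(X, y, depth, max_depth, mtry, rng):
    """Grow a regression tree; each split tries only `mtry` random features."""
    if depth >= max_depth or len(set(y)) == 1:
        return _mean(y)  # leaf: predict the node's mean target
    best = None  # (impurity after split, feature index, threshold)
    for f in rng.sample(range(len(X[0])), mtry):
        # Candidate thresholds: every distinct value except the largest,
        # so both children are non-empty.
        for t in sorted({row[f] for row in X})[:-1]:
            left = [yi for row, yi in zip(X, y) if row[f] <= t]
            right = [yi for row, yi in zip(X, y) if row[f] > t]
            score = _sse(left) + _sse(right)
            if best is None or score < best[0]:
                best = (score, f, t)
    if best is None:  # the sampled features are constant in this node
        return _mean(y)
    _, f, t = best
    left_idx = [i for i, row in enumerate(X) if row[f] <= t]
    right_idx = [i for i, row in enumerate(X) if row[f] > t]
    return (
        f,
        t,
        build_tree([X[i] for i in left_idx], [y[i] for i in left_idx], depth + 1, max_depth, mtry, rng),
        build_tree([X[i] for i in right_idx], [y[i] for i in right_idx], depth + 1, max_depth, mtry, rng),
    )

def predict_tree(node, row):
    """Follow split nodes (tuples) down to a leaf value (float)."""
    while isinstance(node, tuple):
        f, t, left, right = node
        node = left if row[f] <= t else right
    return node

def fit_random_forest(X, y, n_trees=50, max_depth=4, mtry=1, seed=0):
    rng = random.Random(seed)
    n = len(y)
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(n) for _ in range(n)]  # bootstrap: sample rows with replacement
        forest.append(build_tree([X[i] for i in idx], [y[i] for i in idx], 0, max_depth, mtry, rng))
    return forest

def predict_forest(forest, X):
    """Average the predictions of all trees for each row."""
    return [_mean([predict_tree(tree, row) for tree in forest]) for row in X]

# Hypothetical toy data: one feature, with a jump in the target at x = 10.
X = [[x] for x in range(20)]
y = [0.0 if x < 10 else 10.0 for x in range(20)]
forest = fit_random_forest(X, y, n_trees=50, max_depth=4, mtry=1, seed=42)
print(predict_forest(forest, [[2], [17]]))
```

Because each tree sees a different bootstrap sample and feature subset, the trees' errors are only partly correlated, which is why averaging them reduces variance compared with a single tree.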
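Support Vector Regression (SVR): SVR fits a function that is as flat as possible while ignoring errors smaller than a tolerance ε; only observations that fall outside this ε-tube contribute to the loss. A kernel function implicitly maps the data into a higher-dimensional space, where a linear function can capture nonlinear relationships between the original features and the target.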
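A full SVR solver is beyond a short example, but its two ingredients can be sketched. The first is the ε-insensitive loss, which ignores errors inside the tube. The second is a kernel, here the radial basis function (RBF) kernel, through which a trained model predicts f(x) = sum_i c_i K(x_i, x) + b. The support vectors, coefficients c_i and intercept b would come from training. The values below and the `gamma` and `epsilon` settings are hypothetical, and the report does not state which kernel it used.

```python
import math

def epsilon_insensitive_loss(y_true, y_pred, epsilon=0.1):
    """SVR loss: zero inside the epsilon-tube, growing linearly outside it."""
    return max(0.0, abs(y_true - y_pred) - epsilon)

def rbf_kernel(a, b, gamma=0.5):
    """Radial basis function kernel K(a, b) = exp(-gamma * ||a - b||^2)."""
    return math.exp(-gamma * sum((ai - bi) ** 2 for ai, bi in zip(a, b)))

def svr_predict(support_vectors, coefs, intercept, x, gamma=0.5):
    """Kernel SVR prediction f(x) = sum_i c_i * K(sv_i, x) + b, where each
    c_i (alpha_i - alpha_i*) and b would be produced by training."""
    return intercept + sum(c * rbf_kernel(sv, x, gamma) for sv, c in zip(support_vectors, coefs))

print(epsilon_insensitive_loss(2.0, 2.05))  # 0.0: inside the tube, no penalty
print(epsilon_insensitive_loss(2.0, 2.5))   # 0.4: only the excess over epsilon counts

# Hypothetical trained model with two support vectors in one dimension.
svs, coefs, b = [[0.0], [2.0]], [1.5, -0.5], 0.2
print(svr_predict(svs, coefs, b, [0.0]))
```

Only observations with non-zero coefficients (the support vectors) enter the prediction, which is where the method gets its name.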
Evaluation Metrics
The models were evaluated with Mean Absolute Error (MAE), Mean Squared Error (MSE), and R-squared (R²), which are standard metrics for regression models.
Mean Absolute Error (MAE): MAE provides the average magnitude of the errors in a set of predictions, giving a straightforward interpretation of prediction accuracy.
Mean Squared Error (MSE): MSE averages the squared errors, so large errors weigh more heavily than in MAE. It measures the average squared deviation between predicted and actual values, expressed in the squared units of the target variable.
R-squared (R²): R² measures the proportion of the variance in the target variable that is explained by the model. Higher values indicate a better fit. It usually lies between 0 and 1, but on held-out data it can be negative when the model predicts worse than simply using the mean of the target.
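The three metrics can be computed from scratch using MAE = (1/n) sum|y_i - ŷ_i|, MSE = (1/n) sum(y_i - ŷ_i)² and R² = 1 - sum(y_i - ŷ_i)² / sum(y_i - ȳ)². This is a plain-Python sketch with made-up numbers, and the function name is hypothetical. The second call shows R² turning negative when the predictions are worse than the mean of the target.

```python
def regression_metrics(y_true, y_pred):
    """Return MAE, MSE and R^2 for paired actual/predicted values.
    Assumes the actual values are not all equal (otherwise R^2 is undefined)."""
    n = len(y_true)
    errors = [t - p for t, p in zip(y_true, y_pred)]
    mean_y = sum(y_true) / n
    ss_res = sum(e * e for e in errors)              # residual sum of squares
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)  # total sum of squares
    return {
        "MAE": sum(abs(e) for e in errors) / n,
        "MSE": ss_res / n,
        "R2": 1 - ss_res / ss_tot,
    }

print(regression_metrics([3, 5, 7], [2, 5, 9]))
# {'MAE': 1.0, 'MSE': 1.6666666666666667, 'R2': 0.375}
print(regression_metrics([3, 5, 7], [10, 10, 10])["R2"])
# -9.375
```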
Model I - Linear Regression
MAE: 4.240636
MSE: 23.53445
R²: 0.4651262
Model II - Random Forest Regression
MAE: 2.354522
MSE: 7.761602
R²: 0.8236
Model III - Support Vector Regression (SVR)
MAE: 3.79348
MSE: 22.45197
R²: 0.4897279
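The comparison drawn in the conclusion can be checked directly against the numbers reported above. The sketch also uses the identity R² = 1 - MSE / Var(y), where Var(y) is the mean squared deviation of the target from its mean. The identity holds when R² is computed as 1 - SS_res/SS_tot on the same observations as the MSE. Under that assumption, MSE / (1 - R²) recovers the variance of the target. All three models give roughly 44, which is consistent with their having been scored on the same data. Only the metric values come from the report; the dictionary layout and function names are illustrative.

```python
# Metric values as reported for the three models.
results = {
    "Linear Regression": {"MAE": 4.240636, "MSE": 23.53445, "R2": 0.4651262},
    "Random Forest": {"MAE": 2.354522, "MSE": 7.761602, "R2": 0.8236},
    "SVR": {"MAE": 3.79348, "MSE": 22.45197, "R2": 0.4897279},
}

def best_model_per_metric(results):
    """Name the best model for each metric: lowest MAE/MSE, highest R2."""
    best = {}
    for metric in ("MAE", "MSE"):  # lower is better
        best[metric] = min(results, key=lambda m: results[m][metric])
    best["R2"] = max(results, key=lambda m: results[m]["R2"])
    return best

def implied_target_variance(results):
    """If R2 = 1 - MSE / Var(y) on the same data, then Var(y) = MSE / (1 - R2)."""
    return {m: r["MSE"] / (1 - r["R2"]) for m, r in results.items()}

print(best_model_per_metric(results))
# {'MAE': 'Random Forest', 'MSE': 'Random Forest', 'R2': 'Random Forest'}
print({m: round(v, 2) for m, v in implied_target_variance(results).items()})
# {'Linear Regression': 44.0, 'Random Forest': 44.0, 'SVR': 44.0}
```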
Conclusion
Random Forest Regression outperforms both Linear Regression and SVR on all three metrics. It has the lowest MAE (2.35, versus 3.79 for SVR and 4.24 for Linear Regression) and the lowest MSE (7.76, versus 22.45 and 23.53), and the highest R² (0.82, versus 0.49 and 0.47). It therefore makes the most accurate predictions and explains the largest share of the variability in the target variable. The model could be improved further with a longer time series and with additional variables that also affect the SDG 8 indicators.
References
PORDATA - Database for European statistics.