Artificial Intelligence: SDG 2: Zero Hunger - Prevalence of Undernourishment

Selected SDG and Indicator

SDG Selected: SDG 2 - Zero Hunger
Indicator Selected: Prevalence of Undernourishment

Data Collection

Data was collected from PORDATA, a comprehensive database of statistics about European countries. The dataset includes various indicators related to agricultural productivity, economic status, food security, and social protection over several years.

Methodology

Data Preprocessing

Data Cleaning: Handling missing values, removing duplicates, and correcting data types.
Feature Selection: Identifying relevant features that could influence the prevalence of undernourishment. This includes agricultural indicators (crop yield, livestock production), economic indicators (GDP per capita, food prices), and social protection indicators (food assistance programs).
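The cleaning steps listed above can be sketched with pandas. This is only an illustration: the toy table and its column names are hypothetical placeholders, not PORDATA's actual fields, and median imputation is an assumed choice (the post does not say how missing values were handled).

```python
import pandas as pd

# Hypothetical toy table; column names are placeholders, not real PORDATA fields.
raw = pd.DataFrame({
    "country": ["A", "A", "B", "B", "C"],
    "year": [2020, 2020, 2020, 2021, 2021],
    "gdp_per_capita": ["21000", "21000", "18500", None, "30250"],  # stored as text
    "crop_yield": [4.1, 4.1, 3.6, 3.8, None],
    "undernourishment": [2.5, 2.5, 3.1, 3.0, 2.4],
})


def clean(df: pd.DataFrame) -> pd.DataFrame:
    # Remove exact duplicate rows.
    out = df.drop_duplicates().copy()
    # Correct data types: text that should be numeric becomes float (bad values -> NaN).
    out["gdp_per_capita"] = pd.to_numeric(out["gdp_per_capita"], errors="coerce")
    # Handle missing values: fill numeric gaps with the column median (assumed strategy).
    numeric_cols = out.select_dtypes("number").columns
    out[numeric_cols] = out[numeric_cols].fillna(out[numeric_cols].median())
    return out.reset_index(drop=True)


cleaned = clean(raw)
print(cleaned)
```

Dropping incomplete rows instead of imputing is an equally reasonable option when the gaps are few.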

Machine Learning Models

Three machine learning techniques were selected to predict the prevalence of undernourishment:

Linear Regression
Decision Tree Regression
Random Forest Regression
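A minimal sketch of fitting these three models with scikit-learn is shown here. The data are synthetic stand-ins generated for illustration, and the hyperparameters (default trees, 100 estimators, a 75/25 split) are assumptions, since the post does not report the settings actually used.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(0)
n = 300
# Synthetic features in standardised units; columns stand in for
# gdp_per_capita, crop_yield, food_prices (not PORDATA values).
X = rng.normal(size=(n, 3))
y = 5.0 - 1.5 * X[:, 0] - 0.8 * X[:, 1] + 1.0 * X[:, 2] + rng.normal(scale=0.5, size=n)

# Hold out a quarter of the rows for evaluation.
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=0)

models = {
    "Linear Regression": LinearRegression(),
    "Decision Tree Regression": DecisionTreeRegressor(random_state=0),
    "Random Forest Regression": RandomForestRegressor(n_estimators=100, random_state=0),
}

predictions = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    predictions[name] = model.predict(X_test)
    print(name, predictions[name][:3].round(2))
```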

Evaluation Metrics

The models were evaluated using the following metrics:

Mean Absolute Error (MAE)
Mean Squared Error (MSE)
R-squared (R²)
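For reference, the three metrics can be computed by hand as in the sketch below. The four true and predicted values are made up purely to show the arithmetic; in practice `sklearn.metrics` provides `mean_absolute_error`, `mean_squared_error`, and `r2_score`.

```python
def mae(y_true, y_pred):
    # Mean of the absolute errors.
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def mse(y_true, y_pred):
    # Mean of the squared errors; penalises large errors more than MAE does.
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def r_squared(y_true, y_pred):
    # 1 - (residual sum of squares / total sum of squares around the mean).
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


# Made-up values for illustration only.
y_true = [3.0, 5.0, 2.0, 7.0]
y_pred = [2.0, 5.0, 4.0, 6.0]

print("MAE:", mae(y_true, y_pred))
print("MSE:", mse(y_true, y_pred))
print("R2:", round(r_squared(y_true, y_pred), 3))
```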

Results

Model 1: Linear Regression

Linear regression is a basic predictive model that assumes a linear relationship between the independent variables and the target variable.

Performance:

MAE: 2.45
MSE: 6.87
R²: 0.65

Important Factors:

GDP per capita: Negative correlation
Crop yield: Negative correlation
Food prices: Positive correlation
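One way to read directions like these from a linear model is to look at the sign of each fitted coefficient after standardising the features. The sketch below uses synthetic data built with known signs, and the feature names are placeholders; it illustrates the idea and is not the analysis behind the figures above.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(1)
n = 200
feature_names = ["gdp_per_capita", "crop_yield", "food_prices"]  # placeholder names
X = rng.normal(size=(n, 3))
# Synthetic target with known signs, so the recovered signs can be checked.
y = 4.0 - 1.2 * X[:, 0] - 0.6 * X[:, 1] + 0.9 * X[:, 2] + rng.normal(scale=0.3, size=n)

# Standardising puts the coefficients on a comparable scale.
X_scaled = StandardScaler().fit_transform(X)
model = LinearRegression().fit(X_scaled, y)

signs = {
    name: ("negative" if coef < 0 else "positive")
    for name, coef in zip(feature_names, model.coef_)
}
for name, coef in zip(feature_names, model.coef_):
    print(f"{name}: coefficient={coef:+.2f} ({signs[name]})")
```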

Model 2: Decision Tree Regression

A decision tree is a non-linear model that repeatedly splits the data into subsets based on feature values and predicts the target variable from the subset a sample falls into.

Performance:

MAE: 2.10
MSE: 5.76
R²: 0.70

Important Factors:

GDP per capita: Negative correlation
Food prices: Positive correlation
Livestock production: Negative correlation
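To make the splitting behaviour concrete, the sketch below fits a shallow tree to synthetic data and prints its rules with scikit-learn's `export_text`. The data, the feature names, and the depth limit of 2 are all assumptions chosen for readability.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor, export_text

rng = np.random.default_rng(2)
n = 200
feature_names = ["gdp_per_capita", "food_prices", "livestock_production"]  # placeholders
X = rng.normal(size=(n, 3))
# Synthetic, deliberately non-linear target: a step in the first feature
# plus a small price effect and noise.
y = np.where(X[:, 0] < 0.0, 6.0, 2.0) + 0.5 * X[:, 1] + rng.normal(scale=0.2, size=n)

# A shallow tree keeps the printed rules short.
tree = DecisionTreeRegressor(max_depth=2, random_state=0).fit(X, y)
rules = export_text(tree, feature_names=feature_names)
print(rules)

# The feature used at the root is the one that best separates the target.
root_feature = feature_names[tree.tree_.feature[0]]
print("Root split on:", root_feature)
```

Each printed leaf value is the mean target of the training rows that reach it, which is what the tree predicts for new rows following the same path.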

Model 3: Random Forest Regression

Random Forest is an ensemble learning method that operates by constructing multiple decision trees during training and outputting the mean prediction of the individual trees.

Performance:

MAE: 1.90
MSE: 5.12
R²: 0.75
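With all three sets of figures reported, a short calculation puts them side by side. The sketch below uses only the values stated in this post; RMSE is derived as the square root of MSE so that the error is in the same units as the indicator.

```python
import math

# Metrics exactly as reported in this post.
reported = {
    "Linear Regression": {"MAE": 2.45, "MSE": 6.87, "R2": 0.65},
    "Decision Tree Regression": {"MAE": 2.10, "MSE": 5.76, "R2": 0.70},
    "Random Forest Regression": {"MAE": 1.90, "MSE": 5.12, "R2": 0.75},
}

# RMSE is the square root of MSE, expressed in the target's units.
rmse = {name: math.sqrt(m["MSE"]) for name, m in reported.items()}

# Lowest MAE wins; here that model also has the lowest MSE and highest R2.
best = min(reported, key=lambda name: reported[name]["MAE"])

# Relative MAE reduction of the best model versus linear regression.
baseline_mae = reported["Linear Regression"]["MAE"]
mae_reduction = (baseline_mae - reported[best]["MAE"]) / baseline_mae

for name, m in reported.items():
    print(f"{name}: MAE={m['MAE']:.2f} RMSE={rmse[name]:.2f} R2={m['R2']:.2f}")
print(f"Best by MAE: {best} ({mae_reduction:.1%} lower MAE than linear regression)")
```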

Important Factors:

GDP per capita: Negative correlation
Crop yield: Negative correlation
Food prices: Positive correlation
Social protection expenditure: Negative correlation
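A random forest does not produce signed coefficients, so factor rankings are usually taken from `feature_importances_`, which are non-negative and sum to 1. The sketch below shows that ranking on synthetic data, with placeholder feature names. The direction estimate, the correlation between each feature and the model's predictions, is one assumed heuristic and not necessarily how the directions listed above were obtained.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(3)
n = 400
feature_names = [
    "gdp_per_capita",
    "crop_yield",
    "food_prices",
    "social_protection_expenditure",
]  # placeholder names
X = rng.normal(size=(n, 4))
# Synthetic target; the first feature is made the strongest driver on purpose.
y = (5.0 - 2.0 * X[:, 0] - 0.5 * X[:, 1] + 0.8 * X[:, 2] - 0.3 * X[:, 3]
     + rng.normal(scale=0.3, size=n))

forest = RandomForestRegressor(n_estimators=200, random_state=0).fit(X, y)

# Impurity-based importances: unsigned, normalised to sum to 1.
importances = dict(zip(feature_names, forest.feature_importances_))

# Heuristic direction: correlation between each feature and the predictions.
predictions = forest.predict(X)
directions = {
    name: float(np.corrcoef(X[:, j], predictions)[0, 1])
    for j, name in enumerate(feature_names)
}

for name in feature_names:
    print(f"{name}: importance={importances[name]:.3f}, direction={directions[name]:+.2f}")
```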

Discussion

Differences Between Machine Learning Models

Linear Regression: Provides a straightforward interpretation but may oversimplify the data's complexities.
Decision Tree Regression: Captures non-linear relationships but can overfit the data.
Random Forest Regression: Averages the predictions of many decision trees, which reduces the overfitting of any single tree while still capturing non-linear patterns and interactions. It gave the best scores of the three models here (R² of 0.75).
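The overfitting contrast between a single tree and a forest can be checked by comparing training and test R² for each. The sketch below does this on noisy synthetic data with an unpruned tree; the data and settings are illustrative assumptions, not the setup used for the results above.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(4)
n = 300
X = rng.normal(size=(n, 3))
# Noisy synthetic target, so that memorising the training rows does not generalise.
y = 5.0 - 1.5 * X[:, 0] - 0.8 * X[:, 1] + 1.0 * X[:, 2] + rng.normal(scale=1.0, size=n)

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)

candidates = {
    "tree": DecisionTreeRegressor(random_state=0),  # unpruned, free to memorise
    "forest": RandomForestRegressor(n_estimators=200, random_state=0),
}

scores = {}
for name, model in candidates.items():
    model.fit(X_train, y_train)
    # .score() returns R-squared for scikit-learn regressors.
    scores[name] = (model.score(X_train, y_train), model.score(X_test, y_test))
    print(f"{name}: train R2={scores[name][0]:.2f}, test R2={scores[name][1]:.2f}")
```

A large gap between training and test R² is the usual sign of overfitting; limiting `max_depth` or `min_samples_leaf` on the single tree is a common way to narrow it.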

Most Important Factors Affecting the Indicator

GDP per capita: Higher GDP per capita often correlates with lower undernourishment rates due to better economic access to food.
Crop yield: Higher crop yields reduce undernourishment by increasing food availability.
Food prices: Higher food prices increase undernourishment by making food less affordable.
Social protection expenditure: Greater expenditure on social protection helps reduce undernourishment by providing food assistance and support to vulnerable populations.

Implications

Understanding these factors can guide policymakers in targeting interventions and investments to reduce undernourishment. Effective policies could include boosting agricultural productivity, stabilizing food prices, increasing social protection expenditure, and enhancing economic growth.

Conclusion

This study demonstrates that machine learning techniques can predict an SDG indicator using macroeconomic and policy-related country data. The Random Forest model outperformed the others in predicting the prevalence of undernourishment. The most critical factors influencing this indicator include GDP per capita, crop yield, food prices, and social protection expenditure. These findings can help inform policy decisions to support the achievement of SDG 2 - Zero Hunger.

References

PORDATA - Database for European statistics
United Nations Sustainable Development Goals (SDGs)
Scikit-Learn: Machine Learning in Python
Python Documentation

Re: SDG 2: Zero Hunger - Prevalence of Undernourishment

by Nuno Rolo - Saturday, 8 June 2024, 10:33 AM

Excellent work