Introduction
The United Nations Sustainable Development Goals (SDGs) provide a blueprint for achieving a better and more sustainable future. Each SDG has specific targets and indicators to measure progress. This report focuses on predicting one indicator associated with SDG 6 (Clean Water and Sanitation) using three machine learning (ML) techniques. The indicator selected is the Water Exploitation Index Plus (WEI+), which measures total freshwater use as a percentage of the renewable freshwater resources available in a given territory and period.
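For orientation, the index is usually written as a ratio of net water use to renewable resources. The expression below is a sketch of that common formulation; the term names are chosen here for illustration and are not taken from the report's dataset.

```latex
\mathrm{WEI+} = \frac{\text{Abstractions} - \text{Returns}}{\text{Renewable freshwater resources}} \times 100\%
```

Under this formulation, a hypothetical region that abstracts 50 units of water, returns 30 units, and has 200 units of renewable resources would have a WEI+ of (50 - 30) / 200 = 10%.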
Data Collection
Data was collected for several macro-level indicators that could influence the WEI+. These indicators include:
- Population served by urban wastewater treatment systems (%)
- Precipitation
- Surface area
Machine Learning Techniques
The following three machine learning techniques were employed to predict the WEI+:
- Decision Tree
- K-Nearest Neighbors (KNN)
- Neural Networks
Data Preprocessing
Data preprocessing steps included handling missing values, normalizing numerical features, and encoding categorical variables. For simplicity, the dataset was split into training and testing sets in a 70:30 ratio.
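The preprocessing steps described above can be sketched as follows. This is a minimal illustration in plain Python, not the code used for the report: median imputation and min-max scaling are assumed choices (the report does not say which methods were applied), and the function names and sample values are hypothetical.

```python
import random
from statistics import median

def impute_median(column):
    """Replace None entries with the median of the observed values."""
    observed = [v for v in column if v is not None]
    fill = median(observed)
    return [fill if v is None else v for v in column]

def min_max_normalize(column):
    """Rescale values linearly to the [0, 1] range."""
    lo, hi = min(column), max(column)
    if hi == lo:
        # A constant column carries no information; map it to zeros
        return [0.0 for _ in column]
    return [(v - lo) / (hi - lo) for v in column]

def train_test_split(rows, train_fraction=0.7, seed=42):
    """Shuffle rows reproducibly and split them into train and test sets."""
    shuffled = rows[:]
    random.Random(seed).shuffle(shuffled)
    cut = round(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]

# Hypothetical precipitation values with one missing observation
precipitation = [600.0, None, 900.0, 1200.0, 300.0]
clean = impute_median(precipitation)
scaled = min_max_normalize(clean)

# Hypothetical row identifiers split 70:30
rows = list(range(10))
train, test = train_test_split(rows)
```

In practice, the minimum and maximum used for scaling would normally be computed on the training set only and then applied to the test set, so that no test information leaks into training.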
Exploratory Data Analysis (EDA)
EDA revealed the following insights:
- A higher share of the population served by wastewater treatment systems generally correlated with lower WEI+.
- Higher precipitation was associated with lower WEI+.
- Surface area showed no consistent relationship with WEI+, indicating the need for further analysis.
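Relationships like those listed above are often quantified with a correlation coefficient. The sketch below computes the Pearson correlation in plain Python; the report does not state which measure was used, so this is an assumption, and the sample values are hypothetical rather than taken from the dataset.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)

# Hypothetical observations: wetter regions with lower WEI+
precipitation = [300.0, 600.0, 900.0, 1200.0]
wei_plus = [40.0, 25.0, 15.0, 5.0]

r = pearson(precipitation, wei_plus)
print(f"correlation between precipitation and WEI+: {r:.3f}")
```

A value close to -1 would be consistent with the negative association described for precipitation, while a value near 0 would match the inconclusive pattern reported for surface area.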
Machine Learning Models
1. Decision Tree
Decision Trees were chosen for their simplicity and interpretability. The model was implemented using the rpart package in R.
Results:
- Accuracy: Evaluated using a confusion matrix.
- Important Factors (Feature Importance):
  - Population served by wastewater treatment systems
  - Precipitation
  - Surface area
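To illustrate how a tree arrives at a split, the sketch below searches for the threshold on a single feature that minimizes weighted Gini impurity, the kind of criterion used by CART-style trees. It is written in plain Python and is not the rpart implementation used in the report; the class labels "high" and "low" and the sample values are hypothetical.

```python
from collections import Counter

def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())

def best_split(values, labels):
    """Return (threshold, weighted_gini) for the best binary split on one feature."""
    best = (None, float("inf"))
    distinct = sorted(set(values))
    # Candidate thresholds are midpoints between consecutive distinct values
    for lo, hi in zip(distinct, distinct[1:]):
        threshold = (lo + hi) / 2
        left = [lab for v, lab in zip(values, labels) if v <= threshold]
        right = [lab for v, lab in zip(values, labels) if v > threshold]
        score = (len(left) * gini(left) + len(right) * gini(right)) / len(labels)
        if score < best[1]:
            best = (threshold, score)
    return best

# Hypothetical data: WEI+ discretized into two classes
precipitation = [300.0, 400.0, 500.0, 900.0, 1000.0, 1200.0]
wei_class = ["high", "high", "high", "low", "low", "low"]

threshold, impurity = best_split(precipitation, wei_class)
```

A full tree repeats this search over every feature at every node; features that produce the largest impurity reductions are the ones reported as most important.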
2. K-Nearest Neighbors (KNN)
KNN was selected for its simplicity and effectiveness in classification tasks.
Results:
- Accuracy: Evaluated using a confusion matrix.
- Important Factors: KNN does not produce feature importances directly; each feature influences predictions only through its contribution to the distance computed on the normalized values.
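The KNN classification and confusion-matrix evaluation can be sketched as follows. This is a minimal plain-Python illustration under assumptions: Euclidean distance, k = 3, and a two-class WEI+ target are choices made here, since the report does not state them, and all data values are hypothetical.

```python
from collections import Counter
from math import dist

def knn_predict(train_X, train_y, query, k=3):
    """Predict the majority class among the k training points nearest to query."""
    neighbors = sorted(zip(train_X, train_y), key=lambda pair: dist(pair[0], query))
    votes = Counter(label for _, label in neighbors[:k])
    return votes.most_common(1)[0][0]

def confusion_matrix(actual, predicted):
    """Count occurrences of each (actual, predicted) label pair."""
    return Counter(zip(actual, predicted))

def accuracy(actual, predicted):
    """Fraction of predictions that match the actual labels."""
    return sum(a == p for a, p in zip(actual, predicted)) / len(actual)

# Hypothetical normalized features: (treatment coverage, precipitation)
train_X = [(0.9, 0.8), (0.8, 0.9), (0.85, 0.7), (0.2, 0.1), (0.3, 0.2), (0.1, 0.3)]
train_y = ["low", "low", "low", "high", "high", "high"]
test_X = [(0.75, 0.75), (0.25, 0.15), (0.5, 0.45)]
test_y = ["low", "high", "low"]

predicted = [knn_predict(train_X, train_y, x) for x in test_X]
matrix = confusion_matrix(test_y, predicted)
score = accuracy(test_y, predicted)
```

Because every feature enters the distance on equal terms, unscaled features with large ranges would dominate the neighbor search, which is why normalization matters for this model.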
3. Neural Networks
Neural Networks were used for their ability to model complex relationships in the data.
Results:
- Accuracy: Evaluated using Mean Squared Error (MSE) and other relevant metrics.
- Important Factors: Based on the weights learned by the network during training.
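As an illustration of MSE-based evaluation, the sketch below trains a very small network with one hidden tanh layer using stochastic gradient descent. The architecture, learning rate, epoch count, and data are all assumptions made for this example; the report does not describe the network that was actually used.

```python
import math
import random

def mse(actual, predicted):
    """Mean squared error between two equal-length sequences."""
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)

class TinyNetwork:
    """One hidden tanh layer and a linear output, trained by gradient descent."""

    def __init__(self, n_inputs, n_hidden, seed=0):
        rng = random.Random(seed)
        self.w1 = [[rng.uniform(-0.5, 0.5) for _ in range(n_inputs)]
                   for _ in range(n_hidden)]
        self.b1 = [0.0] * n_hidden
        self.w2 = [rng.uniform(-0.5, 0.5) for _ in range(n_hidden)]
        self.b2 = 0.0

    def hidden(self, x):
        """Activations of the hidden layer for one input vector."""
        return [math.tanh(sum(w * xi for w, xi in zip(row, x)) + b)
                for row, b in zip(self.w1, self.b1)]

    def predict(self, x):
        return sum(w * h for w, h in zip(self.w2, self.hidden(x))) + self.b2

    def train(self, X, y, epochs=500, lr=0.1):
        for _ in range(epochs):
            for x, target in zip(X, y):
                h = self.hidden(x)
                error = sum(w * hj for w, hj in zip(self.w2, h)) + self.b2 - target
                for j, hj in enumerate(h):
                    # Backpropagate through tanh: d tanh(z)/dz = 1 - tanh(z)^2
                    grad_hidden = error * self.w2[j] * (1.0 - hj ** 2)
                    self.w2[j] -= lr * error * hj
                    self.b1[j] -= lr * grad_hidden
                    for i, xi in enumerate(x):
                        self.w1[j][i] -= lr * grad_hidden * xi
                self.b2 -= lr * error

# Hypothetical normalized features and WEI+ expressed as a fraction
X = [(0.2, 0.0), (0.5, 0.33), (0.8, 0.67), (0.9, 1.0)]
y = [0.40, 0.25, 0.15, 0.05]

net = TinyNetwork(n_inputs=2, n_hidden=3)
initial_mse = mse(y, [net.predict(x) for x in X])
net.train(X, y)
final_mse = mse(y, [net.predict(x) for x in X])
```

Reading feature importance from learned weights is only a rough guide, since hidden units interact nonlinearly and the weights depend on how the inputs were scaled.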
Discussion
Differences Between Models
- Decision Tree: Highly interpretable but may not capture complex relationships.
- KNN: Simple and effective for small datasets but sensitive to the choice of k and scaling of features.
- Neural Networks: Capable of modeling complex relationships but require careful tuning and are less interpretable.
Key Factors Affecting Water Exploitation Index
- Population served by wastewater treatment systems: Broader treatment coverage is associated with lower water exploitation.
- Precipitation: Greater natural water availability is associated with lower water exploitation.
- Surface area: Larger areas may have more capacity for water storage and management.
Conclusion
The neural network emerged as the most effective technique for predicting the Water Exploitation Index, with precipitation and population served by wastewater treatment systems being the most influential factors. These findings can guide policymakers in prioritizing investments and interventions to improve water management outcomes.
Recommendations
- Improve wastewater treatment infrastructure to increase the population served.
- Enhance water conservation measures in areas with low precipitation.
- Develop policies for sustainable water management considering regional surface area capacities.
References
- PORDATA, database of European statistics.
- United Nations, Sustainable Development Goals (SDGs).