Artificial Intelligence: Predicting the Water Exploitation Index (WEI+) Using Urban Wastewater Treatment Coverage, Precipitation, and Surface Area

Introduction

The United Nations Sustainable Development Goals (SDGs) provide a blueprint for achieving a better and more sustainable future. Each SDG has specific targets and indicators to measure progress. This report focuses on predicting one indicator from SDG 6 (Clean Water and Sanitation) using three machine learning (ML) techniques. The indicator selected is the "Water Exploitation Index (WEI+)," which expresses total water use as a percentage of the renewable freshwater resources available in a given territory and period.

Data Collection

Data was collected from relevant sources, encompassing various macro-level indicators that could potentially influence the WEI+. These indicators include:

Population served by urban wastewater treatment systems (%)
Precipitation
Surface area

Machine Learning Techniques

The following three machine learning techniques were employed to predict the WEI+:

Decision Tree
K-Nearest Neighbors (KNN)
Neural Networks

Data Preprocessing

Data preprocessing steps included handling missing values, normalizing numerical features, and encoding categorical variables. For simplicity, the dataset was split into training and testing sets in a 70:30 ratio.
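The normalization and 70:30 split described above can be sketched as follows. This is only an illustration in plain Python, not the code used for the report; the function names, the min-max scaling choice, the seed, and the sample values are all assumptions.

```python
import random

def min_max_normalize(values):
    """Rescale a list of numbers to the [0, 1] range (min-max scaling)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # A constant column carries no information; map it to zeros.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]

def train_test_split(rows, train_fraction=0.7, seed=42):
    """Shuffle rows reproducibly and split them into train and test lists."""
    shuffled = rows[:]
    random.Random(seed).shuffle(shuffled)
    cut = round(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]

# Hypothetical precipitation values, for illustration only.
precipitation = [400.0, 650.0, 900.0, 1200.0, 1400.0]
scaled = min_max_normalize(precipitation)

# Ten hypothetical row identifiers split 70:30.
rows = list(range(10))
train, test = train_test_split(rows)
```

In practice the minimum and maximum would normally be computed on the training set only and then applied to the test set, so that no information from the test data leaks into training.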

Exploratory Data Analysis (EDA)

EDA revealed the following insights:

Higher population served by wastewater treatment systems generally correlated with lower WEI+.
Higher precipitation was associated with lower WEI+.
Larger surface areas had varied impacts on WEI+, indicating the need for further analysis.
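One common way to quantify relationships like the ones listed above is the Pearson correlation coefficient. The sketch below assumes that measure; the report does not state which one was used, and the numbers are made up for illustration.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

# Hypothetical values, not taken from the report's dataset.
precipitation = [400.0, 650.0, 900.0, 1200.0, 1400.0]
wei_plus = [30.0, 22.0, 15.0, 9.0, 5.0]

r = pearson(precipitation, wei_plus)
```

A value of r close to -1 would match the "higher precipitation, lower WEI+" pattern, while a value near 0 would match the mixed picture described for surface area.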

Machine Learning Models

1. Decision Tree

Decision Trees were chosen for their simplicity and interpretability. The model was implemented using the rpart package in R.

Results:

Accuracy: Evaluated using a confusion matrix.
Important Factors (Feature Importance):
- Population served by wastewater treatment systems
- Precipitation
- Surface area
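A confusion matrix only applies to class labels, so the sketch below assumes the WEI+ values were grouped into classes such as "low" and "high" before modeling. The report does not say how this was done, and the labels and predictions here are hypothetical. The example is in plain Python rather than R and is not the report's code.

```python
from collections import Counter

def confusion_matrix(actual, predicted):
    """Count how often each (actual, predicted) label pair occurs."""
    return Counter(zip(actual, predicted))

def accuracy(actual, predicted):
    """Fraction of predictions that match the actual label."""
    correct = sum(a == p for a, p in zip(actual, predicted))
    return correct / len(actual)

# Hypothetical test-set labels and model predictions.
actual = ["low", "low", "high", "high", "low", "high"]
predicted = ["low", "high", "high", "high", "low", "low"]

cm = confusion_matrix(actual, predicted)
acc = accuracy(actual, predicted)
```

Accuracy is the sum of the diagonal entries of the matrix (the pairs where actual and predicted agree) divided by the number of test rows.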

2. K-Nearest Neighbors (KNN)

KNN was selected for its simplicity and effectiveness in classification tasks.

Results:

Accuracy: Evaluated using a confusion matrix.
Important Factors: KNN does not rank features directly; each normalized feature contributes equally to the distance used to find the nearest neighbors.
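To illustrate how KNN classifies a new observation from normalized features, here is a minimal sketch in plain Python. The Euclidean distance, the value of k, the class labels, and the data points are all assumptions; the report does not give these details.

```python
import math
from collections import Counter

def knn_predict(train_X, train_y, query, k=3):
    """Predict a label by majority vote among the k nearest training points."""
    # Pair each training label with its Euclidean distance to the query.
    distances = [(math.dist(x, query), label) for x, label in zip(train_X, train_y)]
    distances.sort(key=lambda pair: pair[0])
    votes = Counter(label for _, label in distances[:k])
    return votes.most_common(1)[0][0]

# Hypothetical normalized features:
# (wastewater treatment coverage, precipitation, surface area)
train_X = [
    (0.90, 0.80, 0.20),
    (0.80, 0.90, 0.50),
    (0.85, 0.70, 0.40),
    (0.20, 0.10, 0.60),
    (0.30, 0.20, 0.90),
    (0.10, 0.30, 0.30),
]
train_y = ["low", "low", "low", "high", "high", "high"]

prediction = knn_predict(train_X, train_y, (0.75, 0.75, 0.30), k=3)
```

Because the prediction depends only on distances, a feature left on a much larger scale than the others would dominate the result, which is why normalization matters for this model.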

3. Neural Networks

Neural Networks were used for their ability to model complex relationships in the data.

Results:

Accuracy: Evaluated using Mean Squared Error (MSE) and other relevant metrics.
Important Factors: Based on the weights learned by the network during training.
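For reference, Mean Squared Error is the average of the squared differences between actual and predicted values. The sketch below uses made-up numbers, not results from the report.

```python
import math

def mean_squared_error(actual, predicted):
    """Average of squared differences between actual and predicted values."""
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)

# Hypothetical WEI+ values (%) and model predictions.
actual = [10.0, 20.0, 30.0]
predicted = [12.0, 18.0, 33.0]

mse = mean_squared_error(actual, predicted)  # (4 + 4 + 9) / 3
rmse = math.sqrt(mse)  # same units as WEI+ itself
```

Since MSE is a regression metric and the confusion matrix is a classification metric, the two are not directly comparable across models.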

Discussion

Differences Between Models

Decision Tree: Highly interpretable but may not capture complex relationships.
KNN: Simple and effective for small datasets but sensitive to the choice of k and scaling of features.
Neural Networks: Capable of modeling complex relationships but require careful tuning and are less interpretable.

Key Factors Affecting Water Exploitation Index

Population served by wastewater treatment systems: Better infrastructure reduces water exploitation.
Precipitation: Natural water availability reduces the need for additional water exploitation.
Surface area: Larger areas may have more capacity for water storage and management.

Conclusion

The Neural Network model emerged as the most effective technique for predicting the Water Exploitation Index, with precipitation and population served by wastewater treatment systems being the most influential factors. These findings can guide policymakers in prioritizing investments and interventions to improve water management outcomes.

Recommendations

Improve wastewater treatment infrastructure to increase the population served.
Enhance water conservation measures in areas with low precipitation.
Develop policies for sustainable water management considering regional surface area capacities.

References

PORDATA, Database for European statistics
United Nations Sustainable Development Goals (SDGs)

Re: Predicting the Water Exploitation Index (WEI+) Using Urban Wastewater Treatment Coverage, Precipitation, and Surface Area

by Mariana Barrote - Thursday, 30 May 2024, 6:53 PM

Hello Marina, I think your report is very good.
Your report highlights the importance of data preprocessing, which included handling missing values, normalizing numerical features, and encoding categorical variables. Exploratory Data Analysis (EDA) revealed insights into the relationships between the indicators and the WEI+, such as higher population served by wastewater treatment systems correlating with lower WEI+ and higher precipitation being associated with lower WEI+.
About your choice of machine learning methods:
The Decision Tree model provided interpretability but may not have captured complex relationships;
KNN was simple and effective for small datasets but sensitive to the choice of k and scaling of features;
Neural Networks emerged as the most effective technique, capable of modeling complex relationships, although they require careful tuning and are less interpretable.
Even so, you did a great job using them.
Your report provides a valuable framework for predicting the Water Exploitation Index using machine learning techniques and highlights the significance of infrastructure, precipitation, and surface area in sustainable water management. By addressing these factors, countries can make progress towards achieving SDG 6 and ensuring access to clean water for all.

Re: Predicting the Water Exploitation Index (WEI+) Using Urban Wastewater Treatment Coverage, Precipitation, and Surface Area

by Paulo Jorge Couto Tavares - Thursday, 30 May 2024, 7:56 PM

Marina,

Your report on predicting the Water Exploitation Index (WEI+) using machine learning techniques is comprehensive and insightful. The choice of indicators (population served by urban wastewater treatment systems, precipitation, and surface area) is well justified given their potential influence on water resources.

Your methodology is solid, particularly in how you handled data preprocessing and exploratory data analysis. The step of normalizing numerical features and encoding categorical variables is essential for ensuring model accuracy and comparability.

In terms of machine learning techniques, the selection of Decision Tree, K-Nearest Neighbors (KNN), and Neural Networks provides a good mix of interpretability, simplicity, and complexity. Your use of Decision Trees for their simplicity and interpretability is commendable, though it is worth noting that they may not always capture complex relationships. The KNN model's reliance on proximity is a strength in classification tasks but, as you mentioned, its sensitivity to the choice of k and feature scaling can affect performance.

The Neural Networks' ability to model complex relationships is well-highlighted, and your use of Mean Squared Error (MSE) to evaluate accuracy is appropriate. However, the need for careful tuning and the model's lower interpretability compared to Decision Trees should be considered.

Your findings that population served by wastewater treatment systems and precipitation are the most influential factors make sense, and it's insightful to see that larger surface areas had varied impacts, indicating the necessity for further analysis.

The discussion section provides a clear comparison of the models, and your conclusions and recommendations are actionable and relevant. Improving wastewater treatment infrastructure and enhancing water conservation measures are critical steps, and developing policies based on regional surface area capacities is a strategic approach.

Overall, your report effectively demonstrates how machine learning can be used to predict important environmental indicators and provides valuable guidance for policymakers.

Excellent work!

Best regards,
C. Tavares

Re: Predicting the Water Exploitation Index (WEI+) Using Urban Wastewater Treatment Coverage, Precipitation, and Surface Area

by Alix Paulino - Friday, 7 June 2024, 1:05 AM

Hello Marina.

Congratulations on the excellent work!

The report is well-organized, highlighting the strengths and limitations of each machine learning technique, and offers clear, actionable recommendations for policymakers.

Wonderful job!

Best regards,
Alix

Re: Predicting the Water Exploitation Index (WEI+) Using Urban Wastewater Treatment Coverage, Precipitation, and Surface Area

by Ana Guerreiro - Sunday, 9 June 2024, 6:55 PM

Hi Marina,

Nice work, well organized and very useful for understanding the three types of machine learning models we studied.

Best regards,
Ana Guerreiro