Predicting SDG Indicator Using Machine Learning Techniques

Posted by Alix Paulino
Number of replies: 2

Report on Predicting SDG Indicator Using Machine Learning Techniques

Introduction

The Sustainable Development Goals (SDGs) provide a comprehensive framework for addressing global challenges. Linking policy indicators to the SDGs can help identify what is needed to achieve these goals. This report explores the relationship between a specific SDG indicator and relevant policy indicators using three machine learning techniques: Decision Trees, K-Nearest Neighbors (KNN), and Neural Networks. The focus is on predicting the percentage of the population reporting occurrences of crime, violence, and vandalism in their area.

Selected Indicator and Independent Variables

Indicator:

  • Percentage of the population reporting occurrences of crime, violence, and vandalism in their area.

Independent Variables:

  • Housing overcrowding rate: Percentage of people living in dwellings that do not have enough rooms for all household members.
  • Average number of rooms per person: Total and by household type.
  • Participation of adults in learning in the last four weeks: Percentage of people aged 25-64 who took part in formal or non-formal education or training.

Data Collection and Preparation

Data covering 2005 to 2020 was collected from PORDATA, and each dataset was saved as a CSV file. The data was initially structured with years as rows and countries as columns, so it had to be reshaped into long format for analysis.

Steps taken for data preparation:

  1. Data Import and Transformation:
    • Read CSV files using read.csv.
    • Reshaped the data into long format, with year and country as columns and one row per year-country combination.
  2. Data Merging:
    • Created an identifier column by concatenating year and country.
    • Merged the datasets on this identifier.
  3. Data Cleaning:
    • Removed rows with invalid values.
    • Ensured correct data types.
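
The preparation steps above can be sketched as follows. The report used R's read.csv; this is a plain-Python illustration of the same reshape, merge, and cleaning logic, not the author's code. The CSV contents, the country names CountryA/CountryB, the ".." missing-value marker, and all helper names are hypothetical.

```python
import csv
import io

# Hypothetical wide-format extracts: one row per year, one column per country.
# Values are placeholders, not PORDATA data; ".." stands in for a missing value.
OVERCROWDING_CSV = """Year,CountryA,CountryB
2019,10.0,5.0
2020,9.0,..
"""
CRIME_CSV = """Year,CountryA,CountryB
2019,12.0,8.0
2020,11.0,7.0
"""


def wide_to_long(text, value_name):
    """Reshape a years-as-rows, countries-as-columns CSV into one record per (year, country)."""
    rows = []
    for record in csv.DictReader(io.StringIO(text)):
        year = record.pop("Year")
        for country, value in record.items():
            rows.append({"year": year, "country": country, value_name: value})
    return rows


def add_key(rows):
    """Add an identifier built by concatenating year and country."""
    for row in rows:
        row["id"] = f"{row['year']}_{row['country']}"
    return rows


def merge_on_key(left, right):
    """Inner join: keep only identifiers present in both datasets."""
    index = {row["id"]: row for row in right}
    return [{**row, **index[row["id"]]} for row in left if row["id"] in index]


def clean(rows, numeric_fields):
    """Drop rows with missing or non-numeric values; cast numeric fields to float, year to int."""
    cleaned = []
    for row in rows:
        try:
            values = {field: float(row[field]) for field in numeric_fields}
            values["year"] = int(row["year"])
        except (KeyError, ValueError):
            continue  # e.g. float("..") raises ValueError
        cleaned.append({**row, **values})
    return cleaned


crime = add_key(wide_to_long(CRIME_CSV, "crime"))
overcrowding = add_key(wide_to_long(OVERCROWDING_CSV, "overcrowding"))
data = clean(merge_on_key(crime, overcrowding), ["crime", "overcrowding"])
for row in data:
    print(row)
```

An inner join silently drops year/country pairs that are missing from either dataset, so it is worth checking how many rows survive each merge.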

Methodology

Three machine learning techniques were applied to predict the selected indicator.

1. Decision Trees:

  • Binarized the indicator using the median value to create a balanced class distribution.
  • Achieved a model accuracy of 75.79%.
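
A minimal sketch of these two steps, in plain Python with made-up values: the indicator is split at its median, and a decision stump (a depth-1 decision tree, used here for brevity) picks the feature and threshold with the lowest weighted Gini impurity. A full decision tree repeats the same search recursively on each side of the split. The feature order, the numbers, and the helper names are hypothetical.

```python
from statistics import median

# Hypothetical rows: [overcrowding rate, rooms per person, adult learning rate]
X = [[12.0, 1.2, 8.0], [15.0, 1.1, 6.0], [4.0, 1.8, 15.0],
     [3.0, 2.0, 18.0], [10.0, 1.3, 9.0], [5.0, 1.7, 14.0]]
crime = [18.0, 20.0, 9.0, 8.0, 16.0, 11.0]  # hypothetical indicator values


def binarize_at_median(values):
    """Label values above the median 1, others 0 (balanced when no ties sit at the median)."""
    m = median(values)
    return [1 if v > m else 0 for v in values], m


def gini(labels):
    """Gini impurity of 0/1 labels: 1 - p^2 - (1 - p)^2 = 2p(1 - p)."""
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)
    return 2 * p * (1 - p)


def majority(labels):
    """Most frequent label; ties go to 0."""
    return 1 if 2 * sum(labels) > len(labels) else 0


def fit_stump(X, y):
    """Return (feature, threshold, left_label, right_label) minimising weighted Gini."""
    best = None
    for j in range(len(X[0])):
        for t in sorted({row[j] for row in X}):
            left = [yi for row, yi in zip(X, y) if row[j] <= t]
            right = [yi for row, yi in zip(X, y) if row[j] > t]
            if not left or not right:
                continue
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(y)
            if best is None or score < best[0]:
                best = (score, j, t, majority(left), majority(right))
    if best is None:
        raise ValueError("no feature has more than one distinct value")
    return best[1:]


def predict_stump(stump, row):
    j, t, left_label, right_label = stump
    return left_label if row[j] <= t else right_label


labels, m = binarize_at_median(crime)
stump = fit_stump(X, labels)
print(m, labels, stump)  # 13.5 [1, 1, 0, 0, 1, 0] (0, 5.0, 0, 1)
```

Because a tree only compares each feature against thresholds taken from that feature's own values, a monotonic rescaling does not change which rows fall on each side of a split, which is consistent with no normalization step being listed for this model.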

2. K-Nearest Neighbors (KNN):

  • Normalized data to a [0, 1] range.
  • Achieved a model accuracy of 75.79%.
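
The KNN step can be sketched as below: min-max scaling fitted on the training rows maps each feature to [0, 1], and a query is classified by majority vote among its k nearest neighbours by Euclidean distance. The toy data, k = 3, and the helper names are assumptions for illustration; the report does not state which k or distance measure was used.

```python
import math
from collections import Counter

# Hypothetical training rows: [overcrowding rate, rooms per person, adult learning rate]
X = [[12.0, 1.2, 8.0], [15.0, 1.1, 6.0], [4.0, 1.8, 15.0],
     [3.0, 2.0, 18.0], [10.0, 1.3, 9.0], [5.0, 1.7, 14.0]]
y = [1, 1, 0, 0, 1, 0]  # median-binarized indicator (hypothetical)


def fit_min_max(rows):
    """Per-column minimum and maximum, computed on the training data only."""
    cols = list(zip(*rows))
    return [min(c) for c in cols], [max(c) for c in cols]


def apply_min_max(row, lows, highs):
    """Map values to [0, 1] using training min/max; unseen values may land outside it."""
    return [(v - lo) / (hi - lo) if hi > lo else 0.0
            for v, lo, hi in zip(row, lows, highs)]


def knn_predict(train_X, train_y, query, k=3):
    """Majority vote among the k training rows closest to query (Euclidean distance)."""
    nearest = sorted(zip((math.dist(x, query) for x in train_X), train_y))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


lows, highs = fit_min_max(X)
X_scaled = [apply_min_max(row, lows, highs) for row in X]
query = apply_min_max([11.0, 1.25, 7.0], lows, highs)
print(knn_predict(X_scaled, y, query))  # 1
```

Scaling matters here because Euclidean distance is dominated by whichever feature has the largest raw range; without it, overcrowding rates in the tens would swamp rooms-per-person values near 1 or 2.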

3. Neural Networks:

  • Normalized data to a [-1, 1] range.
  • Achieved a model accuracy of 54.74%.
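
The [-1, 1] normalization is the same min-max idea mapped to a different interval: x' = a + (x - min)(b - a) / (max - min), with a = -1 and b = 1. Below is a plain-Python sketch with a hypothetical helper name. Zero-centred inputs are a common choice for networks with tanh-like activations, although the report does not say which activation or architecture was used.

```python
def rescale(values, a=-1.0, b=1.0):
    """Min-max rescale a list of numbers to the interval [a, b]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # Constant column: map everything to the midpoint of [a, b].
        return [(a + b) / 2 for _ in values]
    return [a + (v - lo) * (b - a) / (hi - lo) for v in values]


overcrowding = [3.0, 9.0, 15.0]  # hypothetical values
print(rescale(overcrowding))            # [-1.0, 0.0, 1.0]
print(rescale(overcrowding, 0.0, 1.0))  # [0.0, 0.5, 1.0]
```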

Results and Discussion

The results highlight the varying performance of the three machine learning models.

  • Decision Trees and KNN:

    • Both models achieved the same accuracy (75.79%), suggesting that the independent variables carry useful information about the indicator.
    • Decision Trees provided insights into the importance of each variable, indicating which factors are most strongly associated with the perception of crime.
  • Neural Networks:

    • Achieved a lower accuracy (54.74%), only slightly above the roughly 50% expected from guessing on a median-split target, which points to potential issues with data normalization or model parameters.
    • Further optimization and tuning might be necessary to improve performance.
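
Because the indicator was split at its median, the two classes are roughly equal in size, so always guessing one class already scores about 50%. Reported accuracies are easiest to interpret next to that baseline. A small sketch with hypothetical labels and helper names:

```python
def accuracy(y_true, y_pred):
    """Share of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def majority_baseline(y_train, y_test):
    """Accuracy of always predicting the most frequent class seen in training."""
    most_common = max(set(y_train), key=y_train.count)
    return accuracy(y_test, [most_common] * len(y_test))


y_train = [0, 0, 0, 1, 1]  # hypothetical labels
y_test = [0, 1, 0, 1]
y_pred = [0, 1, 1, 1]      # hypothetical model predictions
print(accuracy(y_test, y_pred))            # 0.75
print(majority_baseline(y_train, y_test))  # 0.5
```

By this yardstick, 75.79% is a clear gain over guessing, while 54.74% is only a few points above it.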

Factors Influencing the Indicator

The analysis of variable importance from the Decision Trees model revealed key factors:

  • Housing Overcrowding Rate:
    • Higher overcrowding rates are associated with increased reporting of crime, violence, and vandalism.
  • Participation in Adult Learning:
    • Higher participation in learning correlates with lower crime reporting, suggesting a potential link between education and crime perception.
  • Number of Rooms per Person:
    • More rooms per person are associated with lower crime reporting.
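
Impurity-based importance scores from a decision tree show how much each variable contributes to the splits, but not in which direction it moves the indicator. One simple way to support directional statements like those above is to check the sign of the Pearson correlation between each variable and the raw indicator. This is a swapped-in check, not part of the reported analysis, and the values below are hypothetical. With pooled country-year data, a correlation describes association, not cause, and repeated years from the same country are not independent observations.

```python
import math


def pearson(xs, ys):
    """Pearson correlation of two equal-length, non-constant lists."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sd_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    sd_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (sd_x * sd_y)


# Hypothetical country-year values
crime = [18.0, 20.0, 9.0, 8.0, 16.0, 11.0]
features = {
    "overcrowding": [12.0, 15.0, 4.0, 3.0, 10.0, 5.0],
    "rooms_per_person": [1.2, 1.1, 1.8, 2.0, 1.3, 1.7],
    "adult_learning": [8.0, 6.0, 15.0, 18.0, 9.0, 14.0],
}
for name, values in features.items():
    r = pearson(values, crime)
    print(f"{name}: r = {r:+.2f}")  # sign gives the direction of association
```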

Conclusion

The results suggest that machine learning techniques can predict SDG indicators such as this one from relevant country data. Decision Trees and KNN performed well in this context, while Neural Networks require further refinement. Understanding the factors influencing SDG indicators can guide policy decisions to achieve sustainable development.

Future Work

Further research could explore:

  • Incorporating additional variables for a more comprehensive analysis.
  • Applying advanced machine learning techniques and fine-tuning model parameters.
  • Extending the study to other SDG indicators for a broader understanding of policy impacts.

In reply to Alix Paulino

Re: Predicting SDG Indicator Using Machine Learning Techniques

Posted by Fernanda Mello
Hello, Alix Paulino!

Your report on predicting SDG indicators using machine learning techniques is thorough and well-structured. Here are some insights and feedback on your work:

Comprehensive Overview: Your report covers every stage of the research, including the selection of the SDG indicator, independent variables, data collection and preparation steps, methodology, results, and discussion. This end-to-end approach makes the study's objectives and findings clear and easy to follow.

Clear Methodology: The methodology section outlines the steps taken for data preparation and the application of machine learning techniques in a clear and concise manner. This clarity helps readers follow the workflow and supports the study's replicability.

Insightful Results and Discussion: The results and discussion section effectively summarizes the performance of the machine learning models and provides insights into the factors influencing the selected SDG indicator. The comparison of model accuracies and the analysis of variable importance offer valuable guidance for policymakers and researchers.

Future Work Suggestions: The future work section provides useful suggestions for extending the research, including incorporating additional variables, exploring advanced machine learning techniques, and extending the study to other SDG indicators. These suggestions enhance the potential impact and relevance of the research in addressing broader sustainability challenges.

Overall, your report demonstrates a rigorous and systematic approach to predicting SDG indicators using machine learning techniques. The clarity, thoroughness, and insightful analysis presented in your work contribute to advancing our understanding of the complex relationships between policy indicators and sustainable development goals.

Best regards,
Fernanda Mello
In reply to Fernanda Mello

Re: Predicting SDG Indicator Using Machine Learning Techniques

Posted by Alix Paulino
Hello Fernanda,

Thank you very much for your valuable feedback.

I'm glad you found the report clear and comprehensive.

Your suggestions for future work are greatly appreciated and will be taken into consideration.

Best regards,
Alix