Artificial Intelligence: Sustainable Development Goal No. 1: ‘Eradicate Poverty’, focusing on the ‘Overburden rate of housing expenditure: total and by type of housing occupancy’

by Ana Guerreiro - Sunday, 9 June 2024, 18:51

Introduction

I decided to explore the data relating to Sustainable Development Goal 1: ‘Eradicate Poverty’, focusing on the ‘Overburden rate of housing expenditure: total and by type of housing occupancy’, in order to look for a possible relationship between the difficulties families face in meeting their housing costs and other relevant factors, such as their income, their type of housing occupancy and social protection spending on housing.

Data Collection

Dependent variable: ‘Overburden rate of housing expenditure’: total across housing occupancy regimes only.
Independent variable 1: ‘Average equivalent income: by household type (Euro)’: total household type indicator only.
Independent variable 2: ‘Population by type of occupancy of usual residence accommodation (%)’: Owner and tenant indicators.
Independent variable 3: ‘Social protection expenditure: total and by function, at constant prices (base=2010)’: Housing indicator only.

Machine Learning Techniques

The following three machine learning techniques were employed to make predictions:

Decision Tree
K-Nearest Neighbors (KNN)
Neural Networks

Data Preprocessing

Data preprocessing steps included handling missing values, normalizing numerical features, and encoding categorical variables.

The dataset was split into training and testing sets in a 70:30 ratio (70% for training, 30% for testing).
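As a rough illustration of the 70:30 split described above, here is a minimal sketch in plain Python. The post does not show how the split was actually done, so the function name `train_test_split`, the seed and the sample rows are all assumptions made for this example.

```python
import random

def train_test_split(rows, train_fraction=0.7, seed=42):
    """Shuffle a copy of rows and split it into (train, test)."""
    shuffled = list(rows)
    # A fixed seed makes the random split reproducible between runs.
    random.Random(seed).shuffle(shuffled)
    cut = round(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]

# Hypothetical records: (row id, feature value); not the PORDATA data.
rows = [(i, float(i)) for i in range(20)]
train, test = train_test_split(rows)
print(len(train), len(test))
```

Because the split is random, which countries and years land in each set changes with the seed, which may matter for a small and heterogeneous dataset.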

Machine Learning Models

1. Decision Tree

To create a learning model using the decision tree method you need discrete data, but the data collected is all continuous. For this reason, I discretised the data by grouping the values into classes. To build the model, I used the ‘randomForest’ function to create a single decision tree. The model is trained on the training data set, and the predictions are then made on the test set using the trained model.
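Turning continuous values into classes can be sketched as follows. The cut points used in the post are not given, so the thresholds below are purely hypothetical, and `discretise` is an illustrative helper rather than the code that was actually used.

```python
from bisect import bisect_right

def discretise(value, thresholds):
    """Return the class index of the interval that contains value.

    thresholds must be sorted in ascending order; a value equal to a
    threshold falls into the higher class.
    """
    return bisect_right(thresholds, value)

# Hypothetical cut points (in %) separating classes 0, 1, 2 and 3.
thresholds = [10.0, 20.0, 30.0]
rates = [4.2, 10.0, 19.9, 35.1]
classes = [discretise(r, thresholds) for r in rates]
print(classes)  # [0, 1, 1, 3]
```

The choice of cut points determines how many observations fall into each class, so it directly affects how balanced the classes are.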

Results:

Accuracy: High (0.7273).

2. K-Nearest Neighbors (KNN)

The K-nearest neighbours learning model can be fed with any type of data (continuous or discrete) for the independent variables, but the dependent variable must always be classified. So I used the raw (unclassified) data for all the independent variables, while the dependent variable kept the same classification that was used for the decision tree model.

I used the ‘knn3’ function to train the model with the training data and then present the predictions based on the test data. I finish with the ‘confusionMatrix’ function to evaluate the method used.
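To make the idea behind KNN concrete, here is a minimal sketch in plain Python. It is not the ‘knn3’ function itself: the helper `knn_predict`, the value k = 3 and the toy data are assumptions made for this example.

```python
import math
from collections import Counter

def knn_predict(train_X, train_y, query, k=3):
    """Predict the majority class among the k training points nearest to query."""
    distances = sorted(
        (math.dist(x, query), label) for x, label in zip(train_X, train_y)
    )
    votes = Counter(label for _, label in distances[:k])
    return votes.most_common(1)[0][0]

# Toy data: continuous features, classified dependent variable.
train_X = [(0.0, 0.0), (0.1, 0.2), (0.2, 0.1), (1.0, 1.0), (0.9, 1.1), (1.1, 0.9)]
train_y = [0, 0, 0, 1, 1, 1]
test_X = [(0.05, 0.1), (1.0, 0.95)]
test_y = [0, 1]

predictions = [knn_predict(train_X, train_y, q) for q in test_X]
accuracy = sum(p == t for p, t in zip(predictions, test_y)) / len(test_y)
print(predictions, accuracy)
```

Since the prediction depends on Euclidean distance, features measured on very different scales (for example, income in euros and percentages) can dominate the vote unless they are normalised first.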

Results:

Accuracy: Low (0.1818)

3. Neural Networks

The neural network learning model requires its output values to be between 0 and 1. For this reason, I took the classification of the data that had been done for the decision tree and rescaled it to this new range.
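One simple way to bring class labels into the range 0 to 1 is min-max rescaling, sketched below. The exact scale used in the post is not shown, so the labels in this example are hypothetical and `rescale_labels` is only an illustrative helper.

```python
def rescale_labels(labels):
    """Linearly map labels so the smallest becomes 0.0 and the largest 1.0."""
    lo, hi = min(labels), max(labels)
    if hi == lo:
        raise ValueError("need at least two distinct label values")
    return [(value - lo) / (hi - lo) for value in labels]

# Hypothetical integer class labels; not necessarily those of the post.
labels = [0, 1, 2, 3, 4]
scaled = rescale_labels(labels)
print(scaled)  # [0.0, 0.25, 0.5, 0.75, 1.0]
```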

I used the ‘nnet’ function to train the model with the training data and then present the predictions based on the test data. I finish with the ‘confusionMatrix’ function to evaluate the method used.

Results:

Accuracy: Medium (0.4706)

Discussion

Differences Between Models

Decision Tree: Obtained high accuracy results, although its true positive rate (sensitivity) is only 50% for class 1 and 0% for classes 2 and 3. So this model stands out for its high accuracy, even though its sensitivity varies a lot.
KNN: Performed unsatisfactorily, with an accuracy rate of just 18.18%.
Neural Networks: Had an average performance, with an accuracy of 47.06%. The model converged, which means that training reached a stable solution, although not necessarily an accurate one.
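The combination of high accuracy with 0% sensitivity for some classes is typical when one class dominates the data. The sketch below computes accuracy, sensitivity and specificity per class in plain Python; the labels are invented for illustration and `per_class_metrics` is a hypothetical helper, not the ‘confusionMatrix’ function.

```python
def per_class_metrics(actual, predicted):
    """Return overall accuracy and {class: (sensitivity, specificity)}."""
    total = len(actual)
    pairs = list(zip(actual, predicted))
    accuracy = sum(a == p for a, p in pairs) / total
    metrics = {}
    for c in sorted(set(actual) | set(predicted)):
        tp = sum(a == c and p == c for a, p in pairs)  # true positives
        fn = sum(a == c and p != c for a, p in pairs)  # false negatives
        fp = sum(a != c and p == c for a, p in pairs)  # false positives
        tn = total - tp - fn - fp                      # true negatives
        sensitivity = tp / (tp + fn) if tp + fn else float("nan")
        specificity = tn / (tn + fp) if tn + fp else float("nan")
        metrics[c] = (sensitivity, specificity)
    return accuracy, metrics

# Invented, imbalanced example: the model always predicts class 0.
actual = [0, 0, 0, 0, 0, 0, 0, 0, 1, 2]
predicted = [0] * 10
accuracy, metrics = per_class_metrics(actual, predicted)
print(accuracy)  # 0.8
print(metrics)   # {0: (1.0, 0.0), 1: (0.0, 1.0), 2: (0.0, 1.0)}
```

In this toy case the accuracy is 80% even though classes 1 and 2 are never predicted, which is why accuracy alone can overstate how well a model handles the rarer classes.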

Conclusion

I chose to use the ‘confusionMatrix’ function for all the models so that I could compare their effectiveness and performance more easily. I have prepared a performance comparison table:

	Decision Tree	KNN	Neural Network
Accuracy	0.7273	0.1818	0.4706
Sensitivity	0.8649 (class 0); 0.5000 (class 1); 0.0000 (classes 2 and 3)	0.6250 (class 1); 0.0000 (classes 0, 2 and 3)	0.5000 (class 0.25)
Specificity	0.4444 (class 0); 0.8718 (class 1); 1.0000 (classes 2 and 3)	0.0769 (class 1); 0.8333 (class 2); 1.0000 (classes 0 and 3)	0.0000 (class 0.25)

When I analysed the results of the 3 learning models, I concluded that the decision tree model was the one that best predicted the dependent variable, recording an accuracy rate of 72.73%, although its true positive rate (sensitivity) was only 50% for class 1 and 0% for classes 2 and 3. So this model stands out for its high accuracy, even though its sensitivity varies greatly.

The neural network had an average performance, with an accuracy of 47.06%. The model converged, which means that training reached a stable solution, although not necessarily an accurate one.

Finally, the KNN learning model performed poorly, with an accuracy rate of only 18.18%.

I therefore conclude that the independent variables chosen to predict the dependent variable are a good choice when used with the decision tree or neural network learning model.

Knowing that both the decision tree and the neural network generated were based on just one tree/network, I think that expanding to a random forest or a more complex network would help to improve the results found.

I believe that the unsatisfactory performance of the KNN learning model is related to the heterogeneity of the data being processed, since I am evaluating data related to the economic reality of several European countries, which are, by nature, very different from each other. When the data is divided into the training and test sets, it is distributed randomly, and the KNN model, which relies on the nearest neighbours to make its predictions, ends up performing poorly. This method would therefore be more suitable for a subset of the data focusing only on countries with more similar economic realities (such as Portugal and Spain), or for other types of data that follow a more ‘stable’ pattern.

References

PORDATA - Database for European statistics: PORDATA

Re: Sustainable Development Goal No. 1: ‘Eradicate Poverty’, focusing on the ‘Overburden rate of housing expenditure: total and by type of housing occupancy’

by Marina Baltar - Monday, 10 June 2024, 11:59

Your exploration of Sustainable Development Goal 1: ‘Eradicate Poverty’, focusing on the housing cost burden rate, is commendable for its thorough and structured approach. You demonstrated a comprehensive understanding of the problem by collecting relevant data, preprocessing it effectively, and applying multiple machine learning techniques.

Using Decision Trees, K-Nearest Neighbors (KNN), and Neural Networks to predict the housing cost burden rate showcases your well-rounded analytical skills. Your attention to accuracy, sensitivity, and specificity in evaluating model performance adds depth to your analysis, highlighting the strengths and weaknesses of each method.

Your insight into the Decision Tree model's high accuracy, the Neural Network's satisfactory performance, and the KNN model's challenges due to data heterogeneity shows a keen understanding of the complexities involved. This analysis is a strong example of how data science can address critical social issues, and your recommendations for future improvements reflect a forward-thinking approach. Great work!

Re: Sustainable Development Goal No. 1: ‘Eradicate Poverty’, focusing on the ‘Overburden rate of housing expenditure: total and by type of housing occupancy’

by Mário P Carvalho - Wednesday, 12 June 2024, 01:12

This topic is very relevant to my research because it could potentially serve as a validation for the results I obtained in other variables. If the trends observed here align with what I found previously, it would strengthen the overall conclusions of my analysis.
Thank you; it helped me confirm the approach I was taking in my report.

Re: Sustainable Development Goal No. 1: ‘Eradicate Poverty’, focusing on the ‘Overburden rate of housing expenditure: total and by type of housing occupancy’

by Ana Guerreiro - Wednesday, 12 June 2024, 23:13

Hi Mario,

I'm glad that my research and work were valuable to you.

I was just checking your work and, even though your goal was a different one, I can see a clear relation between the two. It is interesting to see that different research can help to reinforce our conclusions.