Duc Phat Nguyen

A student looking for opportunities in Data Science & AI


Income Disparity Analysis

Exploring Income Disparity and Model Performance in the UCI Adult Dataset: An Analysis of Demographics and Tree-Based Models.

View on GitHub

Built With

Python, NumPy, pandas, scikit-learn, matplotlib

The UCI Adult Dataset, which records a range of attributes for each individual, was analyzed to predict whether an individual's income exceeds $50,000 per year. The project aimed to characterize the demographics of the overall population and of the high-income group, and to examine how well Decision Tree and Random Forest models predict income level.

The project involved two main tasks:

  1. Exploring the dataset using descriptive statistics and visualizations to uncover demographic characteristics and income distribution patterns.
  2. Employing Decision Tree and Random Forest models to predict income levels and examining their performance with and without hyperparameter tuning.
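As a rough illustration of the first task, the sketch below computes a few descriptive summaries with pandas. It is not the project's actual code: the eight rows are a hand-made stand-in rather than real records, and the column names (`age`, `education`, `sex`, `income`) and the `>50K` label are assumptions about how the data is laid out.

```python
import pandas as pd

# Hand-made stand-in rows (hypothetical, NOT real UCI Adult records).
# Column names and the ">50K" label format are assumptions.
df = pd.DataFrame({
    "age": [25, 38, 52, 46, 29, 61, 34, 43],
    "education": ["HS-grad", "Bachelors", "Masters", "Bachelors",
                  "HS-grad", "Masters", "Bachelors", "HS-grad"],
    "sex": ["Male", "Female", "Male", "Male",
            "Female", "Female", "Male", "Female"],
    "income": ["<=50K", "<=50K", ">50K", ">50K",
               "<=50K", ">50K", ">50K", "<=50K"],
})

# Boolean flag marking the high-income group.
df["high_income"] = df["income"].eq(">50K")

# Descriptive statistics of age, split by income group.
age_by_income = df.groupby("high_income")["age"].describe()

# Share of high earners within each education level.
high_share_by_education = df.groupby("education")["high_income"].mean()

# Median age of high earners, broken down by education and gender.
high_earners = df[df["high_income"]]
median_age = high_earners.groupby(["education", "sex"])["age"].median()

print(age_by_income)
print(high_share_by_education)
print(median_age)
```

The same grouped frames can be passed to matplotlib (for example a histogram of `age` per group) to produce distribution plots.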

Visualization

Distribution of Age
Distribution of Age based on Education and Gender for High Income Individuals
Training Score
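The modeling comparison, each model with and without hyperparameter tuning and scored on both training and held-out data, might look roughly like the sketch below. It is written under stated assumptions, not taken from the project: the data is a synthetic stand-in from `make_classification`, and the `max_depth` grid, the 75/25 split, and the use of `GridSearchCV` with accuracy scoring are illustrative choices only.

```python
from sklearn.base import clone
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the encoded features and binary income label (assumption).
X, y = make_classification(n_samples=400, n_features=10, n_informative=5,
                           random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

# Each model is paired with a small, purely illustrative parameter grid.
models = {
    "decision_tree": (DecisionTreeClassifier(random_state=0),
                      {"max_depth": [3, 5, None]}),
    "random_forest": (RandomForestClassifier(n_estimators=50, random_state=0),
                      {"max_depth": [3, 5, None]}),
}

results = {}
for name, (estimator, grid) in models.items():
    # Baseline: default hyperparameters, no tuning.
    baseline = clone(estimator).fit(X_train, y_train)

    # Tuned: 5-fold cross-validated grid search on the training split only.
    search = GridSearchCV(clone(estimator), grid, cv=5, scoring="accuracy")
    search.fit(X_train, y_train)

    results[name] = {
        "default": {"train": baseline.score(X_train, y_train),
                    "test": baseline.score(X_test, y_test)},
        "tuned": {"train": search.score(X_train, y_train),
                  "test": search.score(X_test, y_test)},
        "best_params": search.best_params_,
    }

for name, scores in results.items():
    print(name, scores)
```

Reporting the training score next to the test score makes overfitting visible: a large gap between the two suggests the model has memorized the training split rather than generalized.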

Data Exploration

The dataset was explored using descriptive statistics and visualizations, revealing:

Data Modeling

Decision Tree and Random Forest models were employed to predict income levels. Key findings include:

Visit the GitHub Repository

For the full code and documentation, visit the GitHub repository.