Wage Distribution Analysis
Completion Date: In Progress
Project Overview
This project explores the income distribution of households across different income groups using census data. This project was aimed at being a demonstration of an exploritory data analysis. Starting with data and a question, ending with deeper understanding into the data and what it shows. It's written like a story from the first person perspective, in order to show my methodology and thought process into the analysis. While this writeup can give you the basics, I'd highly recommend reading the whole notebook available on github.
Data Preparation
The data was sourced from the U.S. Census Bureau and contains household income estimates categorized by income brackets for different years. Key steps in preparing the data include:
- Cleaning:
- Removed symbols such as %, ±, and commas to ensure numerical consistency.
- Filtered out rows with missing or invalid data in the households_estimate column.
- Data Filtering:
- Focused the analysis on the year 2010 for simplicity and clarity.
- Removed rows with non-positive values in households_estimate.
- Ensuring Validity:
- Verified that each income group contained valid numeric data to avoid empty groups in the visualization.
Methodology
This project is currently in progress, and just plots the distributions over time. Future enhancements will be added.
Tools and Libraries
- Python
- Pandas
- MatPlotLib
- Seaborn
Future Work
This analysis can be extended by:
- Comparing and measuring income distributions across multiple years to identify trends.
- Measuing spread of distributions within, and across years.
- Expanding the visualization to include additional household categories (e.g., families, nonfamily households).
Acknowledgments
Dataset is from US Census Bureau: https://data.census.gov/table/ACSST5Y2020.S1901?q=S1901
Author
This project was developed by John Curran.
Tools/Languages Used
- Python
- Pandas
- Seaborn
- MatPlotLib