Research
Survey on State Data Use
This project is sponsored by the U.S. Census Bureau. Its goal is to assist the U.S. Census Bureau in creating the Curated Data Enterprise, a tool that combines data from multiple Census sources into a cohesive data lake for use by state governments.
To accomplish this, I have focused on identifying how state governments use data and which data sources they rely on. During data collection, I used BeautifulSoup to scrape data from state data centers and Federal-State Cooperative for Population Estimates (FSCPE) contacts.
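The scraping step can be illustrated with a minimal sketch. It is not the project's actual scraper: it uses Python's built-in html.parser rather than BeautifulSoup so that it runs without extra installs, and the sample markup, names, and addresses are hypothetical stand-ins for a state data center contact page.

```python
from html.parser import HTMLParser

# Hypothetical sample markup; real state data center pages will differ.
SAMPLE_HTML = """
<ul class="contacts">
  <li><a href="mailto:jane.doe@example.gov">Jane Doe</a>, Example State Data Center</li>
  <li><a href="mailto:john.roe@example.gov">John Roe</a>, Example FSCPE contact</li>
  <li><a href="https://example.gov/data">Data portal</a></li>
</ul>
"""


class ContactParser(HTMLParser):
    """Collect (name, email) pairs from mailto: links."""

    def __init__(self):
        super().__init__()
        self.contacts = []
        self._email = None  # set only while inside a mailto: anchor
        self._text = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href") or ""
            if href.startswith("mailto:"):
                self._email = href[len("mailto:"):]
                self._text = []

    def handle_data(self, data):
        # Accumulate link text only for mailto: anchors.
        if self._email is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._email is not None:
            name = "".join(self._text).strip()
            self.contacts.append((name, self._email))
            self._email = None


def extract_contacts(html):
    """Return a list of (name, email) tuples found in the given HTML."""
    parser = ContactParser()
    parser.feed(html)
    return parser.contacts


contacts = extract_contacts(SAMPLE_HTML)
for name, email in contacts:
    print(name, email)
```

With BeautifulSoup, the same idea is usually expressed by selecting anchor tags and filtering on their href values; the parser class above just makes that filtering explicit.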
To understand and report how states leverage their data, I applied BERT-based topic modeling, word cloud analysis, and data visualization to surface patterns in the collected data.
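A word cloud is essentially a picture of term frequencies, so the counting step behind it can be sketched in a few lines. This is an illustrative sketch only: the sample responses, the small stopword list, and the function name term_frequencies are assumptions, and it does not reproduce the topic modeling used in the project.

```python
import re
from collections import Counter

# A deliberately small, hypothetical stopword list for illustration.
STOPWORDS = {"the", "of", "and", "to", "for", "a", "in", "is", "on", "we", "our"}


def term_frequencies(texts, top_n=5):
    """Count non-stopword terms across texts and return the top_n most common."""
    counts = Counter()
    for text in texts:
        tokens = re.findall(r"[a-z]+", text.lower())
        counts.update(t for t in tokens if t not in STOPWORDS)
    return counts.most_common(top_n)


# Made-up example responses, not project data.
responses = [
    "We use census data for population estimates.",
    "Population estimates guide our budget planning.",
    "Census data supports planning for schools.",
]

for term, count in term_frequencies(responses, top_n=5):
    print(term, count)
```

The resulting (term, count) pairs are the kind of input a word cloud renderer scales into font sizes.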
Learn more from our project website: Survey on State Data Use
Economy and Non-profit Healthcare Donation
Sponsored by Biokind Analytics, this project aims to address two questions:
1. Who are the potential donors to healthcare organizations?
2. How does the economy affect donors' altruistic behaviors?
Our data sources include the Panel Study of Income Dynamics (PSID), the National Center for Charitable Statistics (NCCS) core data, the Bureau of Labor Statistics (BLS) expenditure survey and employment data, and historical data for several publicly traded financial indices. I contributed to the PSID analysis through data cleaning and visualization. I am also responsible for writing a MethodSpace article and building a website to report our findings.
Learn more from our project website: Economy and Non-profit Healthcare Donation
Movie Genres and Economy Cycles
The primary goal of this project is to investigate potential correlations between popular movie genres and socio-economic factors across years. To do so, I use web scraping to gather annual data on the top 10 most popular movies from 2000 onwards. I then run text analysis on the movie scripts, using topic modeling to extract the prevailing themes for each year. Finally, I apply linear regression to examine the relationship between the identified movie themes and indicators of economic recession.
This structured approach is designed to uncover meaningful connections between popular movie genres and prevailing economic conditions over time, and to shed light on how the entertainment industry interacts with broader socio-economic trends.
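The regression step described above can be sketched as a simple one-variable least-squares fit. Everything in this sketch is an assumption made for illustration: the numbers are invented, unemployment rate stands in for whichever recession indicator is actually used, and theme_share is a hypothetical yearly share of scripts dominated by one theme.

```python
def fit_line(x, y):
    """Ordinary least squares for y = intercept + slope * x (single predictor)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sum of squared deviations of x, and cross-deviations of x and y.
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Invented values, one per year; not results from the project.
unemployment_rate = [4.0, 5.0, 6.0, 7.0, 8.0]   # hypothetical recession indicator (%)
theme_share = [0.10, 0.14, 0.18, 0.22, 0.26]    # hypothetical share of a theme

slope, intercept = fit_line(unemployment_rate, theme_share)
print(f"slope={slope:.3f}, intercept={intercept:.3f}")
```

A positive slope in such a fit would suggest the theme becomes more common as the indicator rises, although a real analysis would also need to check significance and control for other factors.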
Learn more from our presentation slides: slides