Projects

Predicting Airbnb Prices in European Cities with Machine Learning

April 2023

In this project, I worked in a team of four to predict Airbnb prices in London, Paris, and Athens. We used several machine learning methods, including subset selection, decision trees, and ensemble methods such as bagging and random forests. The work was done mostly in R using the tree, randomForest, and glmnet libraries. My responsibilities were cleaning the data, applying the tree-based methods (random forest and bagging), and implementing the validation-set and cross-validation approaches for all of the methods.
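
To illustrate the ensemble idea, here is a minimal Python sketch of bagging (bootstrap aggregation) built from one-split regression stumps. It is not the project's code, which used R's tree and randomForest packages; the function names and the toy data are hypothetical and exist only for this example.

```python
import random
from statistics import mean


def fit_stump(xs, ys):
    """Fit a one-split regression stump by minimizing squared error."""
    best = None
    # Candidate thresholds: every distinct x except the largest,
    # so that the right-hand side of the split is never empty.
    for t in sorted(set(xs))[:-1]:
        left = [y for x, y in zip(xs, ys) if x <= t]
        right = [y for x, y in zip(xs, ys) if x > t]
        lm, rm = mean(left), mean(right)
        sse = sum((y - lm) ** 2 for y in left) + sum((y - rm) ** 2 for y in right)
        if best is None or sse < best[0]:
            best = (sse, t, lm, rm)
    if best is None:  # all x values identical: predict the overall mean
        m = mean(ys)
        return lambda x: m
    _, t, lm, rm = best
    return lambda x: lm if x <= t else rm


def fit_bagged(xs, ys, n_models=25, seed=0):
    """Average the predictions of stumps fit on bootstrap resamples."""
    rng = random.Random(seed)
    n = len(xs)
    models = []
    for _ in range(n_models):
        idx = [rng.randrange(n) for _ in range(n)]  # sample rows with replacement
        models.append(fit_stump([xs[i] for i in idx], [ys[i] for i in idx]))
    return lambda x: mean(m(x) for m in models)


# Hypothetical toy data: the response steps up once x passes 4.
xs = list(range(10))
ys = [100.0] * 5 + [200.0] * 5
predict = fit_bagged(xs, ys)
print(predict(0), predict(9))
```

A random forest follows the same recipe but also restricts each split to a random subset of the predictors, which decorrelates the trees; with a single predictor, as in this toy example, the two coincide.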

Our initial linear model was designed as follows:



Our goal was to find the method with the smallest test error. To do this, we applied cross-validation within each method and used a held-out validation set to estimate the test error. We found random forest to be the best method, with the lowest mean squared error.
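
The comparison procedure described above can be sketched as follows. This is a minimal Python illustration under stated assumptions, not the project's R code: the two candidate models (a mean-only baseline and a one-variable least-squares line), the helper names, and the toy data are all made up for the example.

```python
import random
from statistics import mean


def fit_mean(xs, ys):
    """Baseline model: always predict the training mean."""
    m = mean(ys)
    return lambda x: m


def fit_line(xs, ys):
    """One-variable least-squares line."""
    xb, yb = mean(xs), mean(ys)
    sxx = sum((x - xb) ** 2 for x in xs)
    slope = sum((x - xb) * (y - yb) for x, y in zip(xs, ys)) / sxx
    return lambda x: yb + slope * (x - xb)


def mse(model, xs, ys):
    return mean((y - model(x)) ** 2 for x, y in zip(xs, ys))


def validation_mse(fit, xs, ys, frac=0.7, seed=0):
    """Validation-set approach: fit on one random split, score on the rest."""
    idx = list(range(len(xs)))
    random.Random(seed).shuffle(idx)
    cut = int(frac * len(idx))
    train, valid = idx[:cut], idx[cut:]
    model = fit([xs[i] for i in train], [ys[i] for i in train])
    return mse(model, [xs[i] for i in valid], [ys[i] for i in valid])


def cv_mse(fit, xs, ys, k=5, seed=0):
    """k-fold cross-validation: average the held-out MSE over k folds."""
    idx = list(range(len(xs)))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    scores = []
    for fold in folds:
        held = set(fold)
        train = [i for i in idx if i not in held]
        model = fit([xs[i] for i in train], [ys[i] for i in train])
        scores.append(mse(model, [xs[i] for i in fold], [ys[i] for i in fold]))
    return mean(scores)


# Hypothetical toy data: a noisy linear relationship.
rng = random.Random(1)
xs = [float(i) for i in range(40)]
ys = [50.0 + 3.0 * x + rng.gauss(0, 2) for x in xs]

candidates = [("mean", fit_mean), ("line", fit_line)]
results = {name: cv_mse(fit, xs, ys) for name, fit in candidates}
best = min(results, key=results.get)  # method with the lowest CV error
print(results, best)
```

The same loop works for any fitting function with this signature, so each method is scored on identical folds before the one with the lowest error is chosen.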

Effect of Minimum Wage on Employment: A Statewide Case Analysis

December 2022

This paper was written in a group of four and analyzed the effect of a minimum wage increase on state unemployment. We observed two states, New York as the control and Massachusetts as the treatment, over the period 1997 to 2008. We chose these states because they are geographically similar and because their unemployment rates changed at similar rates before the observed period, that is, they followed parallel trends. Both conditions help reduce bias in the analysis. During the observed period, Massachusetts raised its effective minimum wage by a total of $1.50 between 2000 and 2002, while New York's minimum wage did not change.

We ran a linear regression on the differences in minimum wage and unemployment, using 1999 as the initial year and each year from 2001 to 2004 as the post year, to see whether the unemployment rate changed. We used counties as individual data points because they provide the most information with the least bias. We also included five covariates: average income, population, total labor force, labor force participation (labor force / population), and population density (population / land area). We included these because we were concerned they may have affected the unemployment rate over time. Our final regression equation looked like the following:


The data came from multiple sources, including the US Bureau of Labor Statistics, the National Bureau of Economic Research, Federal Reserve Economic Data, and USA Location Information. The raw data required extensive cleaning, which was my responsibility and was done mostly in R and SQL.
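
As a rough illustration of the county-level regression on differences described above, the Python sketch below regresses the change in unemployment on the change in minimum wage. It is a simplification under stated assumptions rather than the paper's specification: the five covariates are omitted, the county values are invented, and only the $1.50 increase for Massachusetts comes from the text.

```python
from statistics import mean

# Hypothetical county records: (state, unemployment rate in 1999,
# unemployment rate in the post year). All rates are made up.
counties = [
    ("MA", 3.2, 4.0),
    ("MA", 2.9, 3.5),
    ("MA", 4.1, 4.9),
    ("NY", 5.0, 5.7),
    ("NY", 4.4, 5.1),
    ("NY", 6.0, 6.6),
]

# Change in minimum wage between 1999 and the post year, by state.
wage_change = {"MA": 1.50, "NY": 0.0}

# First differences: one observation per county.
d_unemp = [post - base for _, base, post in counties]
d_wage = [wage_change[state] for state, _, _ in counties]

# Ordinary least squares with an intercept and one regressor.
xb, yb = mean(d_wage), mean(d_unemp)
sxx = sum((x - xb) ** 2 for x in d_wage)
slope = sum((x - xb) * (y - yb) for x, y in zip(d_wage, d_unemp)) / sxx
intercept = yb - slope * xb

print(f"intercept={intercept:.4f}, slope={slope:.4f}")
```

Because the wage change takes only two values here, the slope equals the gap between the average Massachusetts change and the average New York change, divided by 1.50. This is the difference-in-differences comparison expressed per dollar of minimum wage.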

Before drawing a conclusion, we needed to check the model for robustness. We did this by running the regression on a dependent variable other than the unemployment rate, one that should not be affected by the time period and state changes. We ran this robustness check with population as the dependent variable. The regression equation looked like the following:


The robustness check was not statistically significant, so we concluded that our original model was robust. We then ran the regressions on our initial model with the five covariates. This was mostly my responsibility, and it was written entirely in R using the sandwich and lmtest packages. Based on the t-statistics, we concluded that the increase in minimum wage had no statistically significant effect on unemployment. This was in line with economic theory and previous literature on the subject.
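
The significance test and the robustness check can be sketched together in Python. The sketch assumes heteroskedasticity-robust (HC0) standard errors, which is one of the estimators the sandwich package offers; the text does not say which estimator the paper used. The function name and all data values are hypothetical, and the covariates are again omitted.

```python
from math import sqrt
from statistics import mean


def ols_robust_t(xs, ys):
    """Slope of y on x (with intercept), its HC0 standard error, and t-statistic."""
    xb, yb = mean(xs), mean(ys)
    sxx = sum((x - xb) ** 2 for x in xs)
    slope = sum((x - xb) * (y - yb) for x, y in zip(xs, ys)) / sxx
    intercept = yb - slope * xb
    resid = [y - intercept - slope * x for x, y in zip(xs, ys)]
    # HC0 sandwich variance of the slope: sum((x - xbar)^2 * e^2) / Sxx^2
    var_hc0 = sum((x - xb) ** 2 * e ** 2 for x, e in zip(xs, resid)) / sxx ** 2
    se = sqrt(var_hc0)
    return slope, se, slope / se


# Hypothetical county-level changes between 1999 and a post year.
d_wage = [1.5, 1.5, 1.5, 0.0, 0.0, 0.0]   # change in minimum wage (dollars)
d_unemp = [0.8, 0.6, 0.8, 0.7, 0.7, 0.6]  # change in unemployment rate (points)
d_pop = [1.0, 1.2, 0.9, 1.1, 1.0, 1.0]    # change in population (percent), placebo outcome

main = ols_robust_t(d_wage, d_unemp)
placebo = ols_robust_t(d_wage, d_pop)

# Two-sided 5% test using the normal approximation (crude for so few counties).
for label, (slope, se, t_stat) in [("unemployment", main), ("population", placebo)]:
    verdict = "significant" if abs(t_stat) > 1.96 else "not significant"
    print(f"{label}: slope={slope:.4f}, se={se:.4f}, t={t_stat:.2f} ({verdict})")
```

A placebo outcome that showed a significant coefficient would suggest the design is picking up something other than the wage change, which is why the check is run before interpreting the main regression.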