- **SortingHat Downstream Benchmark Replication (Data Science Senior Capstone)**
- Wrote code to replicate the downstream benchmark experiments for the feature type inference models produced by Project SortingHat
- Deployed the experiment code and environment to UCSD computing clusters using Docker containers
- **Notez.ai (Honorable Mention at SDHacks 2021)**
- Built a website that lets students generate structured study guides from transcripts of Zoom meetings and lectures
- Used topic modeling and text summarization models, with Streamlit for the frontend website
- Group members: George Pu and Rudy Thurston
- Time Series Modeling of Airbnb Prices in San Diego:
- Forecasted the average prices of Airbnb listings in San Diego using SARIMAX and FB Prophet time series models
- Created an interface for users to input search criteria for an Airbnb listing and receive price predictions from the time series model
- Textual Analysis of Chronic Illness Surveys (1st Place Entry for UCSD DataHacks 2020):
- Applied t-distributed Stochastic Neighbor Embedding (t-SNE), GloVe, and TF-IDF to visualize survey questions in a 2D space using Tableau dashboards
- Analyzed the effectiveness of US tobacco laws at restricting underage tobacco sales by performing permutation testing
- Group member: Ayush More
- Microsoft Pet Breed Classification Challenge (2019 UCSD Datathon Submission):
- Trained Convolutional Neural Network models to classify 40 distinct breeds of pets from images
- Achieved 90% test accuracy and 4th place on the Kaggle leaderboard
- Group members: Wayde Gilliam and George Pu
- UCSD Residential College Clustering:
- Implemented K-means clustering, an unsupervised algorithm, to identify UCSD residential colleges
- Wrangled a dataset of landmarks and locations on UCSD’s campus using the Google Maps API