  • SortingHat Downstream Benchmark Replication (Data Science Senior Capstone):
    • Wrote code to replicate downstream benchmark experiments for feature type inference models produced by Project SortingHat
    • Deployed experiment code and environment to UCSD computing clusters using Docker containers
  • Notez.ai (Honorable Mention at SDHacks 2021):
    • Built a website that lets students generate structured study guides from transcripts of Zoom meetings and lectures
    • Used topic modeling and text summarization models, with Streamlit for the frontend
    • Group members: George Pu and Rudy Thurston
  • Time Series Modeling of Airbnb Prices in San Diego:
    • Forecasted average prices of Airbnb listings in San Diego using SARIMAX and Facebook Prophet time series models
    • Created an interface where users enter Airbnb search criteria and receive price predictions from the time series model
  • Textual Analysis of Chronic Illness Surveys (1st Place Entry for UCSD DataHacks 2020):
    • Applied t-distributed Stochastic Neighbor Embedding (t-SNE), GloVe, and TF-IDF to visualize survey questions in 2D space using Tableau dashboards
    • Analyzed the effectiveness of U.S. tobacco laws at restricting underage tobacco sales using permutation tests
    • Group member: Ayush More
  • Microsoft Pet Breed Classification Challenge (2019 UCSD Datathon Submission):
    • Trained convolutional neural network models to classify 40 distinct breeds of pets from images
    • Achieved 90% test accuracy and 4th place on the Kaggle leaderboard
    • Group members: Wayde Gilliam and George Pu
  • UCSD Residential College Clustering:
    • Implemented K-means clustering, an unsupervised algorithm, to identify UCSD residential colleges from campus locations
    • Wrangled a dataset of landmarks and locations on UCSD’s campus using the Google Maps API