Portfolio

You can find these projects on my GitHub: Portfolio


Natural Language Processing for Hotel Reviews

In this project, I developed a machine learning solution that analyzes hotel reviews and predicts customer satisfaction scores from a combination of text and numeric data. For preprocessing, I cleaned and vectorized the review text, capped the text features at 500 tokens, and scaled the numeric data. I then combined the text and numeric features to train and test Logistic Regression and Decision Tree models, optimizing the latter with PCA and cross-validated hyperparameter tuning. I evaluated the models using precision, recall, and confusion matrices, and identified the top 20 predictive words in the reviews, which helped uncover key factors influencing customer satisfaction. These insights can help hotel management focus on the improvements most likely to lead to higher ratings and better guest experiences.
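As a rough illustration of the preprocessing described above (not the project's actual code), the sketch below builds a bag-of-words vocabulary capped at a maximum number of tokens, min-max scales one numeric column, and joins the two into a single feature row per review. The function names, the toy reviews, the `nights_stayed` column, and the choice of min-max scaling are all assumptions made for the example. The cap is set to 4 here only so the toy output stays readable; the project used 500.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase a review and split it into alphabetic tokens."""
    return re.findall(r"[a-z']+", text.lower())


def build_vocabulary(reviews, max_features=500):
    """Keep only the max_features most frequent tokens across all reviews."""
    counts = Counter(tok for review in reviews for tok in tokenize(review))
    # Sort by descending count, then alphabetically, so ties are deterministic.
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return {tok: idx for idx, (tok, _) in enumerate(ranked[:max_features])}


def vectorize(review, vocabulary):
    """Turn one review into a bag-of-words count vector."""
    vector = [0] * len(vocabulary)
    for tok in tokenize(review):
        if tok in vocabulary:
            vector[vocabulary[tok]] += 1
    return vector


def min_max_scale(column):
    """Scale a numeric column to the [0, 1] range."""
    lo, hi = min(column), max(column)
    if hi == lo:
        return [0.0 for _ in column]
    return [(x - lo) / (hi - lo) for x in column]


# Hypothetical toy data, not taken from the project.
reviews = [
    "Great location and friendly staff",
    "Dirty room and rude staff",
    "Great room",
]
nights_stayed = [2, 5, 3]

vocabulary = build_vocabulary(reviews, max_features=4)
scaled_nights = min_max_scale(nights_stayed)

# One row per review: token counts followed by the scaled numeric feature.
features = [vectorize(r, vocabulary) + [n] for r, n in zip(reviews, scaled_nights)]
```

Rows built this way can be passed to any classifier that accepts numeric feature vectors. In practice a library vectorizer and scaler would replace these hand-written helpers.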


Modeling Mosquito Dynamics and West Nile Virus Presence: A Regression Analysis

I built linear and logistic regression models to analyze the factors influencing mosquito counts and West Nile Virus (WNV) presence, achieving an accuracy of 82.86% despite limitations in the data. The dataset showed some variability and heteroscedasticity, but the models still captured key trends, including species-specific impacts, geographic patterns, and a positive correlation between mosquito numbers and the likelihood of WNV.

Analyzing U.S. Air Traffic Trends and Delays Using SQL

This project was designed to showcase my SQL skills by analyzing U.S. air traffic data from 2018 and 2019. Using SQL queries, I explored trends in flight patterns, cancellations, and delays, uncovering insights such as the impact of weather on cancellations, airline performance over time, and time-of-day effects on delays. Highlights include calculating year-over-year changes in flights and miles traveled by airline, identifying the most popular airports, and ranking airports with the highest morning delays. Through efficient query optimization, I navigated large datasets and extracted actionable insights, demonstrating my ability to handle complex data and perform in-depth analysis using SQL.
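The year-over-year comparison mentioned above could be sketched roughly as follows. This is an illustrative query run against an in-memory SQLite database, not the project's actual code. The `flights` table, its column names, and the sample rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE flights (airline TEXT, flight_year INTEGER, distance_miles REAL)"
)
# Hypothetical sample rows: one row per flight.
conn.executemany(
    "INSERT INTO flights VALUES (?, ?, ?)",
    [
        ("AA", 2018, 500.0), ("AA", 2018, 700.0),
        ("AA", 2019, 500.0), ("AA", 2019, 700.0), ("AA", 2019, 300.0),
        ("DL", 2018, 400.0), ("DL", 2018, 600.0),
        ("DL", 2019, 1000.0),
    ],
)

# Conditional aggregation puts both years on one row per airline,
# so the change is a simple subtraction.
query = """
SELECT airline,
       SUM(CASE WHEN flight_year = 2018 THEN 1 ELSE 0 END) AS flights_2018,
       SUM(CASE WHEN flight_year = 2019 THEN 1 ELSE 0 END) AS flights_2019,
       SUM(CASE WHEN flight_year = 2019 THEN 1 ELSE 0 END)
         - SUM(CASE WHEN flight_year = 2018 THEN 1 ELSE 0 END) AS flights_change,
       SUM(CASE WHEN flight_year = 2019 THEN distance_miles ELSE 0 END)
         - SUM(CASE WHEN flight_year = 2018 THEN distance_miles ELSE 0 END) AS miles_change
FROM flights
GROUP BY airline
ORDER BY airline
"""
yoy = conn.execute(query).fetchall()

for airline, f2018, f2019, flights_change, miles_change in yoy:
    print(airline, f2018, f2019, flights_change, miles_change)
```

Conditional aggregation is one of several ways to express this. A self-join on airline or a window function such as `LAG` would work as well.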


Data Analysis on Crowdfunding Insights

As part of my application to the BrainStation Data Science program, I completed a comprehensive project focused on launching a successful Kickstarter campaign based on historical campaign data. The project involved strategic planning, market analysis, and leveraging data to maximize campaign funding. Key activities included:

Data Cleaning: Ensured data accuracy and consistency using SQL queries to remove records with missing values and check for data correctness.
Funding Goal Analysis: Analyzed Kickstarter data to determine optimal funding goals. Found that realistic targets significantly increase campaign success, with successful campaigns averaging goals around $9,743.
Backer Analysis: Examined the number of backers and average pledges. Successful campaigns attracted more backers (average 785) and higher pledges per backer ($90).

Using SQL, data analysis, and data visualization, I provided actionable insights for positioning the Kickstarter campaign to maximize its funding and chances of success.
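The cleaning and comparison steps listed above might look roughly like this in SQL, run here through Python's built-in `sqlite3` module. This is a sketch under assumptions, not the project's actual code. The `campaigns` table, its columns, and the sample rows are hypothetical, and pledge per backer is computed as total pledged divided by total backers, which may differ from the definition used in the project.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE campaigns "
    "(name TEXT, goal REAL, pledged REAL, backers INTEGER, outcome TEXT)"
)
# Hypothetical sample rows, not real Kickstarter data.
conn.executemany(
    "INSERT INTO campaigns VALUES (?, ?, ?, ?, ?)",
    [
        ("a", 5000.0, 8000.0, 100, "successful"),
        ("b", 15000.0, 24000.0, 300, "successful"),
        ("c", 50000.0, 1000.0, 20, "failed"),
        ("d", None, 500.0, 10, "failed"),  # missing goal: removed by cleaning
    ],
)

# Data cleaning: drop records with missing values in the columns used below.
conn.execute(
    "DELETE FROM campaigns "
    "WHERE goal IS NULL OR pledged IS NULL OR backers IS NULL OR outcome IS NULL"
)

# Compare average goal, average backers, and pledge per backer by outcome.
summary = conn.execute(
    """
    SELECT outcome,
           AVG(goal) AS avg_goal,
           AVG(backers) AS avg_backers,
           SUM(pledged) / SUM(backers) AS pledge_per_backer
    FROM campaigns
    WHERE backers > 0
    GROUP BY outcome
    ORDER BY outcome
    """
).fetchall()

for outcome, avg_goal, avg_backers, pledge_per_backer in summary:
    print(outcome, avg_goal, avg_backers, pledge_per_backer)
```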