Skills and Expertise

Data Analysis & Visualization

Proficient in analyzing large datasets using Python, R, and SQL. Expertise in data wrangling, statistical analysis, and creating visualizations with Tableau and Power BI to inform business decisions.

Predictive Modeling & Machine Learning

Skilled in building predictive models using Random Forest, XGBoost, and Deep Learning to predict outcomes, reduce risk, and optimize business strategies.

Big Data & Cloud Computing

Experienced in managing and processing big data using Spark, Hive, and cloud platforms such as AWS and Azure, ensuring scalability and efficiency for large-scale data operations.

Database Management & NoSQL

Expert in relational (SQL) and NoSQL databases, ensuring optimized data storage and retrieval, and proficient in cloud database solutions like AWS RDS and Azure SQL.

Leadership & Collaboration

Proven ability to lead teams and collaborate with cross-functional teams in fast-paced environments, mentoring team members and driving projects to successful completion.

Problem-Solving

Strong analytical skills with a knack for identifying patterns, solving complex problems, and developing innovative solutions that align with business goals.

Current Projects

Predicting Bus Passenger and Traffic Patterns for a Public Transportation Company

Technologies and Techniques: Data Management (SQL), Model Building (Python), Visualization (Tableau), Forecasting Analysis, Regression Analysis

Since September, I have been working on a project with Prof. Igor and 8 Columbia students for a confidential public transportation company. I am developing a predictive model to estimate bus passenger counts and analyze how vehicular traffic is affected by factors such as seasonality, weather patterns, holidays, and major events. The project aims to forecast bus needs and traffic patterns for the next decade by identifying the key factors affecting transportation facilities and by creating strategies to mitigate potential negative impacts on infrastructure and resource planning.

Attendance Tracking System for Columbia University's APAN Club

Technologies and Techniques: Data Management (PostgreSQL), Data Extraction and Processing (R), Text Mining, ETL

As the president of the APAN Club, I developed an attendance tracking system using PostgreSQL. The system records student no-shows at events and stores survey results, which significantly helped reduce no-shows. The decrease in no-shows allowed us to budget more efficiently by allocating resources according to actual event attendance and participant feedback, leading to better-planned and more engaging events.
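As a rough illustration of the kind of query such a system supports, here is a minimal sketch of a per-event no-show rate. It is not the actual schema: the table names (`events`, `registrations`), the columns, and the sample rows are all hypothetical, and Python's built-in `sqlite3` stands in for PostgreSQL so the example runs without a server.

```python
import sqlite3

# In-memory SQLite database used as a stand-in for PostgreSQL (assumption).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE events (
    event_id INTEGER PRIMARY KEY,
    name     TEXT NOT NULL
);
CREATE TABLE registrations (
    event_id   INTEGER NOT NULL REFERENCES events(event_id),
    student_id TEXT    NOT NULL,
    attended   INTEGER NOT NULL CHECK (attended IN (0, 1)),
    PRIMARY KEY (event_id, student_id)
);
""")

# Made-up rows for illustration only; they are not real club data.
conn.executemany("INSERT INTO events VALUES (?, ?)",
                 [(1, "Workshop"), (2, "Panel")])
conn.executemany("INSERT INTO registrations VALUES (?, ?, ?)", [
    (1, "s1", 1), (1, "s2", 0), (1, "s3", 1), (1, "s4", 0),
    (2, "s1", 1), (2, "s2", 1), (2, "s3", 1), (2, "s4", 0),
])


def no_show_rates(connection):
    """Return {event name: share of registrants who did not attend}."""
    rows = connection.execute("""
        SELECT e.name, AVG(1 - r.attended) AS no_show_rate
        FROM events AS e
        JOIN registrations AS r ON r.event_id = e.event_id
        GROUP BY e.event_id, e.name
        ORDER BY e.event_id
    """).fetchall()
    return {name: rate for name, rate in rows}


rates = no_show_rates(conn)
```

Tracking this rate per event is what makes it possible to size budgets to expected turnout rather than to registration counts.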

Work Experience

Forecast Business Analyst Intern

Scarsin - Toronto, Canada

May 2024 - August 2024

  • Increased data processing efficiency by 15% by optimizing backend data querying configurations using Excel Power Query.
  • Enhanced time transformation functions in forecasting applications, improving data usability for drug sales forecasting.
  • Supported UAT planning and execution; collaborated with development teams to resolve discrepancies for system implementation.

Data Analyst Intern

eBay - Shenzhen, China

July 2023 - December 2023

    At eBay, I worked as a data analyst on the Top Seller Account Management team, where I analyzed transaction and user data extensively using SQL, Excel, and Python. Deep dives into this data uncovered valuable insights for my manager in areas such as consumer behavior, leaf category performance, and fluctuations in market KPIs.

    To enable internal stakeholders to visually track trends and KPIs in real time, I developed 12 interactive Tableau dashboards covering buyer experience, site traffic, logistics and warehouse operations, and campaign trends, supporting more informed decision-making. These dashboards were actively adopted by 14 internal stakeholders.

    Collaborating with cross-functional teams such as the advertising team, I collected and cleaned campaign data and used Excel PivotTables to show senior leadership potential areas for improvement.

Student Assistant, Alumni Relations and Development Office

Columbia University - New York, NY

April 2023 - July 2023

  • Managed Columbia University's CRM database, maintaining 3k+ alumni records and tracking event attendance.
  • Curated an engaged list of alumni donors by analyzing event participation and past gift-giving data.
  • Assisted in organizing records for donor contributions and event attendance.

Recent Projects

Loan Default Prediction

Used supervised machine learning models, including Logistic Regression, Random Forest, and GBM, to predict mortgage loan defaults. Conducted thorough exploratory data analysis (EDA) and feature engineering, and improved key metrics such as AUC by 20% through hyperparameter tuning and sampling techniques.
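For readers unfamiliar with the AUC metric mentioned above, here is a minimal sketch of how it can be computed from predicted default scores, using the rank-sum (Mann-Whitney U) formulation. It is not the project's code: the function name `roc_auc` and the toy inputs are illustrative assumptions, and in practice a library routine such as scikit-learn's `roc_auc_score` would normally be used.

```python
from typing import Sequence


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Area under the ROC curve via the rank-sum formulation.

    labels: 1 for default, 0 for non-default.
    scores: predicted scores, higher meaning more likely to default.
    Tied scores receive the average rank of their tie group.
    """
    pairs = sorted(zip(scores, labels))
    n = len(pairs)
    ranks = [0.0] * n

    # Assign 1-based ranks, averaging over groups of tied scores.
    i = 0
    while i < n:
        j = i
        while j + 1 < n and pairs[j + 1][0] == pairs[i][0]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[k] = average_rank
        i = j + 1

    n_pos = sum(1 for _, label in pairs if label == 1)
    n_neg = n - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("AUC needs at least one positive and one negative label")

    # Mann-Whitney U statistic for the positive class, normalized to [0, 1].
    positive_rank_sum = sum(r for r, (_, label) in zip(ranks, pairs) if label == 1)
    u_statistic = positive_rank_sum - n_pos * (n_pos + 1) / 2
    return u_statistic / (n_pos * n_neg)


# Toy example (made-up scores): one defaulter is ranked below one non-defaulter.
example_auc = roc_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])
```

An AUC of 0.5 corresponds to random ranking and 1.0 to perfect separation of defaults from non-defaults, which is why it is a common yardstick for comparing models on imbalanced loan data.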

Fan App for Ed Sheeran's World Tour

Developed an interactive web application using R Shiny to simulate Ed Sheeran’s World Tour data. Designed and implemented a user-friendly interface with real-time visualizations and insights on tour statistics, venue information, and fan engagement metrics, leveraging ETL processes and data normalization for efficient data management.

Healthcare Provider Fraud Detection


NYC Public Budget Expense Analysis

Developed a Flask web app integrated with both MongoDB and PostgreSQL to analyze NYC government agencies' budget expenses. The app lets users explore more than 2 million records, visualizing trends and insights in public budget allocations. This project supports rational cost-saving strategies and helps optimize resource allocation for city planning and development.

NYC Motor Vehicle Accident Rate Analysis

Analyzed NYC motor vehicle accident cases from 2022. We used Tableau to map collisions and cluster data based on accident trends. The project examined patterns by day, time, and accident type, providing actionable insights for resource deployment and strategic decision-making to reduce accidents.

Spotify Music Rating Prediction

Used R to predict user ratings of music tracks on Spotify. Applied machine learning models and statistical analysis to identify factors influencing user preferences and music ratings, improving the accuracy of prediction models for personalized music recommendations.

Education Background

Columbia University

M.S. in Applied Analytics

(2023-2024)

Cum. GPA: 4.0/4.0

Courses: • Anomaly Detection (Python, R) • SQL and Relational Databases • Cloud Computing (AWS, Azure) • Managing Data (MongoDB, Neo4j) • Time Series Analysis and Forecasting • Research Design • Strategy and Analytics

NYU

B.A. in Mathematics and Economics

(2018-2022)

Cum. GPA: 3.7/4.0

Courses: • Optimization • Linear Algebra • Econometrics • Probability Theory • Statistics • Microeconomics Analysis • Macroeconomics Analysis