Titanic Analysis with Machine Learning

🏆 Top 4.7% in Kaggle Competition

Project Summary

This project predicts passenger survival on the Titanic dataset using machine learning. It combines extensive feature engineering with a comparison of several classification models.

Process Overview

Data Exploration: Analyzed passenger demographics, class distribution, and survival patterns.
Feature Engineering: Created new features including family size, title extraction, and cabin information.
Data Preprocessing: Handled missing values using iterative imputation and encoded categorical variables.
Model Selection: Tested multiple algorithms including Random Forest, XGBoost, SVM, and Neural Networks.
Hyperparameter Tuning: Optimized model parameters using grid search and cross-validation.
Model Evaluation: Assessed performance using accuracy, confusion matrix, and feature importance analysis.
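
The feature engineering step above can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: it assumes the standard Kaggle column names (Name, SibSp, Parch, Cabin), and the helper names, the "U" placeholder for unknown cabins, and the sample rows are hypothetical.

```python
import re

# Sample rows in the style of the Kaggle Titanic data, for illustration only.
passengers = [
    {"Name": "Braund, Mr. Owen Harris", "SibSp": 1, "Parch": 0, "Cabin": None},
    {"Name": "Cumings, Mrs. John Bradley (Florence Briggs Thayer)",
     "SibSp": 1, "Parch": 0, "Cabin": "C85"},
    {"Name": "Heikkinen, Miss. Laina", "SibSp": 0, "Parch": 0, "Cabin": None},
]

# Names follow the pattern "Surname, Title. Given names".
TITLE_PATTERN = re.compile(r",\s*([^.]+)\.")


def extract_title(name):
    """Return the honorific between the comma and the first period."""
    match = TITLE_PATTERN.search(name)
    return match.group(1).strip() if match else "Unknown"


def engineer_features(row):
    """Return a copy of the row with Title, FamilySize, and Deck added."""
    features = dict(row)
    features["Title"] = extract_title(row["Name"])
    # Family size counts siblings/spouses, parents/children, and the passenger.
    features["FamilySize"] = row["SibSp"] + row["Parch"] + 1
    # Assumption: "cabin information" is reduced to the deck letter,
    # with "U" standing in for a missing cabin.
    features["Deck"] = row["Cabin"][0] if row["Cabin"] else "U"
    return features


engineered = [engineer_features(p) for p in passengers]
for row in engineered:
    print(row["Title"], row["FamilySize"], row["Deck"])
```

Rare titles are often grouped into a single category before encoding; whether this project did so is not stated here.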

Model Comparison

Random Forest performed best with an accuracy of 84.13%, followed by XGBoost and SVM, each at 82.73%.
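
The summary does not say how these accuracies were computed beyond the use of cross-validation, so the following is only a dependency-free sketch of how k-fold accuracy can be estimated. The baseline model (predict the majority outcome for each sex), the function names, and the toy data are all hypothetical; in practice a library routine such as scikit-learn's `cross_val_score` would likely be used with the real models.

```python
import random


def k_fold_indices(n_samples, k, seed=0):
    """Shuffle the indices once and split them into k nearly equal folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    return [indices[i::k] for i in range(k)]


def cross_val_accuracy(fit, predict, X, y, k=5, seed=0):
    """Return one accuracy score per fold for the given fit/predict pair."""
    scores = []
    for held_out in k_fold_indices(len(X), k, seed):
        held_out_set = set(held_out)
        train = [i for i in range(len(X)) if i not in held_out_set]
        model = fit([X[i] for i in train], [y[i] for i in train])
        correct = sum(predict(model, X[i]) == y[i] for i in held_out)
        scores.append(correct / len(held_out))
    return scores


def fit_majority_by_sex(X, y):
    """Toy baseline: learn the majority outcome (0 or 1) for each sex."""
    counts = {}
    for features, label in zip(X, y):
        tally = counts.setdefault(features["Sex"], [0, 0])
        tally[label] += 1
    return {sex: int(t[1] > t[0]) for sex, t in counts.items()}


def predict_majority_by_sex(model, features):
    """Predict the learned majority outcome; default to 0 for unseen values."""
    return model.get(features["Sex"], 0)


# Toy data for illustration only: 20 passengers, not the real dataset.
X = [{"Sex": "female"}] * 10 + [{"Sex": "male"}] * 10
y = [1] * 8 + [0] * 2 + [0] * 8 + [1] * 2

scores = cross_val_accuracy(fit_majority_by_sex, predict_majority_by_sex, X, y)
mean_accuracy = sum(scores) / len(scores)
print(f"Mean accuracy over {len(scores)} folds: {mean_accuracy:.2%}")
```

Comparing models then amounts to running the same folds for each candidate and comparing the mean scores.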

Feature Importance

The passenger's title and gender were the most influential factors in survival prediction.

Survival by Class

First-class passengers had a much higher survival rate (63%) than third-class passengers (24%).
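
A per-class survival rate like the one above is a grouped mean of the Survived column. The sketch below shows the computation on a handful of made-up (Pclass, Survived) pairs, so its output does not reproduce the 63% and 24% figures; the function name is hypothetical. With pandas, the equivalent would be `df.groupby("Pclass")["Survived"].mean()`.

```python
from collections import defaultdict

# Toy (Pclass, Survived) pairs for illustration only; not the real dataset.
records = [(1, 1), (1, 1), (1, 0), (2, 1), (2, 0), (3, 0), (3, 0), (3, 1), (3, 0)]


def survival_rate_by_class(rows):
    """Return {Pclass: fraction survived} from (Pclass, Survived) pairs."""
    totals = defaultdict(int)
    survivors = defaultdict(int)
    for pclass, survived in rows:
        totals[pclass] += 1
        survivors[pclass] += survived
    return {pclass: survivors[pclass] / totals[pclass] for pclass in sorted(totals)}


rates = survival_rate_by_class(records)
for pclass, rate in rates.items():
    print(f"Class {pclass}: {rate:.0%}")
```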

Age Distribution

The age distributions of survivors and non-survivors differ only slightly.

Confusion Matrix

The model shows a good balance between true positives and true negatives, with relatively few false positives and false negatives.
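
For reference, the four cells of a binary confusion matrix and the metrics derived from them can be computed as in this sketch. The labels are made up for illustration and the function names are hypothetical; the project's actual counts are not given in this summary.

```python
def confusion_counts(y_true, y_pred):
    """Count true/false positives and negatives, with 1 meaning survived."""
    counts = {"tp": 0, "fp": 0, "tn": 0, "fn": 0}
    for actual, predicted in zip(y_true, y_pred):
        if predicted == 1:
            counts["tp" if actual == 1 else "fp"] += 1
        else:
            counts["tn" if actual == 0 else "fn"] += 1
    return counts


def summarize(counts):
    """Derive accuracy, precision, and recall; return 0.0 where undefined."""
    tp, fp, tn, fn = counts["tp"], counts["fp"], counts["tn"], counts["fn"]
    total = tp + fp + tn + fn
    return {
        "accuracy": (tp + tn) / total if total else 0.0,
        "precision": tp / (tp + fp) if (tp + fp) else 0.0,
        "recall": tp / (tp + fn) if (tp + fn) else 0.0,
    }


# Toy labels for illustration only.
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_pred = [1, 1, 0, 0, 0, 1, 0, 1]

counts = confusion_counts(y_true, y_pred)
metrics = summarize(counts)
print(counts)
print(metrics)
```

Accuracy alone can hide an imbalance between the two error types, which is why precision and recall are reported alongside it here.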

Key Findings

✓ The passenger's title and gender were the most important predictors of survival.

✓ Passenger class had a significant impact on survival chances.

✓ Random Forest outperformed other models with an accuracy of 84.13%.

✓ Family size and fare were also important factors in prediction.

View complete project on GitHub