Predict Survival on the Titanic
Quick view
Goal: use passenger information (e.g. Name, Sex, Class, Age, etc.) to classify whether a passenger would survive.
Metrics: Accuracy
Results:
- Baseline:
  - Naive Bayes (val_acc: 0.78)
  - Logistic regression (val_acc: 0.79)
- Feature Engineering:
  - Convert age from a continuous number into categorical bins
  - Clean the text feature “Title”
  - Create a new feature “family size” by summing parents and children
  - One-hot encode categorical features and apply StandardScaler() to numerical features
- ML Models:
  - Logistic regression (val_acc: 0.829)
  - KNN (val_acc: 0.829)
  - Decision Tree (val_acc: 0.836)
  - Random Forest (val_acc: 0.831)
  - SVM (val_acc: 0.829)
  - XGBoost (val_acc: 0.841)
All models were tuned with RandomizedSearchCV.
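The feature-engineering steps and the randomized search listed above can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: the rows are made up so the block runs on its own, the column names follow the usual Kaggle Titanic layout (assumed here), and the age bin edges, the title mapping, the family-size formula (SibSp + Parch + 1), and the search grid are all assumptions.

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import RandomizedSearchCV
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Made-up rows so the sketch runs on its own; not the real passenger list.
df = pd.DataFrame({
    "Name": ["Smith, Mr. John", "Smith, Mrs. Mary", "Jones, Miss. Ann",
             "Brown, Mr. Tom", "Clark, Mr. Sam", "Doe, Mlle. Eva",
             "Roe, Mr. Max", "Poe, Ms. Jane"],
    "Sex": ["male", "female", "female", "male",
            "male", "female", "male", "female"],
    "Age": [22, 38, 26, 35, 54, 2, 40, 14],
    "SibSp": [1, 1, 0, 0, 0, 3, 0, 1],
    "Parch": [0, 0, 0, 0, 0, 1, 0, 0],
    "Fare": [7.25, 71.28, 7.93, 8.05, 51.86, 21.08, 13.0, 30.07],
    "Survived": [0, 1, 1, 0, 0, 1, 0, 1],
})

# Age: continuous number -> categorical bins (bin edges are an assumption).
df["AgeBin"] = pd.cut(df["Age"], bins=[0, 12, 18, 60, 120],
                      labels=["child", "teen", "adult", "senior"]).astype(str)

# Title: extract it from Name, then merge spelling variants (assumed mapping).
df["Title"] = (df["Name"].str.extract(r",\s*([^.]+)\.", expand=False)
               .str.strip()
               .replace({"Mlle": "Miss", "Ms": "Miss", "Mme": "Mrs"}))

# Family size: siblings/spouses + parents/children + the passenger (assumed).
df["FamilySize"] = df["SibSp"] + df["Parch"] + 1

categorical = ["Sex", "AgeBin", "Title"]
numerical = ["Fare", "FamilySize"]

# One-hot encode categorical columns, standardize numerical ones.
preprocess = ColumnTransformer([
    ("cat", OneHotEncoder(handle_unknown="ignore"), categorical),
    ("num", StandardScaler(), numerical),
])

pipeline = Pipeline([
    ("prep", preprocess),
    ("clf", LogisticRegression(max_iter=1000)),
])

# Randomized search over an illustrative grid for the regularization strength.
search = RandomizedSearchCV(
    pipeline,
    param_distributions={"clf__C": [0.01, 0.1, 1, 10, 100]},
    n_iter=3,
    cv=2,
    scoring="accuracy",
    random_state=0,
)
search.fit(df[categorical + numerical], df["Survived"])
print(search.best_params_, search.best_score_)
```

Putting the encoder and scaler inside the pipeline means they are refit on each training fold during the search, so the validation folds do not leak into the preprocessing statistics. Swapping the `clf` step and its parameter grid would give the same pattern for the other models in the list.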