Predict Survival on the Titanic

Quick view

Goal: Use passenger information (e.g. Name, Sex, Class, Age, etc.) to classify whether a passenger would survive.

Metrics: Accuracy

Results:

  • Baseline:
    • NaiveBayes (val_acc = 0.78)
    • Logistic (val_acc = 0.79)
  • Feature Engineering:
    • Convert age from a continuous number into categorical bins
    • Clean the text feature “Title”
    • Create a new feature “family size” by summing parents and children
    • One-hot encode categorical features and apply StandardScaler() to numerical ones
  • ML Models:
    • Logistic regression (val_acc: 0.829)
    • KNN (val_acc: 0.829)
    • Decision Tree (val_acc: 0.836)
    • Random Forest (val_acc: 0.831)
    • SVM (val_acc: 0.829)
    • XGBoost (val_acc: 0.841)

    All models were tuned with RandomizedSearchCV.
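
The feature-engineering and tuning steps listed above could look roughly like the sketch below. It is not the project's actual code: the column names follow the Kaggle Titanic layout, and the toy rows, age-bin edges, rare-title grouping, the use of SibSp in family size, and the search range for C are all assumptions made for illustration.

```python
import numpy as np
import pandas as pd
from scipy.stats import loguniform
from sklearn.compose import ColumnTransformer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import RandomizedSearchCV
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler


def engineer_features(df: pd.DataFrame) -> pd.DataFrame:
    """Add AgeBin, Title and FamilySize columns to a Kaggle-style frame."""
    out = df.copy()
    # Age bins: the edges and labels here are assumed, not taken from the project.
    binned = pd.cut(
        out["Age"],
        bins=[0, 12, 18, 40, 60, 120],
        labels=["child", "teen", "adult", "middle", "senior"],
        include_lowest=True,
    )
    # Missing ages get their own category instead of being dropped.
    out["AgeBin"] = binned.cat.add_categories(["unknown"]).fillna("unknown").astype(str)
    # Title: the text between the comma and the first period in Name.
    title = out["Name"].str.extract(r",\s*([^.]+)\.", expand=False).str.strip()
    title = title.replace({"Mlle": "Miss", "Ms": "Miss", "Mme": "Mrs"})
    # Grouping everything else as "Rare" is one common cleaning choice (assumed).
    out["Title"] = title.where(title.isin(["Mr", "Mrs", "Miss", "Master"]), "Rare")
    # Family size: Parch counts parents/children; adding SibSp and the
    # passenger is a common Kaggle convention and an assumption here.
    out["FamilySize"] = out["SibSp"] + out["Parch"] + 1
    return out


# Hypothetical toy rows, only so the sketch runs without the real data file.
toy = pd.DataFrame({
    "Name": ["Doe, Mr. John", "Doe, Mrs. Jane", "Roe, Miss. Ann",
             "Poe, Master. Tim", "Roe, the Countess. Eva", "Coe, Mr. Sam"],
    "Sex": ["male", "female", "female", "male", "female", "male"],
    "Pclass": [3, 1, 3, 3, 1, 3],
    "Age": [22.0, 38.0, 26.0, 2.0, 33.0, np.nan],
    "SibSp": [1, 1, 0, 3, 0, 0],
    "Parch": [0, 0, 0, 1, 0, 0],
    "Fare": [7.25, 71.28, 7.92, 21.07, 86.50, 8.05],
    "Survived": [0, 1, 1, 0, 1, 0],
})
data = engineer_features(pd.concat([toy] * 6, ignore_index=True))

# One-hot encode categorical columns, standardize numerical ones.
preprocess = ColumnTransformer([
    ("cat", OneHotEncoder(handle_unknown="ignore"),
     ["Sex", "Pclass", "AgeBin", "Title"]),
    ("num", StandardScaler(), ["FamilySize", "Fare"]),
])
model = Pipeline([
    ("prep", preprocess),
    ("clf", LogisticRegression(max_iter=1000)),
])

# Randomized search over the regularization strength, scored by accuracy.
search = RandomizedSearchCV(
    model,
    param_distributions={"clf__C": loguniform(1e-3, 1e2)},
    n_iter=5,
    cv=3,
    scoring="accuracy",
    random_state=0,
)
search.fit(data.drop(columns="Survived"), data["Survived"])
print(search.best_params_, search.best_score_)
```

Keeping the encoder and scaler inside the Pipeline means they are refit on each cross-validation fold, so the validation folds do not leak into preprocessing. Swapping LogisticRegression for another estimator only requires changing the "clf" step and its parameter distribution.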

XGBoost training and validation plot

GitHub