Mechanisms of Action (MoA) Prediction

Quick View

Goal:

  • Predict the binary (0/1) MoA target labels from gene expression and cell viability data by building a multi-label model, i.e. one binary classifier per target

Metrics:

  • Mean column-wise log loss (lower is better)
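
The metric averages binary log loss over every target column and every sample. Below is a minimal NumPy sketch. The function name is hypothetical, and the clipping bound `eps` is an assumption added to avoid log(0). It is not a value taken from this project.

```python
import numpy as np


def mean_columnwise_log_loss(y_true, y_pred, eps: float = 1e-15) -> float:
    """Binary log loss averaged per target column, then across columns.

    y_true: (n_samples, n_targets) array of 0/1 labels.
    y_pred: (n_samples, n_targets) array of predicted probabilities.
    """
    y_true = np.asarray(y_true, dtype=float)
    # Clip probabilities so log(0) never occurs (eps is an assumed value).
    p = np.clip(np.asarray(y_pred, dtype=float), eps, 1 - eps)
    per_entry = -(y_true * np.log(p) + (1 - y_true) * np.log(1 - p))
    # Mean over samples gives one loss per target column; then average the columns.
    return float(per_entry.mean(axis=0).mean())


labels = np.array([[1, 0], [0, 0]])
probs = np.array([[0.9, 0.1], [0.2, 0.05]])
print(mean_columnwise_log_loss(labels, probs))
```

Because a confident wrong prediction (for example p = 0.001 on a positive) costs far more than a slightly hedged correct one, this metric rewards well-calibrated probabilities rather than hard 0/1 outputs.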

Procedure:

  • MoA targets are extremely imbalanced (on average only about 89 positive labels per target column out of roughly 21K samples), so the folds are built with multilabel stratified k-fold from the iterative-stratification package:
    from iterstrat.ml_stratifiers import MultilabelStratifiedKFold
    
  • Label smoothing is applied to keep predictions from becoming overconfident, which helps log loss in this multi-output setting. A great explanation is here.

  • Use a Pipeline to apply the column transforms automatically:
    • Numerical data
      • Quantile transform
      • PCA
    • Categorical
      • One-hot encoding
  • Build a DNN with 3 dense layers and appropriate regularization
  • Use Keras Tuner for hyperparameter tuning; the code is in the Notebook
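
For binary targets, label smoothing moves each hard 0/1 label slightly toward 0.5. Log loss penalizes overconfident probabilities heavily, and smoothing discourages them. Below is a minimal NumPy sketch. The helper name is hypothetical and the smoothing factor `alpha` is an assumed value. Keras applies the same transform through the `label_smoothing` argument of `tf.keras.losses.BinaryCrossentropy`.

```python
import numpy as np


def smooth_binary_labels(y, alpha: float = 0.001) -> np.ndarray:
    """Map 1 -> 1 - alpha/2 and 0 -> alpha/2 (alpha is an assumed smoothing factor)."""
    if not 0.0 <= alpha < 1.0:
        raise ValueError("alpha must be in [0, 1)")
    return np.asarray(y, dtype=float) * (1.0 - alpha) + 0.5 * alpha


targets = np.array([[1, 0, 0], [0, 1, 0]])
print(smooth_binary_labels(targets, alpha=0.01))
```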
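
The preprocessing steps listed above (quantile transform then PCA for numerical columns, one-hot encoding for categorical columns) can be sketched with scikit-learn's `Pipeline` and `ColumnTransformer`. The column names follow the competition's `g-`/`c-`/`cp_` prefixes, but the column counts, `n_quantiles`, `n_components`, and the random data are illustrative assumptions. The sketch also assumes PCA replaces the numerical features rather than being appended to them.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.decomposition import PCA
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, QuantileTransformer

# Shrunk, hypothetical feature set: gene (g-*) and cell (c-*) columns plus two categoricals.
numeric_cols = [f"g-{i}" for i in range(10)] + [f"c-{i}" for i in range(5)]
categorical_cols = ["cp_time", "cp_dose"]

# Numerical branch: map each column to a normal distribution, then reduce with PCA.
numeric_pipe = Pipeline([
    ("quantile", QuantileTransformer(n_quantiles=100, output_distribution="normal", random_state=0)),
    ("pca", PCA(n_components=8, random_state=0)),
])

preprocess = ColumnTransformer(
    [
        ("num", numeric_pipe, numeric_cols),
        ("cat", OneHotEncoder(handle_unknown="ignore"), categorical_cols),
    ],
    sparse_threshold=0,  # always return a dense array for the neural network
)

# Random stand-in data (assumption) just to show the shapes.
rng = np.random.default_rng(0)
df = pd.DataFrame(rng.normal(size=(200, len(numeric_cols))), columns=numeric_cols)
df["cp_time"] = rng.choice([24, 48, 72], size=200)
df["cp_dose"] = rng.choice(["D1", "D2"], size=200)

X = preprocess.fit_transform(df)
print(X.shape)  # 8 PCA components + 3 cp_time levels + 2 cp_dose levels
```

Fitting `preprocess` only on each training fold and then calling `transform` on the matching validation fold keeps the quantiles and PCA components from leaking validation data.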
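
A multi-output DNN of the kind described above can be sketched in Keras. One sigmoid unit per target turns the network into many binary classifiers that share hidden layers. This sketch reads "3 NN layer" as three hidden Dense layers. The layer widths, dropout rate, L2 strength, batch normalization, and label-smoothing value are all assumptions standing in for "appropriate regularization". `build_model` is a hypothetical helper. The output width of 206 matches the number of scored targets in the Kaggle MoA data.

```python
import tensorflow as tf
from tensorflow.keras import layers, regularizers


def build_model(n_features: int, n_targets: int, units=(512, 256, 128),
                dropout: float = 0.3, l2: float = 1e-5,
                label_smoothing: float = 0.001) -> tf.keras.Model:
    """Three hidden Dense layers with batch norm, dropout and L2 (assumed choices)."""
    inputs = layers.Input(shape=(n_features,))
    x = inputs
    for n_units in units:
        x = layers.Dense(n_units, activation="relu",
                         kernel_regularizer=regularizers.l2(l2))(x)
        x = layers.BatchNormalization()(x)
        x = layers.Dropout(dropout)(x)
    # One sigmoid output per MoA target: independent binary probabilities.
    outputs = layers.Dense(n_targets, activation="sigmoid")(x)
    model = tf.keras.Model(inputs, outputs)
    model.compile(
        optimizer=tf.keras.optimizers.Adam(learning_rate=1e-3),
        loss=tf.keras.losses.BinaryCrossentropy(label_smoothing=label_smoothing),
    )
    return model


model = build_model(n_features=100, n_targets=206)
model.summary()
```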

Github