A key task in ML is optimizing models at various stages, for example by choosing hyperparameters or picking a stopping point. The traditional approach is to evaluate the training loss function on a validation set and use it to guide these decisions. However, ML for healthcare has a goal distinct from that of traditional ML: models must perform well against specific clinical requirements, not merely against the loss function used for training. In this paper we describe two controlled experiments showing that optimizing models with clinically relevant metrics, rather than validation loss, yields better performance on the clinical task.
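As an illustration of the idea (not the paper's actual experiments), the following sketch selects a model checkpoint by a clinically motivated metric, sensitivity at a fixed specificity, instead of by validation loss. The metric choice, the 90% specificity target, and the toy data are all illustrative assumptions.

```python
import numpy as np

def sensitivity_at_specificity(y_true, y_score, target_specificity=0.9):
    """Sensitivity at the threshold that achieves the target specificity.

    A hypothetical clinical requirement: catch as many true positives as
    possible while keeping the false-positive rate at or below 10%.
    """
    negatives = y_score[y_true == 0]
    # Threshold such that `target_specificity` of negatives score below it.
    threshold = np.quantile(negatives, target_specificity)
    positives = y_score[y_true == 1]
    return float(np.mean(positives > threshold))

def log_loss(y_true, y_score, eps=1e-12):
    """Standard binary cross-entropy, the 'validation loss' baseline."""
    p = np.clip(y_score, eps, 1 - eps)
    return float(-np.mean(y_true * np.log(p) + (1 - y_true) * np.log(1 - p)))

def select_checkpoint(y_true, checkpoint_scores, metric):
    """Return the index of the checkpoint maximizing `metric`."""
    values = [metric(y_true, scores) for scores in checkpoint_scores]
    return int(np.argmax(values))

# Toy validation set: each "checkpoint" is an array of predicted probabilities.
rng = np.random.default_rng(0)
y_true = rng.integers(0, 2, size=1000)
checkpoints = [np.clip(y_true * 0.6 + rng.normal(0.2, s, size=1000), 0.0, 1.0)
               for s in (0.3, 0.2, 0.25)]

# Selection by validation loss (negated, so lower loss wins) ...
by_loss = select_checkpoint(y_true, checkpoints,
                            lambda y, s: -log_loss(y, s))
# ... versus selection by the clinical metric.
by_clinical = select_checkpoint(
    y_true, checkpoints,
    lambda y, s: sensitivity_at_specificity(y, s, 0.9))
print(by_loss, by_clinical)
```

The two criteria can disagree: a checkpoint with well-calibrated probabilities minimizes cross-entropy, while a different checkpoint may rank cases better near the clinically mandated operating point.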

A local copy is here.