(*) Reducing Poisson error can offset classification error: a technique to meet clinical performance requirements. ML4H, 2024.
ML algorithms that examine tissue (e.g. malaria slides) are typically evaluated on their object-level accuracy relative to that of skilled clinicians. However, this metric does not capture the realities and requirements of the actual clinical task. Even a human with perfect object-level accuracy is subject to nontrivial error from the Poisson statistics of rare events, because clinical protocols, constrained by the exigencies of clinical work, often specify a remarkably small sample size. An ML system, in contrast, may be less accurate at object-level detection, yet able to examine a much larger sample, lowering its Poisson error. Clinical performance depends on the combination of these two types of error. This paper analyses the mathematics of the relationship between Poisson error, classification error, and total error. This mathematical framing enables the software and hardware teams optimizing an ML system to leverage a relative strength (larger sample sizes) to offset a relative weakness (classification accuracy). The methods are illustrated on two concrete examples: diagnosis and quantitation of malaria on blood films.
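The trade-off can be illustrated with a quick Monte Carlo sketch. All numbers below are hypothetical placeholders, not values from the paper: a clinician with perfect per-cell classification counts a small sample (pure Poisson/sampling error), while a machine with imperfect sensitivity and specificity counts a far larger one. The machine's raw count is debiased with the standard Rogan-Gladen prevalence correction, which is one conventional choice and is not necessarily the paper's method.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical parameters (illustrative only, not from the paper).
p_true = 0.01             # true fraction of parasitized red blood cells
n_human = 500             # cells a clinician examines (small sample)
n_machine = 50_000        # cells the ML system examines (large sample)
sens, spec = 0.90, 0.995  # assumed machine sensitivity / specificity
trials = 20_000           # Monte Carlo repetitions

# Human: perfect per-cell classification; error comes only from sampling.
human_est = rng.binomial(n_human, p_true, trials) / n_human

# Machine: each truly positive cell is detected with probability `sens`;
# each negative cell is misread as positive with probability 1 - spec.
pos = rng.binomial(n_machine, p_true, trials)
tp = rng.binomial(pos, sens)
fp = rng.binomial(n_machine - pos, 1 - spec)
raw = (tp + fp) / n_machine
# Rogan-Gladen correction: debias using known sensitivity/specificity.
machine_est = (raw - (1 - spec)) / (sens + spec - 1)

rmse = lambda est: np.sqrt(np.mean((est - p_true) ** 2))
print(f"human RMSE:   {rmse(human_est):.5f}")
print(f"machine RMSE: {rmse(machine_est):.5f}")
```

With these illustrative numbers the machine's 100x larger sample more than compensates for its per-cell classification error, yielding a lower total (RMSE) error than the error-free but sample-limited human count.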
A version is available online at arXiv.
A local copy is here.