@Jianhong_Cheng I typically process submissions three times per day – morning, late afternoon, and late evening, Central Daylight Time. If I processed submissions as they came in at night, I would never get any sleep.
@koncle this is a great point, and I agree with @FabianIsensee that assigning a Dice of 0 in these cases is not desired behavior. However, for this cohort, having a renal tumor is part of the inclusion criteria (as stated in our manuscript) so the situation will never arise.
We debated including some healthy controls in both the training and test sets, but we figured that since most patients have a healthy contralateral kidney, the models would have to be able to recognize those situations regardless. Perhaps we’ll revisit this for next year.
This is really a question of study precision and recall vs real-world PPV and NPV, or at least some analog of those for segmentation. Since tumor prevalence is so much higher in the study population than in the real world, you would expect a world PPV << study precision. If you were ever trying to apply such a model in the clinic, you would need to account for this by making your clinical population more similar to your study population (e.g. only run the model for patients suspected of having a kidney tumor) and/or by incentivizing high-precision models by optimizing for something like a Tversky index.
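To make the prevalence argument concrete, here is a rough sketch in Python. It treats detection as a per-case yes/no decision, which is a simplification of the segmentation setting, and the sensitivity, specificity, and prevalence values are made-up illustrations, not numbers from the challenge. The function names `ppv` and `tversky_index` and the `alpha`/`beta` defaults are my own choices for this example.

```python
import numpy as np


def ppv(sensitivity, specificity, prevalence):
    """Positive predictive value from Bayes' rule.

    All three inputs are fractions in [0, 1]. The values used below are
    illustrative assumptions, not measurements from any real model.
    """
    true_pos = sensitivity * prevalence
    false_pos = (1.0 - specificity) * (1.0 - prevalence)
    return true_pos / (true_pos + false_pos)


def tversky_index(pred, target, alpha=0.7, beta=0.3, eps=1e-7):
    """Tversky index between two binary masks.

    alpha weights false positives and beta weights false negatives.
    alpha = beta = 0.5 reduces to Dice; alpha > beta penalizes false
    positives more heavily, which favors high-precision models.
    eps only guards against division by zero when both masks are empty.
    """
    pred = np.asarray(pred, dtype=bool)
    target = np.asarray(target, dtype=bool)
    tp = np.sum(pred & target)
    fp = np.sum(pred & ~target)
    fn = np.sum(~pred & target)
    return (tp + eps) / (tp + alpha * fp + beta * fn + eps)


if __name__ == "__main__":
    # Same hypothetical model, two populations with different prevalence.
    print("study-like PPV:", ppv(0.9, 0.9, prevalence=0.5))
    print("world-like PPV:", ppv(0.9, 0.9, prevalence=0.01))

    # An over-segmenting prediction: one true positive, two false positives.
    pred = [1, 1, 1, 0]
    target = [1, 0, 0, 0]
    print("Dice-equivalent:", tversky_index(pred, target, alpha=0.5, beta=0.5))
    print("FP-weighted:    ", tversky_index(pred, target, alpha=0.7, beta=0.3))
```

With identical sensitivity and specificity, the PPV drops sharply as prevalence falls, which is the gap between study precision and world PPV. The Tversky weighting shows the other lever: the same over-segmentation scores lower once false positives carry more weight than false negatives.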