Evaluation metric code


I would like to suggest that the organizers share the Python script for the Dice metric evaluation, as past challenges such as the Segmentation Decathlon and ACDC have done. This would definitely reduce evaluation errors on both sides, for participants and organizers alike.



Thanks for the suggestion. We were planning to release that with the test data but we can certainly do it sooner. I’ll try to get around to it this week.

Hi @neheller,

Would you please let us know when you will release the evaluation metric code?

Will do, apologies for the delay

You can find the evaluate function in starter_code/evaluation.py. The meat of it is below:

import numpy as np

# Compute tumor+kidney Dice (labels: 0=background, 1=kidney, 2=tumor)
tk_pd = np.greater(predictions, 0)
tk_gt = np.greater(gt, 0)
tk_dice = 2*np.logical_and(tk_pd, tk_gt).sum()/(
    tk_pd.sum() + tk_gt.sum()
)

# Compute tumor Dice
tu_pd = np.greater(predictions, 1)
tu_gt = np.greater(gt, 1)
tu_dice = 2*np.logical_and(tu_pd, tu_gt).sum()/(
    tu_pd.sum() + tu_gt.sum()
)
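For anyone who wants to sanity-check the metric locally, here is a minimal self-contained sketch of the same computation on toy arrays. The small `predictions` and `gt` arrays below are made-up examples, not challenge data; the real inputs would be full 3D segmentation volumes with the same label convention (0=background, 1=kidney, 2=tumor).

```python
import numpy as np

# Toy 2x3 label maps standing in for 3D segmentation volumes.
predictions = np.array([[0, 1, 1],
                        [2, 2, 0]])
gt = np.array([[0, 1, 1],
               [2, 0, 0]])

# Tumor+kidney Dice: any nonzero label counts as foreground.
tk_pd = np.greater(predictions, 0)
tk_gt = np.greater(gt, 0)
tk_dice = 2 * np.logical_and(tk_pd, tk_gt).sum() / (tk_pd.sum() + tk_gt.sum())

# Tumor Dice: only label 2 counts as foreground.
tu_pd = np.greater(predictions, 1)
tu_gt = np.greater(gt, 1)
tu_dice = 2 * np.logical_and(tu_pd, tu_gt).sum() / (tu_pd.sum() + tu_gt.sum())

print(tk_dice)  # 2*3/(4+3) = 6/7
print(tu_dice)  # 2*1/(2+1) = 2/3
```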

Please let me know if you have any questions or concerns.

Hi, I think there is a slight problem with the evaluation metric code.

It should be 2.0 * np.logical… instead of 2 * np.logical…

The Dice should be a float, not an integer.

Thanks for pointing this out. I should have mentioned that this is meant to be run in Python 3, where all division returns floats. I haven’t tested it in Python 2, but you may very well be correct that this causes a problem. Luckily the evaluation platform on grand-challenge.org runs Python 3.6.