We are now just two weeks away from a “stable” release of the full training set for the KiTS21 challenge on July 1st. I’m writing to provide more information about how the annotation process has progressed, and to describe some small changes that we will be making to the challenge’s scope and metrics.
The 2021 training set will tentatively be composed of all 300 cases from the KiTS19 challenge (210 training + 90 test). All of these cases were re-annotated in their entirety for KiTS21 using the new annotation procedure, defined here. We will be posting agreement metrics between KiTS19 and KiTS21 labels shortly after July 1.
The 2021 test set will tentatively be composed of at least 100 additional cases collected from a different health system, but annotated with an identical procedure to that of the training set. The imaging for these cases will remain private and will only be made accessible to participant Docker containers in a controlled environment at the time of submission.
You might have noticed that only the first 210 cases are shown on the annotation platform. Cases 210 to 299 have been hidden from “guest” users in order to preserve the integrity of the KiTS19 open leaderboard for as long as possible, but they will be made public shortly.
It feels like a lifetime ago that we first proposed KiTS21 way back in the fall of 2019, and unfortunately the data collection and annotation effort has not gone entirely to plan. While our team has been working hard to annotate the artery, vein, and ureter regions, we have decided that it won’t be feasible to ensure that they are of sufficient quality to include them in this challenge.
Therefore, we will have just three segmentation classes instead of six:
These will be treated as three “Hierarchical Evaluation Classes” as described here. This will be updated on the website shortly.
We have not yet settled on the details, but I want to give notice that we will soon be making an update to the metrics that we plan to use for this challenge. This is because our internal experiments have shown that some of our planned metrics – especially the surface distances – are ill-behaved in cases with many small lesions. We will soon be providing more information about this here and on the website.
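As a rough illustration only (this is not the challenge's evaluation code, and the failure mode shown is an assumption rather than a summary of the internal experiments), the toy sketch below shows one way a distance-based metric can behave badly when a small lesion is involved. It uses a simplified Hausdorff distance over whole voxel sets instead of extracted surfaces, and the helper names `hausdorff` and `dice`, the 2D toy masks, and the "infinite when one side is empty" convention are all hypothetical choices made for the example.

```python
import math


def directed_distance(a, b):
    """Largest distance from any point in `a` to its nearest point in `b`."""
    return max(min(math.dist(p, q) for q in b) for p in a)


def hausdorff(a, b):
    """Symmetric Hausdorff distance between two sets of voxel coordinates.

    Convention assumed for this sketch: 0.0 if both sets are empty,
    infinity if exactly one of them is empty.
    """
    if not a and not b:
        return 0.0
    if not a or not b:
        return math.inf
    return max(directed_distance(a, b), directed_distance(b, a))


def dice(a, b):
    """Dice overlap between two sets of voxel coordinates."""
    if not a and not b:
        return 1.0
    return 2 * len(a & b) / (len(a) + len(b))


# Toy 2D reference: one large 20x20 structure plus a single-voxel lesion.
large = {(x, y) for x in range(20) for y in range(20)}
small = {(40, 40)}
reference = large | small

# Toy prediction: the large structure is perfect, the tiny lesion is missed.
prediction = set(large)

dice_value = dice(reference, prediction)    # barely affected by the miss
hd_value = hausdorff(reference, prediction)  # dominated by the missed voxel

# Evaluated on the lesion alone, the distance is not even finite.
hd_lesion_only = hausdorff(small, set())

print(f"Dice: {dice_value:.4f}")
print(f"Hausdorff: {hd_value:.2f}")
print(f"Hausdorff, lesion only: {hd_lesion_only}")
```

In this toy setup a single missed voxel leaves the overlap score almost unchanged while it alone determines the distance score, and the distance becomes infinite (or needs an arbitrary convention) once the prediction for a lesion is empty. With many small lesions per case, such outliers could plausibly dominate an averaged distance metric.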