As I mentioned a few weeks ago, we have decided to change the metrics we’re using due to some instability that we observed in preliminary experiments. The new metric details have been posted on the website (here) and an implementation can be found on GitHub at /kits21/evaluation/.