The rules mention that our model will be tested on a holdout portion of the data. Is this holdout portion just a portion of the testing dataset that was provided?
No, there is additional data that has not been released that we will be running all submitted models against.
Okay, so that means the scores we see on the leaderboard are just a general indication of how the models are performing. The final score (or at least its performance component) could be totally different. Is that right?
That is correct. The final performance ratings will almost certainly differ from the leaderboard as some models are likely overfitting to the current dataset.
If our models will be tested on additional data, should we save the models in pickle format and write code to run them on new data, or will you retrain the models using our code from the Jupyter notebooks?
A Jupyter notebook is sufficient for submission (along with a requirements file and a short explainer video).
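For anyone who does want to persist a fitted model anyway, the pickle approach mentioned above is straightforward. This is a minimal, dependency-free sketch: the dict here is a stand-in for whatever fitted estimator object your notebook produces, since pickle serializes arbitrary Python objects the same way.

```python
import pickle

# Stand-in for a fitted model object (e.g. a trained estimator).
model = {"weights": [0.4, 0.6], "intercept": -1.2}

# Serialize the model to bytes (use pickle.dump with a file opened
# in "wb" mode to write it to disk instead).
blob = pickle.dumps(model)

# Later, e.g. in a separate scoring script, restore it and apply it
# to new data.
restored = pickle.loads(blob)
assert restored == model
```

Note that unpickling executes code from the file, so only load pickles you created yourself, and reload them under the same library versions used to train.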