Starter Notebook - Creating a Class for New Data

Hi all, I am trying to understand the steps from “Save model and scaler to files” onwards, including setting up a class for running on new data, but it is not clear to me how it works. Is there any other reference material on how to build a class to work with new data?

In the step that has the code for class Model(object), should this class take in a DataFrame, do the pre-processing, run the prediction(s)/machine learning models and then “spit out” the prediction result as a submittable file?

If so, I am not sure where the scaler.pkl and model.pkl come into that workflow, as it looks like they are being initialised/loaded within the init method and the two files are generated before the class.

Also, I assume that the final submission is a CSV file. If so, I guess I can set my model/code up in the way that I want?


As stated in the rules, the final submission is code and a model, not a CSV file. The top 10 teams on the leaderboard at the end of the competition are invited to submit models for final scoring on the hidden test data set.

The Model class in the starter notebook is just an example of what such a submission may look like. For the organizers to be able to run a prediction using your model, the code must contain all feature selection, pre-processing, prediction and any post-processing. It should take in a dataframe in the same form as the test data set and give out predictions. It must load any persisted data (such as neural network weights) from files provided, which is why the notebook has the save model and scaler steps.
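
To make the workflow concrete, here is a minimal sketch of the save-then-load pattern. It is not the starter notebook's actual code. Everything specific in it is an assumption for illustration: the feature names, the save_artifacts helper, the parameter values, and the use of plain dicts and lists of row dicts as stand-ins for a fitted scaler/model and a dataframe, so that it runs with the standard library only. With real fitted objects you would call their own transform and predict methods instead.

```python
import os
import pickle
import tempfile

# Hypothetical feature names; the real ones come from the competition data.
FEATURES = ["feature_a", "feature_b"]


def save_artifacts(directory):
    """Training-time step: persist the fitted parameters to files."""
    # Plain dicts standing in for a fitted scaler and a trained model.
    scaler = {"mean": [1.0, 10.0], "std": [2.0, 5.0]}
    model = {"weights": [0.5, -1.0], "bias": 0.25}
    scaler_path = os.path.join(directory, "scaler.pkl")
    model_path = os.path.join(directory, "model.pkl")
    with open(scaler_path, "wb") as f:
        pickle.dump(scaler, f)
    with open(model_path, "wb") as f:
        pickle.dump(model, f)
    return scaler_path, model_path


class Model:
    """Inference-time wrapper: load the saved files once, then predict."""

    def __init__(self, scaler_path, model_path):
        # Nothing is fitted here; the class only reads what training saved.
        with open(scaler_path, "rb") as f:
            self.scaler = pickle.load(f)
        with open(model_path, "rb") as f:
            self.model = pickle.load(f)

    def preprocess(self, rows):
        # Feature selection plus scaling with the saved statistics.
        # Columns not listed in FEATURES (such as an id) are ignored.
        means, stds = self.scaler["mean"], self.scaler["std"]
        return [
            [(row[name] - m) / s for name, m, s in zip(FEATURES, means, stds)]
            for row in rows
        ]

    def predict(self, rows):
        # Raw rows in, predictions out: the whole pipeline lives in the class.
        weights, bias = self.model["weights"], self.model["bias"]
        return [
            sum(w * x for w, x in zip(weights, feats)) + bias
            for feats in self.preprocess(rows)
        ]


with tempfile.TemporaryDirectory() as tmp:
    scaler_path, model_path = save_artifacts(tmp)
    model = Model(scaler_path, model_path)
    # Rows standing in for a dataframe shaped like the test data set.
    new_rows = [
        {"id": 1, "feature_a": 3.0, "feature_b": 10.0},
        {"id": 2, "feature_a": 1.0, "feature_b": 15.0},
    ]
    predictions = model.predict(new_rows)
```

The point of the split is that the files are written once, after training, while the class only reads them. Whoever runs the submission can then construct Model from the files and call predict on unseen data without retraining anything.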