Which CMPL_FAC_ID values are being predicted on?

Can we assume anything about the CMPL_FAC_ID that the models are going to be tested on?

For example, can we expect to see the same CMPL_FAC_ID values as in the training set?

I actually checked this. The identifiers in the test dataset are all different from the ones in the train dataset.
This was the first time I used Python's set type: I built one set of identifiers from each dataset, and their intersection was empty.
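
For anyone who wants to repeat the check, here is a minimal sketch of the idea. The helper name `shared_ids` and the ID values are made-up placeholders, not taken from the competition data; in practice you would pass in the CMPL_FAC_ID column from each file.

```python
def shared_ids(train_ids, test_ids):
    """Return the identifiers that appear in both collections."""
    # Converting to sets drops duplicates; & is set intersection.
    return set(train_ids) & set(test_ids)


# Hypothetical placeholder identifiers, for illustration only.
train_ids = ["A101", "A102", "A103", "A101"]
test_ids = ["B201", "B202"]

overlap = shared_ids(train_ids, test_ids)
print(f"{len(overlap)} identifiers appear in both datasets")
```

An empty result means no identifier from the train dataset shows up in the test dataset. `set(train_ids).isdisjoint(test_ids)` answers the same yes/no question without building the intersection.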

In another post, Xeek said there is a dataset that hasn't been publicly released, and that the models we created will be tested against it.

That is correct. There is an entirely new dataset we will be running models against for final judging.

