Share your approach?

Hello! I see that the challenge is now officially closed. I must admit I really enjoyed it! It may be the first one in which I was at least remotely happy with my solution. But I have no idea how you guys got all those amazing results. Very impressive, chapeau.

I do work in O’n’G, but data science is only my hobby, so I am in no position to win any prize here; I just treat it as good motivation to learn. Since learning is easier in a group, would you care to share your solution now that the challenge is closed? I’ll start with mine.

You could only submit 5 solutions per day, so my first step was to write a function that would approximate the Xeek result. After a few attempts I found the one below good enough for my needs.

Once I had this function, a large part of my approach was just brute force: generating a lot of solutions for different parameters and seeing where it took me. In total I tried 1276 parameter combinations and generated almost 125k solutions. Only 24% of those solutions were not cut off by the linearity and RMSE checks.
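
A sweep of this kind can be sketched in a few lines. This is only an illustration under stated assumptions: `generate`, `local_score` and `passes_checks` are hypothetical placeholders for the model, the local score approximation and the linearity/RMSE checks, and the parameter names `noise` and `scale` are made up.

```python
import itertools
import random


def sweep(param_grid, generate, local_score, passes_checks, n_per_combo=3, seed=0):
    """Try every parameter combination and record each generated solution."""
    rng = random.Random(seed)
    names = sorted(param_grid)
    records = []
    for values in itertools.product(*(param_grid[n] for n in names)):
        params = dict(zip(names, values))
        for _ in range(n_per_combo):
            solution = generate(params, rng)
            records.append({
                "params": params,
                "valid": passes_checks(solution),
                "score": local_score(solution),
            })
    return records


# Toy stand-ins (hypothetical): a "solution" is just a list of numbers.
def toy_generate(params, rng):
    return [rng.gauss(0.0, params["noise"]) * params["scale"] for _ in range(20)]


def toy_score(solution):
    return max(solution) - min(solution)


def toy_checks(solution):
    return max(abs(x) for x in solution) < 2.0


grid = {"noise": [0.1, 0.5, 1.0], "scale": [1.0, 2.0]}
records = sweep(grid, toy_generate, toy_score, toy_checks)
valid_share = sum(r["valid"] for r in records) / len(records)
print(f"{len(records)} solutions, {valid_share:.0%} passed the checks")
```

Keeping the invalid records as well, rather than discarding them, makes it possible to see later which parameter regions tend to fail the checks.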

At some point I turned from “educated guessing” of the best parameters to something more structured. I used the database of generated solutions to train a Generalized Additive Model to understand the influence of the parameters and to predict which combination might be most favorable. The final iteration of my GAM shows the following influence:

@wormsdacs noticed in another thread on this Discord that, since you can arbitrarily choose any solution from the thousands generated, you may find a one-off champion that will not necessarily generalize well. In my case, however, I noticed that if I filter to only those solutions that meet the linearity and RMSE criteria and then group by model parameters, the mean score correlates nicely with the max score.
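
The check described above (filter to valid solutions, group by parameters, compare mean and max) can be sketched roughly as follows. The records and their field names are made up for illustration, and the scores are not real results.

```python
from collections import defaultdict
from math import sqrt


def mean_and_max_by_params(records):
    """Group valid solutions by parameter set; return {params: (mean, max)}."""
    groups = defaultdict(list)
    for rec in records:
        if rec["valid"]:  # keep only solutions that pass the checks
            groups[rec["params"]].append(rec["score"])
    return {k: (sum(v) / len(v), max(v)) for k, v in groups.items()}


def pearson(xs, ys):
    """Plain Pearson correlation coefficient."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


# Made-up records: params is a hashable tuple, score is the local estimate.
records = [
    {"params": ("a", 1), "valid": True, "score": 0.10},
    {"params": ("a", 1), "valid": True, "score": 0.14},
    {"params": ("a", 1), "valid": False, "score": 0.90},  # ignored: failed checks
    {"params": ("b", 2), "valid": True, "score": 0.15},
    {"params": ("b", 2), "valid": True, "score": 0.19},
    {"params": ("c", 3), "valid": True, "score": 0.05},
    {"params": ("c", 3), "valid": True, "score": 0.07},
]

stats = mean_and_max_by_params(records)
means = [m for m, _ in stats.values()]
maxes = [x for _, x in stats.values()]
r = pearson(means, maxes)
print(f"correlation between group mean and group max: {r:.3f}")
```

A correlation close to 1 across parameter groups suggests that the best solution of a group reflects the group as a whole, while a low value would point to one-off champions.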

The best solution that I produced gave me a score of 0.1707 and placed me at about one third of the leaderboard: not great, not terrible (= 3.6 rtg :wink:). It looked like this:

And that’s it. Do you mind sharing your approach? Or maybe pointing out weaknesses of mine?


Hi, I am late to the topic.

I was quite disappointed with how the challenge decided to score submissions, so I quit early, but my approach was:

  1. Train a CVAE (conditional variational autoencoder) by adapting the tutorial from the Pyro library. This fixes errors in the initial notebook provided, such as the loss function not being well defined and the architecture not being well suited to the problem.

  2. Define a simple function to measure the spread of the lines, based on a standard deviation heuristic.

  3. Once the model is trained on all the data, I randomly encode dataset inputs (I was not patient enough to evaluate them all) to obtain a latent vector z and generate a solution to submit. I iterate on this process to maximize the spread score.

  4. For this single sample that was chosen, I run the CVAE generation process multiple times until I get a solution that maximizes the spread score again.
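
Steps 2 to 4 can be sketched roughly like this. The exact heuristic from the notebook is not reproduced here: `spread_score` is one possible standard-deviation-based measure, and `toy_sample` is a hypothetical stand-in for "encode an input, draw z, decode with the CVAE".

```python
import random
import statistics


def spread_score(lines):
    """Mean, over positions, of the standard deviation across lines.

    `lines` is a list of equally long sequences of numbers.
    """
    per_position = [statistics.pstdev(values) for values in zip(*lines)]
    return sum(per_position) / len(per_position)


def best_of(n_draws, sample, rng):
    """Draw `n_draws` candidates and keep the one with the largest spread."""
    best, best_score = None, float("-inf")
    for _ in range(n_draws):
        candidate = sample(rng)
        score = spread_score(candidate)
        if score > best_score:
            best, best_score = candidate, score
    return best, best_score


# Hypothetical generator: in the real setup this would call the trained model.
def toy_sample(rng, n_lines=5, length=10):
    width = rng.uniform(0.1, 1.0)
    return [[rng.gauss(0.0, width) for _ in range(length)] for _ in range(n_lines)]


rng = random.Random(0)
solution, score = best_of(50, toy_sample, rng)
print(f"best spread over 50 draws: {score:.3f}")
```

The same `best_of` loop covers both stages: first over randomly chosen inputs, then over repeated generations for the single input that was selected.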

The first time I submitted this approach, I was at the top of the leaderboard, but I noticed that those metrics could be easily tricked and that the rules were too restrictive to allow anything more creative.

This was my notebook, in case it is useful as a reference: