The team has recommended using np.random.normal to generate at least 100 samples from a normal distribution for each variable in each family. However, a normal distribution is unbounded below, so it can also yield negative values, which are physically impossible for a geochemical variable such as an element concentration. The question, then, is what to do with those negative values: should we replace them with zero, the mean, the median, or the mode?
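A minimal sketch of the sampling step, assuming each family is summarized by a hypothetical (mean, std) pair per variable. The helper name simulate_variable and the numbers in family_stats are illustrative, not part of the original task. Replacement statistics are computed from the non-negative draws only, which is itself an assumption. The mode is left out on purpose: continuous draws from np.random.normal are almost never exactly equal, so a mode is not meaningful without first binning the data.

```python
import numpy as np


def simulate_variable(mean, std, n=100, strategy="median"):
    """Draw n samples with np.random.normal and replace negative draws.

    Hypothetical helper. strategy is one of "zero", "mean", "median";
    "mean" and "median" are computed from the non-negative draws only.
    Returns (cleaned_samples, number_of_negative_draws).
    """
    samples = np.random.normal(loc=mean, scale=std, size=n)
    negative = samples < 0
    valid = samples[~negative]
    if valid.size == 0:
        raise ValueError("all draws were negative; check mean/std")

    if strategy == "zero":
        fill = 0.0
    elif strategy == "mean":
        fill = valid.mean()
    elif strategy == "median":
        fill = np.median(valid)
    else:
        raise ValueError(f"unknown strategy: {strategy}")

    cleaned = samples.copy()
    cleaned[negative] = fill  # overwrite only the impossible values
    return cleaned, int(negative.sum())


# Hypothetical (mean, std) per variable per family; replace with real data.
family_stats = {
    "family_A": {"SiO2": (60.0, 5.0), "MgO": (2.0, 1.5)},
    "family_B": {"SiO2": (48.0, 4.0), "MgO": (8.0, 3.0)},
}

np.random.seed(42)  # make the np.random.normal draws reproducible
simulated = {}
for family, variables in family_stats.items():
    for var, (mu, sigma) in variables.items():
        values, n_neg = simulate_variable(mu, sigma, n=100, strategy="median")
        simulated[(family, var)] = values
        print(f"{family} {var}: {n_neg} negative draws replaced")
```

Variables whose mean sits close to zero relative to their standard deviation (like the hypothetical MgO in family_A) produce the most negative draws, so those are where the choice of strategy matters most.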
Hint: a box plot is a useful tool for making this decision, since it shows the median, quartiles, and outliers of each variable at a glance.
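One way to act on the hint, sketched under assumptions: compute the numbers a Tukey-style box plot draws (quartiles, whiskers at 1.5 × IQR, outlier count) for each replacement strategy and compare them with the raw draws. The variable parameters (mean 2.0, std 1.5) are hypothetical, and box_stats is an illustrative helper, not a library function. The whisker rule mirrors the common 1.5 × IQR convention.

```python
import numpy as np


def box_stats(values, whis=1.5):
    """Tukey box-plot summary: quartiles, whisker ends, outlier count."""
    values = np.asarray(values, dtype=float)
    q1, med, q3 = np.percentile(values, [25, 50, 75])
    iqr = q3 - q1
    lo_fence, hi_fence = q1 - whis * iqr, q3 + whis * iqr
    inside = values[(values >= lo_fence) & (values <= hi_fence)]
    return {
        "q1": q1,
        "median": med,
        "q3": q3,
        "whisker_low": inside.min(),   # most extreme point within the fences
        "whisker_high": inside.max(),
        "n_outliers": int(((values < lo_fence) | (values > hi_fence)).sum()),
    }


np.random.seed(0)
raw = np.random.normal(loc=2.0, scale=1.5, size=100)  # hypothetical variable
valid = raw[raw >= 0]

candidates = {
    "raw": raw,  # reference: the simulated draws before any fix
    "zero": np.where(raw < 0, 0.0, raw),
    "mean": np.where(raw < 0, valid.mean(), raw),
    "median": np.where(raw < 0, np.median(valid), raw),
}

for name, data in candidates.items():
    s = box_stats(data)
    print(
        f"{name:>6}: median={s['median']:.2f} "
        f"IQR=({s['q1']:.2f}, {s['q3']:.2f}) outliers={s['n_outliers']}"
    )
```

The same arrays can be passed to matplotlib's plt.boxplot to draw the boxes side by side. A strategy that noticeably shifts the median or quartiles, or that piles many identical values at one spot, distorts the simulated distribution more than one that leaves the box close to the raw reference.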