mProphet models, % contribution and min. number of groups

mProphet models, % contribution and min. number of groups mvillal1  2017-10-03 15:06
 
Hello Skyline team, I hope you are doing fine.

I am using different datasets to train three MRM mProphet models, which I then apply to other MRM raw files from samples with more complex backgrounds to get q-values for the detected peak groups. However, after I train a model and use it to re-integrate the results from the more complex samples, I am noticing that the % contributions of the same model change between before and after re-integration. Why is this happening? Is the scoring changing? I am attaching a document with one of these cases.

Finally, based on your experience, do you suggest a minimum number of decoy/target transition group records for training (or using) mProphet models?

Many thanks,
Ivan.
 
 
Brendan MacLean responded:  2017-10-03 20:57
That is pretty simple, actually. The contributions are calculated against the current data set, so if you change the data set, they will change, though the weights themselves do not. The contributions are informational only and are not included in any further calculations involved in peak scoring.
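[Editor's note] A toy sketch of the idea above, not Skyline's actual implementation: the trained weights are fixed, but if a displayed "% contribution" is derived from how each weighted score behaves on the current data set, it will shift when the data set changes. The weight values, feature distributions, and the `percent_contributions` helper here are all hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

# Fixed model weights learned during training (these never change).
weights = np.array([1.2, 0.8, 0.4])

def percent_contributions(features, weights):
    """Hypothetical illustration: each score's contribution is its
    absolute weighted mean over the CURRENT data set, normalized to 100%."""
    contrib = np.abs(weights * features.mean(axis=0))
    return 100 * contrib / contrib.sum()

# Same weights, two data sets with different feature distributions
# (e.g. a simple training set vs. a more complex background).
simple_set = rng.normal([1.0, 0.5, 0.2], 0.1, size=(500, 3))
complex_set = rng.normal([0.6, 0.9, 0.3], 0.1, size=(500, 3))

print(percent_contributions(simple_set, weights))
print(percent_contributions(complex_set, weights))
```

Running this prints two different contribution breakdowns even though `weights` never changed, which mirrors what you observed after re-integrating the more complex samples.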

Slightly more subtle is the normal distribution (the yellow density curve behind the bars), which is also used in scoring. The difference is pretty slight here, but if you look closely you may feel that it is biased slightly right (i.e. its mean is to the right) of the yellow bars in your second data set. That is probably not just an optical illusion. The distribution is scaled to unit normal (mean = 0 and SD = 1) for the trained coefficients and the training decoys. The height is scaled to the current set of decoys to preserve the visual display, but the mean is not re-centered on them.
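[Editor's note] A minimal sketch of that scaling behavior, under the stated assumptions (the score distributions and the `scale` helper are invented for illustration): the standardization parameters are fixed from the training decoys, so the training decoys land at mean 0 by construction, while decoys from a new data set can land off-center because the mean is not re-computed for them.

```python
import numpy as np

rng = np.random.default_rng(1)

# Composite model scores for decoys in the TRAINING data set.
train_decoys = rng.normal(2.0, 0.5, size=1000)

# Standardization parameters fixed at training time:
# training decoys map to mean = 0, SD = 1.
mu, sigma = train_decoys.mean(), train_decoys.std()

def scale(scores):
    # Apply the FIXED training-time standardization to any score set.
    return (scores - mu) / sigma

# Decoys from a new, more complex data set that score slightly higher.
new_decoys = rng.normal(2.2, 0.5, size=800)

z_train = scale(train_decoys)  # centered at 0 by construction
z_new = scale(new_decoys)      # mean shifted right; NOT re-centered
print(round(z_train.mean(), 3), round(z_new.mean(), 3))
```

The display analogy: only the curve's height would be rescaled to the current decoy count, while its center stays where the training decoys put it, which is why the bars can sit slightly off the curve on a new data set.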

In other words, the important scoring parameters are stored and used based on your training, but some of the information in the plots comes from the current data set.

And this gives you some opportunity to assess whether your model trained on simpler data makes sense to apply to your more complex data. In this case, it doesn't look so bad, but ideally you want to use a model trained on data as close as possible in nature to the data you then score with it. Something to consider.

I don't have a strong opinion on the best number of decoys for mProphet on MRM. The original mProphet paper made suggestions, and I don't really know of any further analysis of this question since then. Most mProphet use these days is focused on DIA, where decoys are plentiful, often in the range of 10,000+ peptides, which may explain why the question of a minimum became less interesting.

Thanks for posting your question and example images to the Skyline support board. Hope this information helps.

--Brendan