That is pretty simple, actually. The contributions are calculated against the current data set. So, if you change the data set, they will change, just not the weights themselves. The contributions are informational and not included in any further calculations involved in the peak scoring.
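To illustrate the idea (this is a hypothetical sketch, not Skyline's actual code): the trained weights are fixed, while the displayed contributions are recomputed from whichever data set is currently loaded, so they shift when the data shifts.

```python
def score(weights, features):
    """Composite score: weighted sum of feature values (weights are fixed)."""
    return sum(w * f for w, f in zip(weights, features))

def contributions(weights, dataset):
    """Average share of the total score attributed to each feature
    for the current data set (informational only)."""
    n = len(dataset)
    means = [sum(row[i] for row in dataset) / n for i in range(len(weights))]
    parts = [w * m for w, m in zip(weights, means)]
    total = sum(parts)
    return [p / total for p in parts]

weights = [0.8, 0.5, 0.2]                      # fixed after training
set_a = [[1.0, 2.0, 0.5], [1.2, 1.8, 0.7]]     # first data set
set_b = [[2.0, 0.5, 1.5], [1.8, 0.7, 1.3]]     # second data set

# Same weights, different data -> different reported contributions
print(contributions(weights, set_a))
print(contributions(weights, set_b))
```

The key point the sketch shows: swapping `set_a` for `set_b` changes the contribution percentages without touching `weights`.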
Slightly more subtle is the normal distribution (yellow density curve behind the bars) that is also used in scoring. The difference is pretty slight here, but if you look closely you may notice that it is biased slightly right (i.e. its mean is to the right) of the yellow bars in your second data set. That is probably not just an optical illusion. The distribution is scaled to unit normal (mean = 0 and SD = 1) for the trained coefficients and the training decoys. The height is scaled to the current set of decoys to preserve the visual display, but the mean is not re-centered on them.
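A small sketch of what that means (hypothetical numbers, not Skyline's implementation): scores are standardized with the mean and SD fixed at training time, so a new decoy set whose raw scores sit lower than the training decoys will appear shifted left of the curve rather than re-centered on it.

```python
import statistics

def standardize(scores, train_mean, train_sd):
    """Map raw scores to z-scores using the FIXED mean/SD from training."""
    return [(s - train_mean) / train_sd for s in scores]

# Stats fixed at training time (hypothetical values)
train_mean, train_sd = 2.0, 0.5

# A new decoy set whose raw scores sit a bit lower than the training decoys
new_decoys = [1.4, 1.6, 1.8, 1.7, 1.5]

z = standardize(new_decoys, train_mean, train_sd)
# The z-scores are NOT centered on 0 for the new set: the unit-normal curve
# stays where training put it, so its mean sits to the right of these bars.
print(statistics.mean(z))  # prints -0.8 (new decoys lie left of the training mean)
```

Only the curve's height would be rescaled to the new decoys for display; its center stays at the training mean, which is the slight right-bias described above.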
In other words, the important scoring parameters are stored and used based on your training, but some of the information in the plots comes from the current data set.
And this gives you some opportunity to assess whether your model trained on simpler data makes sense to apply to your more complex data. In this case, it doesn't look so bad, but ideally you want to use a model trained on data as close as possible in nature to the data you then score with it. Something to consider.
I don't have a strong opinion on the best number of decoys for mProphet on MRM. The original mProphet paper made suggestions, and I don't know of any further analysis of this question since then. Most mProphet use these days is focused on DIA, where decoys are plentiful (often 10,000+ peptides), which may explain why the question of a minimum became less interesting.
Thanks for posting your question and example images to the Skyline support board. Hope this information helps.
--Brendan