Advanced peak picking model issue: the scoring is not correct for internal standard mixed sample

lxiiaanog  2025-06-16 00:20
 

Dear Skyline team,

I am trying to use mProphet peak picking to facilitate identification, but encountered a problem as described below.

I have four synthesized peptides that are unlabeled (light), and more than 500 synthesized peptides that are labeled with heavy lysine (heavy) and include the sequences of the four light peptides. I mixed them all together for DIA acquisition.

In Skyline, I use the heavy peptides as internal standards, and use the Reintegrate function to apply mProphet for peak picking. I expected that only the four overlapping peptides would be identified (detection q value < 0.01). However, almost all peptides have a q value < 0.01, meaning that almost all of them are identified, which is incorrect.

I see that the light and heavy peptides share the same q values. So I removed the heavy peptides from the file and re-ran Reintegrate. This time only around 20 peptides have a q value < 0.01, so the number of incorrect hits decreased significantly.

Does this mean that the heavy peptides are not treated separately as internal standards but are instead scored together with the light peptides? Could you tell me how to work around this issue to get a more reasonable re-integration?

I uploaded the related file to the File Sharing Folder with the name: mProphet-Issue20250616.sky.zip. Please let me know if my question is unclear.

Many thanks!
Li

 
 
Nick Shulman responded:  2025-06-16 01:05
If your samples contain internal standards then you cannot use an mProphet scoring model to determine whether endogenous (i.e. light) peptides are present.
The trained mProphet scoring model essentially looks for the internal standard, and then assumes that the light peptide is present at the same retention time as the internal standard.
Skyline assumes that the internal standard will be much easier to find than the light peptide.
If you ever have a sample where the internal standard is not that easy to detect then you should go to the "Modifications" tab at "Settings > Peptide Settings" and change what is selected in the "Internal Standard Type" drop down list.

Most of the mProphet Features involve only looking at chromatograms of the internal standards. There are a few features with "Reference" in their name (like "Reference co-elution count") which involve looking at both the heavy and light chromatograms at the same time.

The way that you should be deciding whether the light peptide is present in your samples is by looking at the chromatogram peak area of the light peptides.
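
As a rough illustration of that kind of check (a minimal sketch, not something Skyline provides): assuming you have exported a report from Skyline and loaded its rows as dictionaries, you could compute a light-to-heavy peak area ratio per peptide and replicate and list the peptides whose ratio exceeds a cutoff you choose. The column names "Peptide", "Replicate", "Isotope Label Type" and "Total Area", the function names, the example values and the cutoff are all assumptions made for this example; adjust them to match your own report.

```python
from collections import defaultdict


def light_to_heavy_ratios(rows):
    """Map (peptide, replicate) to light area / heavy area.

    `rows` is an iterable of dicts. The keys used here are assumed column
    names, not guaranteed to match your exported report.
    """
    areas = defaultdict(dict)
    for row in rows:
        key = (row["Peptide"], row["Replicate"])
        label = row["Isotope Label Type"].strip().lower()
        try:
            area = float(row["Total Area"])
        except (TypeError, ValueError):
            # Missing values (for example "#N/A") are treated as zero area.
            area = 0.0
        areas[key][label] = area

    ratios = {}
    for key, by_label in areas.items():
        heavy = by_label.get("heavy", 0.0)
        light = by_label.get("light", 0.0)
        # No ratio can be computed without a heavy (internal standard) signal.
        ratios[key] = light / heavy if heavy > 0 else None
    return ratios


def likely_present(ratios, min_ratio):
    """Return the keys whose light/heavy ratio is at least `min_ratio`."""
    return sorted(
        key for key, ratio in ratios.items()
        if ratio is not None and ratio >= min_ratio
    )


# Made-up rows, only to show the expected input shape.
example_rows = [
    {"Peptide": "PEPTIDEA", "Replicate": "rep1",
     "Isotope Label Type": "light", "Total Area": "5000"},
    {"Peptide": "PEPTIDEA", "Replicate": "rep1",
     "Isotope Label Type": "heavy", "Total Area": "10000"},
    {"Peptide": "PEPTIDEB", "Replicate": "rep1",
     "Isotope Label Type": "light", "Total Area": "10"},
    {"Peptide": "PEPTIDEB", "Replicate": "rep1",
     "Isotope Label Type": "heavy", "Total Area": "20000"},
    {"Peptide": "PEPTIDEC", "Replicate": "rep1",
     "Isotope Label Type": "light", "Total Area": "#N/A"},
    {"Peptide": "PEPTIDEC", "Replicate": "rep1",
     "Isotope Label Type": "heavy", "Total Area": "15000"},
]

example_ratios = light_to_heavy_ratios(example_rows)
# The 0.01 cutoff is arbitrary; a sensible value depends on your background signal.
print(likely_present(example_ratios, min_ratio=0.01))
```

The cutoff is the part that needs judgment: a reasonable value would come from the light signal you see for peptides that you know are absent from the sample.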

If you want to compare quantities between groups of replicates, you should look at the Group Comparison tutorial:
https://skyline.ms/wiki/home/software/Skyline/page.view?name=tutorial_grouped

When you have internal standards, it usually is not necessary to train an mProphet peak scoring model because the internal standards are usually easy enough to find that Skyline picks the correct peak.
-- Nick
 
lxiiaanog responded:  2025-06-17 00:00
Hi Nick, Thanks for the thorough explanation! Helped a lot.

Li