Incomplete spectra

Incomplete spectra user  2018-09-30

Dear Skyline team,

I have encountered incomplete spectra (top left in the attached graph) several times, although they make up only a very small fraction of my data. I would like to know the possible reasons. For a high-quality peptide used to train an mProphet model, if only 1 of the 20 replicates has such incomplete spectra and I want to use the other 19, should I just deselect the spectra in this replicate (leaving this replicate blank, as at top right in the attached graph)? The bottom panel is an example of normal spectra from the other 19 replicates.


Nick Shulman responded:  2018-09-30

The horizontal lines in the chromatograms happen when there are no MS2 spectra which match your precursor over a period of time in the chromatography. This is expected to happen with DDA experiments, and for this reason, chromatograms extracted from DDA experiments are not useful.

It looks like you have a scheduled PRM experiment, and the machine did not start collecting MS2 spectra for that precursor until a little after 25 minutes. I think both of the chromatograms in your screenshot have those long horizontal lines, but in the lower panel the first MS2 spectrum had lower intensities, so when Skyline extended the lines horizontally, the lines ended up being closer to the X-axis so you can't really see them.
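
To illustrate why the flat segment sits at the height of the first MS2 spectrum, here is a minimal Python sketch. It is not Skyline's actual code: the function name `sample_chromatogram`, the scan times, and the intensities are all made up, and it simply assumes linear interpolation between scans with the nearest scan's value held constant outside the scanned range.

```python
from bisect import bisect_right


def sample_chromatogram(scan_times, intensities, grid):
    """Sample a chromatogram on `grid` (hypothetical helper, not Skyline code).

    Between scans the intensity is linearly interpolated; before the first
    scan and after the last scan the nearest scan's value is held constant,
    which is what produces a horizontal line.
    """
    values = []
    for t in grid:
        if t <= scan_times[0]:
            values.append(intensities[0])      # flat line before first scan
        elif t >= scan_times[-1]:
            values.append(intensities[-1])     # flat line after last scan
        else:
            i = bisect_right(scan_times, t)    # first scan strictly after t
            t0, t1 = scan_times[i - 1], scan_times[i]
            y0, y1 = intensities[i - 1], intensities[i]
            values.append(y0 + (y1 - y0) * (t - t0) / (t1 - t0))
    return values


# Made-up example: MS2 scans for the precursor only start at 25.2 minutes.
scan_times = [25.2, 25.4, 25.6, 25.8]
high_start = [4000.0, 9000.0, 5000.0, 1000.0]  # first scan is intense
low_start = [50.0, 9000.0, 5000.0, 1000.0]     # first scan is weak
grid = [24.0, 24.5, 25.0, 25.2, 25.4]

print(sample_chromatogram(scan_times, high_start, grid))
print(sample_chromatogram(scan_times, low_start, grid))
```

In the first case the flat segment before 25.2 minutes sits at 4000 and is easy to see; in the second it sits at 50, close to the X-axis.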

I would have to see your files to know exactly what is going on. Skyline tries to truncate the chromatograms in PRM experiments so that the chromatogram does not extend beyond the range where there were matching MS2 scans. I am not sure why that did not happen here. It might have something to do with other precursors in the same peptide.

If you would like, you can send us your files.
In Skyline, you can use the menu item:
File > Share > (complete)
to create a .zip file containing your Skyline document and supporting files including extracted chromatograms.

You can upload that .zip file here:
It would also be helpful to see a few of your .raw files.

I am not sure I understand your question about mProphet. I believe there is no way to tell Skyline to skip over a particular replicate for a particular peptide. Skyline looks at all of the possible peaks for all of your peptides and tries to come up with the best weighting of features in order to get the best separation between your targets and decoys.

Hope this helps,
-- Nick

user responded:  2018-10-01
Dear Nick,

Thanks for your quick explanation.
Yes, they are scheduled PRM data and the target “AVLTIDEK” is not included in the inclusion list. I have uploaded the .zip file of my Skyline document. You may have a look.

I’d like to use some high-quality spectra and their second-best peaks to train an mProphet model for this dataset. I want to know whether the quality of the mProphet model is affected by a replicate in which no peak group is selected (because there are no good ones).
What will Skyline do if no peak group is selected in a replicate? Will it still automatically find a best peak and a second-best peak for this replicate before training mProphet (as you mentioned, there is no way to tell “Skyline to skip over a particular replicate for a particular peptide”)? In that case, do I need to delete this particular peptide from all replicates to train a good model?

Nick Shulman responded:  2018-10-01
I am not sure I understand your question.

mProphet looks at extracted chromatograms, not spectra.

Also, mProphet completely ignores whatever peaks you may have manually chosen. That is, even if you have explicitly chosen a particular peak, mProphet ignores that, looks at all of the candidate peaks along the chromatogram, scores them individually, and decides which is the highest-scoring peak.
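
As a rough illustration of that idea, here is a minimal Python sketch. It is not Skyline's implementation: the feature names, the weights, and the helper functions `composite_score` and `pick_best_peak` are all hypothetical. It only assumes that each candidate peak gets a weighted sum of its feature scores, and that a manual choice plays no part in the ranking.

```python
def composite_score(features, weights):
    """Weighted sum of feature scores for one candidate peak (hypothetical)."""
    return sum(weights[name] * value for name, value in features.items())


def pick_best_peak(candidates, weights):
    """Return the highest-scoring candidate; 'manually_chosen' is never read."""
    return max(candidates, key=lambda peak: composite_score(peak["features"], weights))


# Made-up weights and candidate peaks along one chromatogram.
weights = {"intensity": 0.5, "coelution": 1.0, "shape": 2.0}
candidates = [
    {"rt": 25.4, "manually_chosen": True,
     "features": {"intensity": 0.2, "coelution": 0.3, "shape": 0.1}},
    {"rt": 27.1, "manually_chosen": False,
     "features": {"intensity": 0.9, "coelution": 0.8, "shape": 0.7}},
]

best = pick_best_peak(candidates, weights)
print(best["rt"])
```

Here the peak at 27.1 minutes wins on score even though the peak at 25.4 minutes was the one flagged as manually chosen.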

In general, you should not remove peptides from the training set based on what you see in the data. The purpose of training a model is to determine what weighting of features is best able to distinguish targets from decoys. However, if you remove the targets that have a certain look to them (for instance, remove all peptides whose area is too small) then you will get a model which does not work on real world data, even though it appears to achieve good separation between targets and decoys.
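
A toy Python example of that effect, using made-up scores rather than real Skyline output: some targets score like decoys because they are not really detected, and dropping them by hand makes the gap between targets and decoys look larger without the scoring having improved at all. The function name `mean_gap` and the 1.0 cutoff are assumptions for illustration only.

```python
from statistics import mean


def mean_gap(targets, decoys):
    """Difference between mean target score and mean decoy score."""
    return mean(targets) - mean(decoys)


# Made-up composite scores. Some targets are truly detected (high scores),
# others are not and look just like decoys (low scores).
decoy_scores = [0.1, 0.3, 0.2, 0.4, 0.0]
target_scores = [0.2, 0.3, 1.8, 2.0, 2.2, 0.1, 1.9, 0.4]

# "Cleaning" the training set by removing targets that look bad.
kept_targets = [s for s in target_scores if s >= 1.0]

full_gap = mean_gap(target_scores, decoy_scores)
filtered_gap = mean_gap(kept_targets, decoy_scores)

print(f"gap with all targets:      {full_gap:.3f}")
print(f"gap after manual filtering: {filtered_gap:.3f}")
```

The filtered gap is larger, but only because the hard cases were removed from the training set; real data will still contain them.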

By the way, if you are using "second best peaks" to train your model, make sure that you disable the features related to "retention time difference". Since only one peak can be close to the predicted retention time, if you include that score in the model, it does not work right. Brendan mentions that on this support request:
-- Nick

user responded:  2018-10-02
Dear Nick,

Thanks for your detailed explanation. It is now much clearer to me how mProphet is trained in Skyline.

However, I still do not understand why Skyline does not use manually corrected extracted chromatograms to train mProphet models. For example, if the default Skyline algorithm picks the wrong peak in 25% or more of the extracted chromatograms in a low-quality dataset, can Skyline still get a good mProphet model? Do we have to use a gold-standard data set to get a good model in this case? Could Skyline get a better model if it accepted manually corrected peaks?

Brendan MacLean responded:  2018-10-03
Hi Antony,
We can certainly think of improvements we might make in mProphet modeling and scoring to incorporate manual changes. But, we have a list of improvements we could make in that area, and this just hasn't made it to the top of that list. It is not that we can't imagine this better, but it takes resources to do so.

Nick has explained the state of the system you are using, not necessarily the state as we wish it might be.

Thanks for your feedback. We will consider it a vote for doing this work in the future.