spectral library from pep.XML and mzXML

gabe

2017-08-18 11:18

Hi, I'm trying to import pep.XML/mzXML data to generate a spectral library and I'm getting this error:

System.IO.IOException: ERROR: No spectra were found for the new library.

   at pwiz.Common.SystemUtil.ProcessRunner.Run(ProcessStartInfo psi, String stdin, IProgressMonitor progress, IProgressStatus& status, TextWriter writer) in c:\proj\skyline_3_7_x64\pwiz_tools\Shared\Common\SystemUtil\ProcessRunner.cs:line 59
   at pwiz.BiblioSpec.BlibBuild.BuildLibrary(LibraryBuildAction libraryBuildAction, IProgressMonitor progressMonitor, IProgressStatus& status, String[]& ambiguous) in c:\proj\skyline_3_7_x64\pwiz_tools\Shared\BiblioSpec\BlibBuild.cs:line 171
   at pwiz.Skyline.Model.Lib.BiblioSpecLiteBuilder.BuildLibrary(IProgressMonitor progress) in c:\proj\skyline_3_7_x64\pwiz_tools\Skyline\Model\Lib\BiblioSpecLiteBuilder.cs:line 122

A representative pep.XML file is attached (containing only a few spectra). I could figure out how to share one of my mzXML files, too, if it would be helpful. I'm just not sure what this error means and why it can't find the spectra since the pep.XML and mzXML files are in the same folder. Does anything appear to be wrong with the pep.XML file? Thanks

320_THP1_small.pep.xml

Kaipo Tamura responded:	2017-08-18 12:23
The pep.xml file doesn't have any obvious problems. This error usually means that the program couldn't match spectra between the pep.xml file and the mzXML file. Would you mind sharing the mzXML file as well? If you need a place to upload it, you can put it here https://skyline.ms/files.url Thanks, Kaipo

gabe responded:	2017-08-18 13:10
Thank you. I've uploaded the mzXML to the File Sharing link you provided, it's named: 20161205_THP1_DDA_longrun_01.mzXML Thanks! Gabe

Kaipo Tamura responded:	2017-08-21 11:42
Hi Gabe, For this pep.xml file, generating a spectral library works for me. The scores are low, however, so you will need to set a cutoff score that is not restrictive (e.g. 0) in order to include the identifications in your spectral library. For example the first identification has: <peptideprophet_result probability="0.00014605410869949"/> The default cutoff score in Skyline is 0.95, so if you use this default, any PSM with a score of less than 0.95 will not be included. Thanks, Kaipo
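To see how a cutoff interacts with the probabilities in a file, the check can be sketched in a few lines of Python. This is only an illustration, not how Skyline or BiblioSpec do it internally: the sample XML is a hypothetical, heavily trimmed pepXML-like fragment (only the first probability value comes from the file discussed above), and `count_passing` is a made-up helper name.

```python
import xml.etree.ElementTree as ET

# Hypothetical, trimmed pepXML-like fragment. Real files carry a namespace
# and many more attributes; the second probability is invented.
SAMPLE = """<msms_pipeline_analysis>
  <spectrum_query spectrum="scan_1">
    <search_result>
      <search_hit peptide="PEPTIDEK">
        <analysis_result analysis="peptideprophet">
          <peptideprophet_result probability="0.00014605410869949"/>
        </analysis_result>
      </search_hit>
    </search_result>
  </spectrum_query>
  <spectrum_query spectrum="scan_2">
    <search_result>
      <search_hit peptide="SAMPLEPEPTIDER">
        <analysis_result analysis="peptideprophet">
          <peptideprophet_result probability="0.02"/>
        </analysis_result>
      </search_hit>
    </search_result>
  </spectrum_query>
</msms_pipeline_analysis>"""


def local_name(tag):
    """Strip an optional '{namespace}' prefix from an element tag."""
    return tag.rsplit("}", 1)[-1]


def count_passing(xml_text, cutoff):
    """Return (kept, total): PSMs with probability >= cutoff, and all PSMs."""
    root = ET.fromstring(xml_text)
    probs = [
        float(el.get("probability"))
        for el in root.iter()
        if local_name(el.tag) == "peptideprophet_result"
    ]
    kept = sum(1 for p in probs if p >= cutoff)
    return kept, len(probs)


for cutoff in (0.95, 0.0):
    kept, total = count_passing(SAMPLE, cutoff)
    print(f"cutoff {cutoff}: kept {kept} of {total} PSMs")
```

With the default cutoff of 0.95 nothing in this sample survives, which is consistent with the "No spectra were found" error; with a cutoff of 0 everything is kept.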

Brendan MacLean responded:	2017-08-21 12:09
What was the source of the pepXML? Was it actually run through PeptideProphet? What is the highest probability in the file? With PeptideProphet, 1.0 is the best possible probability and 0.0 is the worst. It is the probability that the individual match is actually correct. So accepting 10 peptides with probability 0.9 would be expected to include 1 false-positive on average. If these are really PeptideProphet probabilities, I wouldn't recommend setting a cut-off of 0, as that would mean you are including spectrum matches that are almost certainly incorrect, as would be the case for the one Kaipo uses as his example. Instead, you should reconsider how you are collecting your data and performing your peptide spectrum matching to get more spectrum matches with a higher likelihood of being correct. Thanks for sharing your data with us. --Brendan
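The arithmetic behind the "10 peptides at 0.9" remark can be written out as a small sketch. If each probability is the chance that a match is correct, the expected number of incorrect matches in an accepted set is the sum of (1 - p). The function name below is hypothetical.

```python
def expected_false_positives(probabilities):
    """Expected number of incorrect matches, given per-match correctness probabilities."""
    return sum(1.0 - p for p in probabilities)


# Ten accepted matches, each with probability 0.9 of being correct:
# the expectation comes out at about 1 false positive.
print(round(expected_false_positives([0.9] * 10), 6))
```

The same sum applied to a match with probability near 0 contributes almost a full expected false positive on its own, which is why a cutoff of 0 is risky when the values really are PeptideProphet probabilities.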

gabe responded:	2017-08-22 07:55
Thanks Kaipo and Brendan, It's not real PeptideProphet data -- it's a custom filtering algo that is supposed to write PeptideProphet scores. I think that's where the error must lie because when I relax the PeptideProphet probability to 0 it loads the spectra just fine. Brendan, I appreciate your comment but these peptides are all already filtered for correctness using a linear discriminant analysis similar to PeptideProphet so I'm not really concerned about false-positives here. I'll work on getting the faux PeptideProphet scores to be correct (mine seem to be correct close to 0 and false close to 1... did older PeptideProphet versions use a different convention?) but for now I've got Skyline doing what I need. Thank you! Gabe

Brendan MacLean responded:	2017-08-22 08:54
Hi Gabe, PeptideProphet has always used 1.0 as 100% probability of correctness since I started using it in 2003. Most other tools, however, use the inverse where 0 means zero probability of the result happening by random chance. So, while your scores make sense in many cases, they are completely the inverse of what is expected of PeptideProphet probabilities, and you should inform whoever wrote the code to produce your pepXML that they should at least replace their current values with 1-current to best achieve an impersonation of PeptideProphet. It is fine to use 0 cut-off if your results are pre-filtered to some known level of confidence, but it would also be good to fix your pipeline to produce pepXML more consistent with the expectations of all other tools. Thanks for clarifying. Glad the "take everything" hack we introduced with cut-off = 0 helps to unblock you. --Brendan
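The "1-current" replacement could be done as a post-processing step along these lines. This is a minimal sketch under stated assumptions: the input fragment is a hypothetical, trimmed example; `invert_probabilities` is a made-up name; and only the `probability` attribute of `peptideprophet_result` elements is touched. Fixing the writer that produces the pepXML is still the cleaner route.

```python
import xml.etree.ElementTree as ET


def invert_probabilities(xml_text):
    """Return the XML with every peptideprophet_result probability p replaced by 1 - p."""
    root = ET.fromstring(xml_text)
    for el in root.iter():
        # Compare the local tag name so namespaced files are handled too.
        if el.tag.rsplit("}", 1)[-1] == "peptideprophet_result":
            p = float(el.get("probability"))
            el.set("probability", repr(1.0 - p))
    # Note: ElementTree may rewrite namespace prefixes when serializing.
    return ET.tostring(root, encoding="unicode")


# Hypothetical fragment using the inverted score quoted earlier in the thread.
sample = (
    '<analysis_result analysis="peptideprophet">'
    '<peptideprophet_result probability="0.00014605410869949"/>'
    "</analysis_result>"
)
print(invert_probabilities(sample))
```

After the flip, a match that the custom filter considered near-certain (close to 0 in the inverted convention) ends up close to 1.0, which is what tools reading PeptideProphet output expect.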

gabe responded:	2017-08-31 04:56
Hi Brendan, I'm using the pep.XML files described above (with the bad inverted PeptideProphet scores) to generate .blib files in Skyline. If I combine multiple blib files into a single library, will Skyline know how to choose the best PSM for a given peptide, or will it get misled by those PeptideProphet scores? I.e., do the PeptideProphet scores get used for assessing PSM quality after the pep.XML import, or do other scores (XCorr, intensity, etc.) get used? Thanks Gabe

gabe responded:	2017-08-31 05:21
Looking more closely, I see that the PeptideProphet scores are stored in the .blib files (score-type = 2), so I'm guessing I'll need to invert these before combining multiple .blib files. Can you confirm? I'm a bit confused about how blib files are combined since the BlibFilter documentation says something about best average pairwise dot products being used to choose the best spectrum. Are the .blib scores used by BlibFilter? Thanks Gabe

Kaipo Tamura responded:	2017-08-31 08:11
Hi Gabe, Ordinarily BlibFilter uses dot products to pick the best spectrum (i.e. when it is called with no command line options), but Skyline passes a flag to it that enables a different method of picking the best spectrum, which is simply to take the one with the best score. In the case of a tie, the one with higher TIC is chosen. Thanks, Kaipo
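The selection rule described here (best score wins, higher TIC breaks ties) can be illustrated with a short sketch. It is not BlibFilter's actual code: the `Psm` record, the grouping by peptide sequence alone, and the assumption that "best" means numerically highest (as it would for PeptideProphet probabilities) are all simplifications made for the example.

```python
from collections import namedtuple

# Hypothetical record: peptide sequence, identification score, total ion current.
Psm = namedtuple("Psm", "peptide score tic")


def pick_best(psms):
    """Keep one PSM per peptide: highest score, ties broken by higher TIC."""
    best = {}
    for psm in psms:
        current = best.get(psm.peptide)
        # Tuple comparison checks score first, then TIC.
        if current is None or (psm.score, psm.tic) > (current.score, current.tic):
            best[psm.peptide] = psm
    return best


# Invented values for illustration only.
psms = [
    Psm("PEPTIDEK", 0.99, 1.0e6),
    Psm("PEPTIDEK", 0.99, 2.5e6),   # same score, higher TIC
    Psm("PEPTIDEK", 0.40, 9.0e6),   # lower score loses despite high TIC
]
print(pick_best(psms)["PEPTIDEK"])
```

Under this rule, scores written in an inverted convention would cause the least reliable match for each peptide to be kept, since it would carry the numerically highest value.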

clichti responded:	2017-08-31 09:24
Which means the answer to your question is that it was probably not a good idea in the first place to continue forward with an inverted score, even when including all matching spectra, as the score still has significance in choosing the best spectrum. Even in your first library build, peptides with more than one matching spectrum would see the highest score chosen, and in your case that is expected to be the worse matching spectrum, because the scores are reversed. So, yes, you can continue pushing forward against the software design and expectations of your score, but you are far better off returning to the source of the inverted scores and getting them in the correct direction, writing your own conversion program that simply flips the scores to 1-probability to get them in the direction Skyline (and everyone else) expects from PeptideProphet, or working with us to come up with some new identifier of your workflow that allows us to correctly identify your scores purporting to come from PeptideProphet as actually coming from a scoring pipeline that produces an inverted score. I strongly recommend against continuing forward with inverted scores as a long-term solution. Maybe it was good enough to get you past the first hurdle, but it will not be an acceptable solution longer-term. Thanks for giving it some deeper thought and pointing out this issue. You were entirely correct. --Brendan

gabe responded:	2017-08-31 09:28
Thanks Brendan and Kaipo, Agreed. In fact, I've spent the morning figuring out how to generate the correct (inverted) scores in our pep.xml writer and am currently in the process of regenerating all my .blib files. I appreciate the detailed responses and advice. Best, Gabe