Kaipo Tamura responded: |
2017-08-18 12:23 |
The pep.xml file doesn't have any obvious problems. This error usually means that the program couldn't match spectra between the pep.xml file and the mzXML file. Would you mind sharing the mzXML file as well? If you need a place to upload it, you can put it here https://skyline.ms/files.url
Thanks,
Kaipo |
|
gabe responded: |
2017-08-18 13:10 |
Thank you. I've uploaded the mzXML to the File Sharing link you provided. It's named:
20161205_THP1_DDA_longrun_01.mzXML
Thanks!
Gabe |
|
Kaipo Tamura responded: |
2017-08-21 11:42 |
Hi Gabe,
For this pep.xml file, generating a spectral library works for me. The scores are low, however, so you will need to set a cutoff score that is not restrictive (e.g. 0) in order to include the identifications in your spectral library.
For example the first identification has:
<peptideprophet_result probability="0.00014605410869949"/>
The default cutoff score in Skyline is 0.95, so if you use this default, any PSM with a score of less than 0.95 will not be included.
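As a rough illustration of how such a cutoff behaves (a sketch only, not Skyline's actual code; the `Psm` type, the `filter_by_cutoff` helper, and the peptide sequences are made up for the example), a PSM is kept only when its probability is not below the cutoff:

```python
from dataclasses import dataclass


@dataclass
class Psm:
    peptide: str
    probability: float  # PeptideProphet convention: 1.0 = best, 0.0 = worst


def filter_by_cutoff(psms, cutoff=0.95):
    """Keep only PSMs whose probability is at or above the cutoff."""
    return [p for p in psms if p.probability >= cutoff]


# Hypothetical PSMs; the first probability is the one quoted above.
psms = [
    Psm("PEPTIDEK", 0.00014605410869949),
    Psm("SAMPLER", 0.97),
]

kept_default = filter_by_cutoff(psms)          # cutoff 0.95 drops the first PSM
kept_all = filter_by_cutoff(psms, cutoff=0.0)  # cutoff 0 keeps everything
```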
Thanks,
Kaipo |
|
Brendan MacLean responded: |
2017-08-21 12:09 |
What was the source of the pepXML? Was it actually run through PeptideProphet? What is the highest probability in the file?
With PeptideProphet, 1.0 is the best possible probability and 0.0 is the worst. It is the probability that the individual match is actually correct. So accepting 10 peptides with probability 0.9 would be expected to include 1 false positive on average.
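The arithmetic behind that estimate can be sketched as follows, assuming each probability is the chance that the match is correct, so that `1 - p` is the chance it is wrong (the helper name is hypothetical):

```python
def expected_false_positives(probabilities):
    """Expected number of incorrect matches: the sum of (1 - p) over accepted PSMs."""
    return sum(1.0 - p for p in probabilities)


# Ten accepted peptides, each with probability 0.9 of being correct.
accepted = [0.9] * 10
print(round(expected_false_positives(accepted), 6))  # prints 1.0
```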
If these are really PeptideProphet probabilities, I wouldn't recommend setting a cut-off of 0, as that would mean you are including spectrum matches that are almost certainly incorrect, as would be the case for the one Kaipo uses as his example.
Instead, you should reconsider how you are collecting your data and performing your peptide spectrum matching to get more spectrum matches with a higher likelihood of being correct.
Thanks for sharing your data with us.
--Brendan |
|
gabe responded: |
2017-08-22 07:55 |
Thanks Kaipo and Brendan,
It's not real PeptideProphet data -- it's a custom filtering algo that is supposed to write PeptideProphet scores. I think that's where the error must lie because when I relax the PeptideProphet probability to 0 it loads the spectra just fine. Brendan, I appreciate your comment but these peptides are all already filtered for correctness using a linear discriminant analysis similar to PeptideProphet so I'm not really concerned about false-positives here. I'll work on getting the faux PeptideProphet scores to be correct (mine seem to be correct close to 0 and false close to 1... did older PeptideProphet versions use a different convention?) but for now I've got Skyline doing what I need.
Thank you!
Gabe |
|
Brendan MacLean responded: |
2017-08-22 08:54 |
Hi Gabe,
PeptideProphet has always used 1.0 as 100% probability of correctness since I started using it in 2003. Most other tools, however, use the inverse where 0 means zero probability of the result happening by random chance.
So, while your scores make sense in many cases, they are completely the inverse of what is expected of PeptideProphet probabilities, and you should inform whoever wrote the code to produce your pepXML that they should at least replace their current values with 1-current to best achieve an impersonation of PeptideProphet.
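A minimal sketch of that `1-current` replacement, assuming the scores sit in the `probability` attribute of `<peptideprophet_result>` elements as in the example Kaipo quoted. The function name is hypothetical, and a real converter would more likely use an XML parser than a regular expression:

```python
import re

# Matches the probability attribute of a <peptideprophet_result> element.
_PROB = re.compile(
    r'(<peptideprophet_result\b[^>]*?\bprobability=")([^"]+)(")'
)


def invert_probabilities(pepxml_text):
    """Return the text with each PeptideProphet probability p replaced by 1 - p."""
    def flip(match):
        p = float(match.group(2))
        return f"{match.group(1)}{1.0 - p}{match.group(3)}"

    return _PROB.sub(flip, pepxml_text)


line = '<peptideprophet_result probability="0.25"/>'
flipped = invert_probabilities(line)
```

Elements other than `<peptideprophet_result>` are left untouched, so the rest of the file passes through unchanged.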
It is fine to use 0 cut-off if your results are pre-filtered to some known level of confidence, but it would also be good to fix your pipeline to produce pepXML more consistent with the expectations of all other tools.
Thanks for clarifying. Glad the "take everything" hack we introduced with cut-off = 0 helps to unblock you.
--Brendan |
|
gabe responded: |
2017-08-31 04:56 |
Hi Brendan,
I'm using the pep.xml files described above (with the bad inverted PeptideProphet scores) to generate .blib files in Skyline. If I combine multiple .blib files into a single library, will Skyline know how to choose the *best* PSM for a given peptide? Or will it get misled by those PeptideProphet scores? I.e., do the PeptideProphet scores get used for assessing PSM quality after the pep.xml import, or do other scores (XCorr, intensity, etc.) get used?
Thanks
Gabe |
|
gabe responded: |
2017-08-31 05:21 |
Looking more closely, I see that the PeptideProphet scores are stored in the .blib files (score-type = 2), so I'm guessing I'll need to invert these before combining multiple .blib files. Can you confirm? I'm a bit confused about how .blib files are combined, since the BlibFilter documentation says something about best average pairwise dot products being used to choose the best spectrum. Are the .blib scores used by BlibFilter?
Thanks
Gabe |
|
Kaipo Tamura responded: |
2017-08-31 08:11 |
Hi Gabe,
Ordinarily BlibFilter uses dot products to pick the best spectrum (i.e. when it is called with no command line options), but Skyline passes a flag to it that enables a different method of picking the best spectrum, which is simply to take the one with the best score. In the case of a tie, the one with higher TIC is chosen.
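The score-based selection can be sketched roughly like this. It is an illustration of the rule described above, not BlibFilter's code: the `Spectrum` type and `pick_best` helper are hypothetical, "best" is assumed to mean the highest PeptideProphet probability, and spectra are grouped by peptide sequence only for simplicity:

```python
from collections import namedtuple

Spectrum = namedtuple("Spectrum", ["peptide", "score", "tic"])


def pick_best(spectra):
    """For each peptide keep the highest-scoring spectrum; ties go to the higher TIC."""
    best = {}
    for s in spectra:
        current = best.get(s.peptide)
        # Tuple comparison: score decides first, TIC only breaks ties.
        if current is None or (s.score, s.tic) > (current.score, current.tic):
            best[s.peptide] = s
    return best


spectra = [
    Spectrum("PEPTIDEK", 0.99, 1.0e6),
    Spectrum("PEPTIDEK", 0.99, 2.5e6),  # same score, higher TIC: wins the tie
    Spectrum("PEPTIDEK", 0.40, 9.0e6),  # highest TIC but lower score: loses
]
best = pick_best(spectra)
```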
Thanks,
Kaipo |
|
clichti responded: |
2017-08-31 09:24 |
Which means the answer to your question is that it was probably not a good idea in the first place to continue forward with an inverted score, even when including all matching spectra, as the score still has significance in choosing the best spectrum. Even in your first library build, peptides with more than one matching spectrum would see the highest score chosen, and in your case that is expected to be the worst-matching spectrum, because the scores are reversed.
So, yes, you can continue pushing forward against the software design and expectations of your score, but you are far better off with one of the following: returning to the source of the inverted scores and getting them in the correct direction; writing your own conversion program that simply flips the scores to 1-probability to get them in the direction Skyline (and everyone else) expects from PeptideProphet; or working with us to come up with some new identifier for your workflow that allows us to correctly identify your scores purporting to come from PeptideProphet as actually coming from a scoring pipeline that produces an inverted score.
I strongly recommend against continuing forward with inverted scores as a long-term solution. Maybe it was good enough to get you by the first hurdle, but it will not be an acceptable solution longer-term.
Thanks for giving it some deeper thought and pointing out this issue. You were entirely correct.
--Brendan |
|
gabe responded: |
2017-08-31 09:28 |
Thanks Brendan and Kaipo,
Agreed. In fact, I've spent the morning figuring out how to generate the correct (inverted) scores in our pep.xml writer and am currently in the process of regenerating all my .blib files.
Appreciate the detailed responses and advice
Best
Gabe |
|