Calculate the confidence score and q-value for searched peptides sunbergsoon  2019-04-30 03:42
 

Hi,

I am doing SWATH analysis using Skyline. We filter the searched peptides by keeping those with a dot-product above 0.8.
Recently we also tried the mProphet model, using second-best peaks to calculate q-values for more confident
filtering. However, the results are not very good and the Q-Q plot has a poor shape. How can I use Skyline to do more
confident filtering in SWATH analysis? Can I try another method, such as the OpenSWATH model?

Best
Sunberg

 
 
Brendan MacLean responded:  2019-04-30 09:16

The Skyline mProphet model is very similar to the OpenSWATH model, as shown in the Navarro 2016, Nature Biotech paper:

https://www.ncbi.nlm.nih.gov/pubmed/27701404

However, you may want to try using decoys rather than second-best peaks, which have received far less validation.

What is the basis of your Q-Q plot? How are you determining the true false-discovery rate?
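
For readers unfamiliar with the decoy approach, here is a minimal Python sketch of how q-values can be estimated from target and decoy scores. It is an illustration only, not Skyline's or OpenSWATH's actual implementation: the function name `decoy_q_values` is hypothetical, higher scores are assumed to be better, and the numbers of targets and decoys are assumed to be equal (otherwise the decoy count would need scaling).

```python
from bisect import bisect_left


def decoy_q_values(target_scores, decoy_scores):
    """Estimate a q-value for each target score from a decoy distribution.

    Assumptions (not taken from Skyline): higher score = better peak,
    and one decoy per target, so no scaling factor is applied.
    """
    decoys = sorted(decoy_scores)
    targets = sorted(target_scores)
    n_dec, n_tgt = len(decoys), len(targets)

    # Estimated FDR at threshold s: (#decoys >= s) / (#targets >= s)
    fdr = {}
    for s in set(targets):
        d = n_dec - bisect_left(decoys, s)
        t = n_tgt - bisect_left(targets, s)  # always >= 1, since s is a target
        fdr[s] = min(1.0, d / t)

    # q-value: the smallest FDR among all thresholds that still accept s,
    # i.e. all thresholds at or below s. Walk upward, keeping a running minimum.
    q = {}
    running_min = 1.0
    for s in sorted(fdr):
        running_min = min(running_min, fdr[s])
        q[s] = running_min
    return [q[s] for s in target_scores]


if __name__ == "__main__":
    # Made-up scores, for illustration only.
    print(decoy_q_values([5.0, 4.0, 3.0, 1.0], [3.5, 0.5, 0.2, 0.1]))
```

Taking the running minimum makes the q-values monotonic in the score, so a cutoff such as q <= 0.01 always selects a contiguous set of top-scoring peaks.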

 
sunbergsoon responded:  2019-04-30 12:15

Hi Brendan,

Thank you for your help!
The Q-Q plot was generated using second-best peaks. The false-discovery rate is based on the
distributions of the second-best peaks and the best peaks.

I also tried the decoy method. However, after I created the decoy peptides and reintegrated
to train the next model, it said: "No decoy peptides". How can I use this method?

Best
Sunberg

 
Brendan MacLean responded:  2019-04-30 12:37

You need to re-import your mass spec files to have Skyline extract chromatograms for the decoys and then run your model. You might want to review one of the webinars on this:

https://skyline.ms/webinar14.url
https://skyline.ms/webinar15.url

I still don't quite get where the two axes come from in your Q-Q plot. One of them, I assume, comes from the reported q values from Skyline, but where does the other come from?

 
sunbergsoon responded:  2019-05-03 13:31

Hi Brendan,

I took the Q-Q plot from the reported q-values, as shown in the attached file.
However, most of my Q-Q plots look like this one, where the green line is not very significant.

Thank you very much for your kind help!

Best
Sunberg

 
Brendan MacLean responded:  2019-05-03 14:17

Hi Sunberg,
You appear to be using Edit > Refine > Compare Peak Scoring, which is intended for use with datasets which have been manually curated with all integrated peaks in the document considered "True-Positives". Then you compare some other way of scoring and picking the peaks with this ground truth. The "Observed False-Positive Rate" comes from comparing the alternate scoring method with the document as it is currently integrated. Anything that would pick something other than what is in the document is considered a false-positive.

So, I guess, the question remains: where do the two cases you are comparing come from:

  1. The existing integration boundaries
  2. The alternately scored integration boundaries

For instance, if #1 is simply the default peak picking in Skyline and #2 is a trained mProphet model, then your Q-Q plot is not valid and you are not looking at useful information, because every "false-positive" is simply a case where the mProphet model would change the integration from the default peak picking.
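
To make the comparison concrete, here is a rough Python sketch of how an observed false-positive rate could be computed against a curated reference. It is not Skyline's code: the function name `observed_fpr_curve`, the tuple layout, and the rule of matching picks by apex retention time within a tolerance are all assumptions made for illustration.

```python
def observed_fpr_curve(picks, cutoffs, rt_tolerance=0.1):
    """Compare alternately scored picks with reference picks.

    picks: list of (reference_rt, candidate_rt, q_value), one per peptide.
        reference_rt is the apex time of the peak already in the document,
        candidate_rt is the apex time chosen by the alternate scoring.
    cutoffs: q-value cutoffs to evaluate.
    rt_tolerance: hypothetical tolerance (minutes) for calling two picks equal.

    Returns a list of (cutoff, observed_rate). The rate is the fraction of
    candidate picks with q <= cutoff that disagree with the reference,
    or None when no pick passes the cutoff.
    """
    curve = []
    for cutoff in cutoffs:
        accepted = [(ref, cand) for ref, cand, q in picks if q <= cutoff]
        if not accepted:
            curve.append((cutoff, None))
            continue
        mismatches = sum(
            1 for ref, cand in accepted if abs(ref - cand) > rt_tolerance
        )
        curve.append((cutoff, mismatches / len(accepted)))
    return curve


if __name__ == "__main__":
    # Made-up picks, for illustration only.
    example = [
        (10.0, 10.02, 0.001),
        (12.0, 12.0, 0.005),
        (15.0, 17.3, 0.008),  # the alternate model chose a different peak
        (20.0, 20.05, 0.04),
    ]
    for cutoff, rate in observed_fpr_curve(example, [0.0005, 0.01, 0.05]):
        print(cutoff, rate)
```

The sketch shows why the reference matters: if the reference picks are not curated ground truth, every mismatch is counted as a false positive even when the alternate pick is the correct one.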

Hope this helps to clarify. The Peak Scoring Comparison UI was really intended for researchers working on implementing these scoring models, like the ones that contributed to the Navarro, Nature BioTech 2016 paper:

https://www.ncbi.nlm.nih.gov/pubmed/27701404

Thanks for clarifying the source of your Q-Q plots.

--Brendan

 
Brendan MacLean responded:  2019-05-03 14:22

I should note that this strategy of comparing against a manually curated data set was used in these two publications on OpenSWATH:

https://www.ncbi.nlm.nih.gov/pubmed/24727770
https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5008461/

The Compare Peak Scoring interface was an attempt to improve the process of applying this strategy and increase visibility into why multiple algorithms such as OpenSWATH, Skyline, and Spectronaut might differ in their peak picking choices.

 
sunbergsoon responded:  2019-05-06 02:34

Hi Brendan,

Thank you so much for your help and I really appreciate it!

Just as you said, the two things I compared were Skyline's default peak picking and the mProphet model, which was
not actually what I wanted. What I want to do is use a statistical method or model to test my identified
proteins and peptides, and get a confidence score for the results searched by Skyline. All I need to do is
export the report after applying the model to get the q-values for the proteins and peptides.
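
Once a report is exported as CSV, filtering it by q-value could look roughly like the Python sketch below. The column name "Detection Q Value", the 0.01 cutoff, and the sample rows are assumptions for illustration; check the header of your own exported report and adjust `q_column` to match.

```python
import csv
import io


def filter_by_q_value(report_file, q_column="Detection Q Value", cutoff=0.01):
    """Return the report rows whose q-value is at or below the cutoff.

    report_file: an open text file (or file-like object) holding a CSV report.
    q_column: header of the q-value column (assumed name, verify in your report).
    Rows with a missing or non-numeric q-value (e.g. "#N/A") are skipped.
    A KeyError is raised if the column is absent from the header.
    """
    kept = []
    for row in csv.DictReader(report_file):
        try:
            q = float(row[q_column])
        except (ValueError, TypeError):
            continue  # no usable q-value for this row
        if q <= cutoff:
            kept.append(row)
    return kept


if __name__ == "__main__":
    # Made-up report content, for illustration only.
    sample = io.StringIO(
        "Peptide,Protein,Detection Q Value\n"
        "PEPTIDEK,ProtA,0.002\n"
        "SAMPLER,ProtB,0.15\n"
        "ANOTHERK,ProtA,#N/A\n"
    )
    passing = filter_by_q_value(sample)
    print([row["Peptide"] for row in passing])
```

For a real file, pass `open("report.csv", newline="")` instead of the in-memory sample; the file name here is a placeholder.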

Thanks a lot!

Best wishes,
Sunberg