FDR for Skyline Integrated MSAmanda Searches

jorge peinado  2021-10-25
 

Dear Skyline team,

I would like to know more about how the Skyline cut-off score and the MSAmanda integrated search tool interact.

I have been using MaxQuant as my main search tool when working with Thermo Orbitrap raw data.
From previous responses in this forum about how the MaxQuant FDR (PEP-based) and the Skyline cut-off interact, I thought that a Skyline cut-off of 0.99 would be equivalent to MaxQuant's 1% FDR. However, I found that with a 0.99 Skyline cut-off I was missing some peptides identified in my 1% FDR MaxQuant search.
After testing different Skyline cut-offs, I found that 0.9 is the value that lets me import all my MaxQuant results without any additional filtering by Skyline.
I don't understand why 0.9 is the Skyline equivalent of MaxQuant's 1% FDR, but I was happy with it because I trust the MaxQuant FDR.

However, I recently received some data from a Waters Synapt which I haven't been able to analyse with MaxQuant.
I have found that running my DDA peptide search with the MSAmanda search integrated in Skyline is an alternative pipeline that could work for me, but I am unsure which cut-off value is most appropriate: my experience suggests that a 0.9 cut-off is equivalent to 1% FDR, while logic makes me think that 0.99 should be.
I'm aware that MSAmanda uses a different scoring system than MaxQuant, so I'm not sure which cut-off value to use.

Thanks in advance

 
 
Nick Shulman responded:  2021-10-25
The msms.txt file with the MaxQuant search results is a tab-separated text file that you can look at in Microsoft Excel.
When you choose a score cutoff of 0.99, BiblioSpec will not include rows where the value in the "PEP" column is greater than 0.01.
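
To see how many rows of an msms.txt file survive a given cutoff, the rule above can be sketched in a few lines of Python. This is only an illustration of the "1 - PEP must reach the cutoff" rule, not BiblioSpec's actual code; the function name and the sample rows are made up, and the only column assumed is "PEP".

```python
import csv
import io

def count_rows_passing(msms_file, cutoff):
    """Count rows whose score (1 - PEP) reaches the cutoff."""
    reader = csv.DictReader(msms_file, delimiter="\t")
    kept = 0
    for row in reader:
        try:
            pep = float(row["PEP"])
        except (KeyError, ValueError, TypeError):
            continue  # skip rows without a usable PEP value
        if 1.0 - pep >= cutoff:
            kept += 1
    return kept

# Tiny in-memory stand-in for msms.txt (made-up values, not real results).
SAMPLE = "Sequence\tPEP\nAAAK\t0.001\nCCCK\t0.03\nDDDK\t0.07\nEEEK\t0.4\n"

counts = {c: count_rows_passing(io.StringIO(SAMPLE), c)
          for c in (0.99, 0.95, 0.9, 0.0)}
for cutoff, n in counts.items():
    print(cutoff, n)
```

For a real file, pass an open file handle instead of the StringIO object, for example `open("msms.txt", newline="")`.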

In theory, the FDR and PEP are quite different numbers. There is an explanation of the difference on page 2 of this document from the Noble lab:
https://noble.gs.washington.edu/papers/kall2008posterior.pdf

When BiblioSpec is looking through your MaxQuant search results, BiblioSpec really is using the PEP number that MaxQuant provides. I am not sure whether the difference that you are seeing has something to do with the difference between FDR and PEP. It might be that there is a different set of numbers that you could be looking at in MaxQuant.

When you do an MSAmanda search in Skyline, I know that Percolator is used to assign the probabilities. I do not know whether those probabilities are FDRs or PEPs, but probably someone else on this support board can give you a better answer.

Can you send us an example of some MaxQuant results where the 0.99 cutoff was causing BiblioSpec to reject some matches that you did not think should be rejected? We might be able to figure out what is going on when we see your peptide search results. We do not have very much expertise with actually using MaxQuant, but some of the people on our team understand the file formats very well.
Files which are less than 50MB can be attached to this support request. You can upload larger files here:
https://skyline.ms/files.url

-- Nick
 
jorge peinado responded:  2021-10-26
Thanks for your response Nick.

OK, so I think I understand better what Skyline is doing with MaxQuant data.
MaxQuant uses the PEPs of all the PSMs to calculate an FDR and, based on that, includes in its output table only those peptides whose PEP falls within the calculated range. Some of those PEPs can be greater than 0.05 (for example), depending on the dataset.
Skyline reads just the PEP values and applies its cut-off to them. So a peptide with a PEP of 0.05 can appear in a MaxQuant output table (1% FDR) but would be removed by Skyline if the cut-off score is above 0.95.
Does it work like that? (I hope I have explained myself properly.)
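
A small Python sketch may make this concrete. It assumes, purely for illustration, that the FDR of a list is estimated as the mean PEP of the accepted PSMs; MaxQuant's own FDR calculation is not reproduced here, and the function name and PEP values are made up.

```python
def accepted_at_fdr(peps, fdr_target):
    """Longest best-first list whose mean PEP stays within fdr_target."""
    ranked = sorted(peps)  # best (lowest PEP) first
    accepted = []
    pep_sum = 0.0
    for pep in ranked:
        # Estimated FDR if this PSM were added = mean PEP of the list.
        if (pep_sum + pep) / (len(accepted) + 1) > fdr_target:
            break
        pep_sum += pep
        accepted.append(pep)
    return accepted

# Made-up dataset: 95 confident PSMs plus a few weaker ones.
peps = [0.001] * 95 + [0.05, 0.1, 0.2, 0.5]
accepted = accepted_at_fdr(peps, 0.01)

# PSMs inside the 1% FDR list that a 0.99 cut-off on (1 - PEP) would drop.
dropped_by_099 = [p for p in accepted if 1.0 - p < 0.99]
print(len(accepted), max(accepted), len(dropped_by_099))
```

In this toy case all 99 PSMs fit within the 1% FDR list even though the worst one has a PEP of 0.5, and a 0.99 cut-off would drop the four weakest.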

Attached is the MaxQuant msms table that I was using to test the impact of the cut-off.
With a Skyline cut-off of 0.99, 19,420 peptides were loaded into Skyline; with 0.95, 25,321 peptides; and with 0.9, 0.5, or 0, 26,237 peptides.

On the other hand, MSAmanda seems to be based on q-values.
The previously mentioned paper defines q-values as follows: "a q-value of 0.01 for peptide EAMRQPK matching spectrum s means that, if we try all possible FDR thresholds, then 1% is the minimal FDR threshold at which the PSM of EAMRQPK to s will appear in the output list."

So setting a Skyline cut-off score of 0.99 for MSAmanda effectively sets the FDR to 1%, since only peptides with a q-value of 0.01 or lower will appear in the Skyline spectral library.
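
The q-value definition quoted above can be sketched as follows. This is a minimal illustration, assuming an FDR estimate is already available at each position of a best-first ranked list; the FDR values are made up, and it is an assumption (not confirmed in this thread) that the scores Skyline receives from the MSAmanda/Percolator search are q-values.

```python
def q_values(fdr_by_rank):
    """q-value at each rank = smallest FDR at that rank or any later rank."""
    q = list(fdr_by_rank)
    # Walk from the bottom of the ranked list upwards, carrying the minimum.
    for i in range(len(q) - 2, -1, -1):
        q[i] = min(q[i], q[i + 1])
    return q

# Made-up FDR estimates for six PSMs ranked best-first; estimated FDR can
# fluctuate along the list, which is why the running minimum is needed.
fdr_by_rank = [0.0, 0.02, 0.008, 0.009, 0.03, 0.025]
qs = q_values(fdr_by_rank)

# A 0.99 cut-off on (1 - q) keeps exactly the PSMs inside the 1% FDR list.
kept = [q for q in qs if 1.0 - q >= 0.99]
print(qs, len(kept))
```

Note that the second PSM is kept even though the FDR estimate at its own rank is 2%, because a longer list that still contains it reaches an FDR of 0.8%.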

In summary, I should use a Skyline cut-off of 0.9 for MaxQuant data to ensure that all the peptides within the 1% FDR range appear in Skyline, while for MSAmanda a Skyline cut-off of 0.99 is what restricts the library to the peptides within the 1% FDR.
 
Brendan MacLean responded:  2021-10-26
Hi Jorge,
What you describe is correct, I believe. If you feel confident that the table you get from MaxQuant is already filtered to your desired FDR (which it certainly can be), then you are welcome to use 0 as the library cutoff value (instead of 0.90, i.e. a PEP of 0.1). In my own experience, however, when the FDR works out in such a way that matches with very high PEP values (that is, low-confidence matches) are included in the 1% FDR, you are essentially adding a lot of poor matches to reach 1%, and you would still be better off using a PEP cutoff such as 0.1, which means that 1 of every 10 matches at that threshold is likely incorrect. I have seen datasets where the tools indicated that 1% FDR occurred at a PEP of 0.8, that is, where 8 of 10 matches are likely incorrect. To me, it seemed absurd that one should then actually accept a PEP cutoff of 0.8 simply because that is where 1% FDR occurred.

The logical extreme is supposing that you have a perfect discriminator so that your PEP goes from 0 to 1 at a single match. Would you then want to continue adding entirely false matches until you reached 1% FDR?
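
As a rough worked example of that extreme case, here is a short Python sketch. It assumes a perfect discriminator, so every match has a PEP of either 0 or 1, and it takes FDR as false matches divided by total matches; the function name and the count of 1000 correct matches are hypothetical.

```python
def false_matches_tolerated(n_correct, fdr_target):
    """Largest number of certainly-wrong matches (PEP = 1) that still keeps
    false / total at or below fdr_target."""
    if not 0.0 <= fdr_target < 1.0:
        raise ValueError("fdr_target must be in [0, 1)")
    n_false = 0
    while (n_false + 1) / (n_correct + n_false + 1) <= fdr_target:
        n_false += 1
    return n_false

# Hypothetical dataset: 1000 correct matches (PEP = 0), everything else wrong.
n_backfilled = false_matches_tolerated(1000, 0.01)
print(n_backfilled)
```

With 1000 correct matches, filling up to 1% FDR admits 10 matches that are known to be wrong, whereas any PEP cutoff below 1 would admit none of them.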

That is to say, I agree with using some kind of cutoff, be that 0.9 (0.1 PEP) or 0.95 (0.05 PEP) even on MaxQuant results that are previously filtered for 1% FDR. I just don't think it makes sense to penalize a stronger discriminant score by backfilling with false positives to reach 1% FDR.

Thanks for your careful consideration of this topic.

--Brendan