PRM FDR control jfoe  2018-12-13 08:26
 

Dear skyline team,

I have been experimenting with the advanced peak picking model based on mProphet.
I have an assay with a heavy internal reference and I would like to have q values for each peptide.

Given our quite tight scheduling for the PRM acquisition, I can't imagine that scoring based on second-best peaks would be of much use.
When generating decoys, however, they get a +10 precursor mass shift, so they would need another isolation window during acquisition.

Do you think a workflow is possible where one uses decoy transitions with the PRM data as is?
I think FDR in PRM is an important topic and I would like to put some effort into making this work.
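Just to be explicit about what I mean by q values: roughly the standard target-decoy estimate, as in the sketch below (the scores stand in for whatever the mProphet model produces; this is not meant to be what Skyline computes internally).

import numpy as np

def target_q_values(target_scores, decoy_scores):
    """Textbook target-decoy q value estimate (higher score = better)."""
    scores = np.concatenate([target_scores, decoy_scores])
    is_decoy = np.concatenate([np.zeros(len(target_scores), bool),
                               np.ones(len(decoy_scores), bool)])
    order = np.argsort(-scores)                  # best score first
    decoy_sorted = is_decoy[order]
    fdr = np.cumsum(decoy_sorted) / np.maximum(np.cumsum(~decoy_sorted), 1)
    q = np.minimum.accumulate(fdr[::-1])[::-1]   # q = min FDR at this cutoff or any more permissive one
    q_all = np.empty_like(q)
    q_all[order] = q
    return q_all[:len(target_scores)]            # q value per target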

Cheers,
Jonas

 
 
Brendan MacLean responded:  2018-12-20 11:58

Hi Jonas,
I am not sure I fully trust mProphet scoring with super-tight scheduling anyway. I would probably start by proving to myself that I would get relatively similar results between tight scheduling and not. We ran a test with DIA and very tight scheduling, and IDs below q value 0.01 dropped steadily as we widened the window. We decided to go with the wider window anyway, because we felt uncertain that the detections with super-narrow windows were valid.

Essentially, you are saying: I know what I am going to find right here, and there it is. With a super-tight scheduling window, you are less likely to even get an untruncated peak in your scheduling range.

I guess we could consider what you suggest, but we'd need to exclude precursor-based scores from the models, because without a shift you are generally going to have exactly the same precursor peak. I am not sure when I could promise to enable this. Probably not the answer you were hoping for, but we do have a lot going on. Sorry.

--Brendan

 
jfoe responded:  2018-12-21 02:47

Dear Brendan,

thank you very much for your response.
In fact, you can already edit any Skyline file to set the decoy mass shifts to 0 and then just not include MS1 data in the assay.
In our data we cannot make out anything in MS1 at all, so that is not an issue.
I would also assume, though, that this is not really a great way to get unbiased scores.
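Concretely, the kind of edit I mean is sketched below. I am assuming the shift sits in an attribute called decoy_mass_shift; that name is a guess on my part and may not match the actual .sky schema, so treat this only as an illustration.

import xml.etree.ElementTree as ET

# Zero out decoy mass shifts in a Skyline document (.sky files are XML).
# "decoy_mass_shift" is an assumed attribute name, not a confirmed part
# of the schema -- check your own file before relying on this.
tree = ET.parse("assay.sky")
for elem in tree.iter():
    if "decoy_mass_shift" in elem.attrib:
        elem.set("decoy_mass_shift", "0")
tree.write("assay_no_shift.sky", encoding="utf-8", xml_declaration=True)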

As it looks now, I will probably try some deviations from the mProphet approach on my own.

One thing Skyline does that is really problematic for me is that it always keeps the last AA fixed during decoy creation.
This is not suitable for the MHC epitope peptides I am working on, where there are no set cleavage characteristics (see the sketch below).
I will take the time later to write up another post listing some things that have been limiting our use of Skyline for epitope peptides, if you are interested.
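What I have in mind is a plain shuffle over the whole sequence, roughly like this (just a sketch, not how Skyline builds its decoys):

import random

def shuffle_decoy(sequence, fix_cterm=False, seed=None):
    """Build a shuffled decoy; keep the C-terminal residue only if asked to.

    For tryptic peptides fixing the last AA makes sense, but for MHC
    ligands there is no cleavage rule, so the whole sequence is shuffled."""
    rng = random.Random(seed)
    residues = list(sequence)
    core = residues[:-1] if fix_cterm else residues[:]
    shuffled = core[:]
    for _ in range(10):           # retry a few times to avoid returning the original order
        rng.shuffle(shuffled)
        if shuffled != core:
            break
    return "".join(shuffled) + (residues[-1] if fix_cterm else "")

print(shuffle_decoy("ALNEKLVNL", fix_cterm=False, seed=1))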

Best,
Jonas

 
Brendan MacLean responded:  2018-12-21 06:16

Hi Joe,
That last point is pretty interesting. I could see our getting better about only keeping the last AA constant when it matches your cleavage settings. It also seems like it might be useful to have a "None" value for the Peptide Settings - Digestion - Enzyme. That would somewhat limit your use of protein sequence information, but Skyline might still be able to make protein associations with a background proteome. Do you care about protein associations, or are you simply using peptide lists to get around cleavage?

We are always looking for ways to improve the software and it never hurts to know more about where we could improve, even if it will take us time to get the improvements made.

Thanks for your feedback. Please do clarify your thoughts on where we might improve for your use case.

--Brendan

 
jfoe responded:  2019-01-01 08:48

Dear Brendan,

We just use peptide lists and don't care about protein associations.
One thing, though, is that we really care about individual peptides, even if they have some problematic characteristics.

It took me some time to reproduce a few things I encountered.

A:

We are using heavy reference peptides for our assays.
Due to the irregularity of our peptides, we receive them with a single heavy AA at a specified location in the peptide.
E.g. PEMPL[+7]TIDLME
Note the presence of a heavy L as well as a light L.
If I paste this into Skyline, I will not get any variable modifications (like M[+16]) applied.
This is in contrast to pasting PEMPLTIDLME.
It's clear to me why that is the case, but the result is that I have to generate all variable modifications externally and paste them like so:
PEMPL[+7]TIDLME
PEM[+16]PL[+7]TIDLME
PEMPL[+7]TIDLM[+16]E
PEM[+16]PL[+7]TIDLM[+16]E

A workflow for peptide import where Skyline just does this would be great, of course.
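For now I generate these combinations with a small script along the lines of the sketch below (the parsing is simplistic and only handles the bracketed mass notation I paste into Skyline):

from itertools import combinations

OX = "[+16]"   # variable oxidation on M, written with the rounded mass I use when pasting

def with_variable_ox(modified_seq, max_ox=2):
    """Take a sequence that already carries the fixed heavy label,
    e.g. 'PEMPL[+7]TIDLME', and emit every combination of M[+16]."""
    # split into residues, keeping any existing [+x] attached to its residue
    residues, i = [], 0
    while i < len(modified_seq):
        aa = modified_seq[i]
        i += 1
        if i < len(modified_seq) and modified_seq[i] == "[":
            j = modified_seq.index("]", i) + 1
            aa += modified_seq[i:j]
            i = j
        residues.append(aa)
    met_positions = [k for k, r in enumerate(residues) if r == "M"]
    for n in range(min(max_ox, len(met_positions)) + 1):
        for combo in combinations(met_positions, n):
            yield "".join(r + OX if k in combo else r
                          for k, r in enumerate(residues))

for p in with_variable_ox("PEMPL[+7]TIDLME"):
    print(p)

Running this on PEMPL[+7]TIDLME prints exactly the four sequences listed above.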

B:

Also, I wanted to look into the collisional dissociation of our peptides in detail.
For this I activated neutral losses for water loss and ammonia loss.
When I then imported my peptides, I got countless duplicated transitions.
I have attached an example Skyline file for this (neutral_loss_issue).
In this file, for example, the transition L [y8 -51.1] - 891.4822+ appears 4 times.

Importing results with this would then yield, e.g.:

At 01:34:
Duplicate transition 'L - y8+' found for peak areas

Of course these transitions are not really unique in a strict sense, but it would be great if this were handled more gracefully.
The issue was created by just pasting the string "ALNEKLVNL" into the empty target list of the example file.
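To illustrate what I mean by "more gracefully": something that collapses transitions which end up with the same fragment and product m/z after the neutral-loss expansion, along these lines (the column names are just placeholders for an exported transition list, not anything Skyline-specific):

import csv
from collections import OrderedDict

def collapse_duplicates(rows):
    """Keep only the first of any transitions that share peptide,
    fragment, charge and (rounded) product m/z."""
    seen = OrderedDict()
    for row in rows:
        key = (row["peptide"], row["fragment"], row["charge"],
               round(float(row["product_mz"]), 4))
        seen.setdefault(key, row)      # first occurrence wins
    return list(seen.values())

with open("transitions.csv") as handle:        # placeholder file name
    unique = collapse_duplicates(csv.DictReader(handle))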

Also, if you open the example file, go to peptide settings, and disable the structural modification for water loss, you will trigger:

Unexpected Error
An item with the same key has already been added.

C:
We are using a Q Exactive, and Thermo seems to encourage the use of normalized collision energy.
When I try to do collision energy optimization with normalized energies, Skyline imports the .raw file but does not recognize that the various spectra were acquired at different collision energies.
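As background for why I care about optimizing over NCE: the absolute energy still scales with precursor m/z and charge. The conversion I have seen cited is sketched below; I am not certain these charge factors are exactly what the instrument applies, so take it as an approximation.

def nce_to_ev(nce, precursor_mz, charge):
    """Approximate conversion of Thermo normalized collision energy to eV.

    Uses the commonly cited eV = NCE * (m/z / 500) * charge_factor rule;
    the exact factors the instrument applies are an assumption here."""
    charge_factors = {1: 1.0, 2: 0.9, 3: 0.85, 4: 0.8}
    factor = charge_factors.get(charge, 0.75)    # 0.75 assumed for z >= 5
    return nce * (precursor_mz / 500.0) * factor

print(nce_to_ev(27, 650.0, 2))   # ~31.6 eV for a 2+ precursor at m/z 650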

I am using Skyline 4.2.0.18305.
These are some of the things that I would love to have some help with.

Best wishes for a happy new year,
Jonas

 
jfoe responded:  2019-01-24 06:50

Dear Skyline team,

I am giving this a quick bump in case it slipped through over the new year.
Please excuse me if that's not needed.

I am considering applying to the May Institute course to learn more about whether immunopeptidomics is a field that Skyline is actively trying to support and how I can best apply it.

Best regards,
Jonas

 
Brendan MacLean responded:  2019-01-27 21:42

A: Okay, but we probably won't get to this any time soon. It may be worth noting that you can be even more explicit with Skyline by using {} for labeling modifications, e.g. PEM[+16]PL{+7}TIDLME. This can be especially useful when you have both a structural modification and a labeling modification on the same residue. But, yeah, when you get this complicated, we are going to put more work on you to describe what you want. There just are not that many people with requirements this complicated.

B: I will look at this and get back to you. Sounds like a bug, of course.

C: We don't support optimization of normalized collision energy.

 
Brendan MacLean responded:  2019-01-27 21:56

B: Okay, I see what you mean, but you are allowing up to 5 neutral losses and setting them to "Always" apply, regardless of seeing anything in a spectral library. Do you really think you are likely to see 5 neutral losses from a peptide precursor? And can you imagine that kind of fragment ion having any real abundance that makes it worth measuring? To me, this just seems absurd and likely to leave you measuring interference and not something truly related to your precursor. Certainly, you would greatly increase the probability of this with these settings. You need to allow 3 neutral losses to get the issue you describe with 52.1 showing up multiple times, because it requires the loss of 2 x NH3 and 1 x H2O. I would expect reviewers to look askance at your results if you were to report a fragment with those 3 losses as important. I am pretty sure SEQUEST doesn't even look past a single neutral loss of NH3 or H2O from a fragment ion.

I will still look into fixing the bug, but I would also be interested in hearing a justification for why you think you need more than 2 losses of water and ammonia on your fragments.

Thanks for reporting this issue.

 
jfoe responded:  2019-01-27 23:13

Dear Brendan,

A:
It's great that Skyline supports an extra syntax for heavy labels.
I think applying variable modifications whenever something with just a {+x} label is pasted would be an easy way to support this without breaking other workflows.

B:
Essentially, this came from a workflow where I create spectral libraries with Skyline.
If you measure small batches of synthetic peptides, Skyline has pretty much all the functionality needed to find the peaks and create a spectral library.
The benefits of this, to me, are integration of intensities over the peak, accurate retention times, and easy manual verification.

If you want to include neutral losses there, though, you are pretty much stuck between looking for none of them or all of them.
So I would agree that, in hindsight, I should probably have gone with a database search -> spectral library -> Skyline workflow, and I need to put some work into setting up a database search that gives me robust results.

The main reason I reported it was the clear error message, which I thought you might be interested in.

Thank you for the great feedback!

 
jfoe responded:  2019-01-27 23:32

I think the reason I avoided database searches when I started using Skyline for spectral library generation was also related to the fact that Skyline is pretty good at dealing with the specific heavy labeling we are using.
The labeling caused some reliability issues in the database searches, making the manual verification in Skyline even more advantageous.
Also, when doing collision energy optimization, I wouldn't want to use a spectral library, and seeing what happens to the neutral losses there was also something I was interested in extracting from the data.
Also when doing collision energy optimization, I wouldn't want to use a spectral library, and to see whats happening to neutral losses there was also something I was interested in extracting from the data.