mProphet for DIA: Remove peptides to apply features & RT difference d o debets  2019-06-19 08:02
 

Dear Skyline team,

Recently I've performed a relatively big DIA experiment for the first time and I am experiencing some problems with applying the different features for the mProphet peak picking. I know some of these questions have been asked previously, but unfortunately I haven't been able to find an answer on this forum that could solve my issues. I'd be very grateful if you could provide me with some insights!

First, some information about my experimental set up:

  1. I have made my own library by fractionating a pooled sample and running this in DDA mode on an Orbitrap, using the iRT peptides as well. This yielded a library of around 10.800 proteins and 208.000 peptides (including decoys).
  2. My DIA experiment consists of around 60 samples that I ran, again including the iRTs.
  3. I have used the command line version of Skyline to match my DIA files to the library and am currently using Skyline 4.2.0.19107 to look at the mProphet model and my peaks.

The first thing I noticed when looking at my model is that it's not very good at separating my targets from decoys, but this is probably partly due to the fact that quite a few important features were not used. First of all, I deleted all the peptides that didn't have a library dotp. This made sense, since in most cases there were not enough transitions matched to the library. However, there were quite a few of them and I had to remove them by hand, leading me to my first question: is there a smarter/more efficient way to remove these peptides?

After removal of these peptides, I looked at the peptides that had no information for the RT difference value feature. However, there are loads of peptides that do not have this information, leading me to think that maybe something has gone wrong in the way that I set up my library/Skyline file. This leads me to my second question: do you have any idea what could cause this? Maybe more generally, could you give me an idea why a peptide would not meet this criterion? And last but not least: how can I deal with this? There are way too many to delete them by hand, so I do not really know how to proceed from here.

As I said, it's the first time I am performing a DIA experiment, so some insights would be very much appreciated. I have attached the zipped Skyline file for you to have a look at if that's helpful. I have removed the majority of the raw files from the results there since otherwise the file would be way too big. Maybe leading me to my last question: is the file size that I have somewhat as expected? The output .skyd-file is around 380GB, which sounds like a lot to me?

Thanks a lot for your help in advance!

Best wishes,
Donna

 
 
Brendan MacLean responded:  2019-06-19 08:35

Hi Donna,

  1. Is there a smarter/more efficient way to remove these peptides?

I would usually use Edit > Refine > Advanced - Min transitions per precursor for this. You just specify a minimum like 3 if you have no MS1 precursors, or 6 if you have 3 MS1 precursor transitions in every peptide. You can also set this up from the start by setting Transition Settings - Library - Minimum product ions. We would typically recommend at least 4, and my understanding is that the Aebersold lab often sets both Pick: [ ] product ions and [ ] minimum product ions to 6. Using either method to avoid precursors with fewer than 3 fragment ions will keep your document from having precursors where the library dot product cannot be calculated.
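To make the 3-versus-6 arithmetic concrete, here is a minimal Python sketch of the filtering logic. It is not Skyline code and does not call any Skyline API; the `Precursor` class, `min_transitions_setting`, and `refine` are hypothetical names used only for illustration, and the peptide names are made up.

```python
from dataclasses import dataclass


@dataclass
class Precursor:
    """Hypothetical stand-in for one precursor in a document."""
    peptide: str
    ms1_transitions: int      # precursor (MS1) isotope transitions
    product_transitions: int  # fragment (MS2) transitions

    @property
    def total_transitions(self) -> int:
        return self.ms1_transitions + self.product_transitions


def min_transitions_setting(ms1_per_precursor: int, min_product_ions: int = 3) -> int:
    """Total-transition minimum that still guarantees min_product_ions fragments.

    Assumes every precursor carries the same number of MS1 transitions.
    """
    return ms1_per_precursor + min_product_ions


def refine(precursors, min_transitions):
    """Keep only precursors with at least min_transitions transitions in total."""
    return [p for p in precursors if p.total_transitions >= min_transitions]


# Made-up document where every precursor has 3 MS1 transitions.
document = [
    Precursor("PEPTIDEA", 3, 5),
    Precursor("PEPTIDEB", 3, 2),  # only 2 fragments: no usable library dot product
    Precursor("PEPTIDEC", 3, 3),
]

threshold = min_transitions_setting(ms1_per_precursor=3)
kept = [p.peptide for p in refine(document, threshold)]
print(threshold, kept)
```

With no MS1 transitions the same helper gives a minimum of 3, which matches the two cases described above.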

  2. Do you have any idea what could cause this? Maybe more generally, could you give me an idea why a peptide would not meet this criterion? And last but not least: how can I deal with this?

I don't know why these peptides would end up missing in your iRT library, but in a fractionated library, I suppose, if the iRT peptides could not be detected sufficiently in one of the fractions and that fraction did not otherwise contain enough overlapping peptides with any of the other fractions, that could force Skyline to abandon the effort to give its peptides iRT values and you might end up with detections in your library without corresponding iRT values. Just a guess, though. I would be happy to look further, if I could get the files.

To recover, I would open the iRT Calculator editor on your current iRT library (Peptide Settings - Prediction - click the button with the calculator icon > Edit Current). Select all of the peptides in your iRT calculator (maybe even adding the iRT peptides, using a text editor). Then use Edit > Refine > Accept Peptides and paste the list into this form. This should reduce your document to only peptides with iRT values.

You could also use the Document Grid, which now allows deleting selected elements. So you would build a report for peptides with a retention time prediction column, filter for everything where the prediction is #N/A, select all, and click the delete button.
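If you would rather check the list outside of Skyline first, the same filter can be sketched in Python on a report exported to CSV. This is only an illustration under assumptions: the column names "Peptide Modified Sequence" and "Predicted Retention Time" are guesses at what your report might contain, the function name is hypothetical, and the sample rows are invented.

```python
import csv
import io

# Values treated as "no prediction". "#N/A" is how the grid shows a missing value;
# the empty string is included as an assumption about exported files.
MISSING = {"", "#N/A"}


def peptides_without_prediction(report_file,
                                peptide_col="Peptide Modified Sequence",
                                rt_col="Predicted Retention Time"):
    """Return the peptides whose predicted retention time is missing."""
    reader = csv.DictReader(report_file)
    missing = []
    for row in reader:
        if row[rt_col].strip() in MISSING:
            missing.append(row[peptide_col])
    return missing


# Invented sample standing in for an exported report.
sample_report = io.StringIO(
    "Peptide Modified Sequence,Predicted Retention Time\n"
    "LGGNEQVTR,12.4\n"
    "ELVISLIVESK,#N/A\n"
    "GAGSSEPVTGLDAK,18.9\n"
    "AAAAAAK,#N/A\n"
)

to_delete = peptides_without_prediction(sample_report)
print(to_delete)
```

For a real export you would pass an open file handle instead of the `io.StringIO` sample, and adjust the two column names to match your report.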

  3. Is the file size that I have somewhat as expected? The output .skyd-file is around 380GB, which sounds like a lot to me?

That does sound pretty big. It would likely get smaller, and the processing faster, once you only have peptides with iRT values. When you use peptides without iRT values, Skyline is forced to extract full gradient chromatograms. But 208,000 peptides with, say, 10 chromatograms each (3 precursors, 7 fragments) could end up at around 2,000,000 chromatograms per file, across 60 files. That could get pretty big. Though 380 GB still sounds larger than I would expect. Fix your other problems first and we will see where you end up. Otherwise, are you using +/- 5 minutes around RT predictions? Or something else?
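The rough numbers above multiply out as in the sketch below. The peptide, chromatogram, and file counts come from this thread; the 120 minute gradient is purely an assumed value for illustration (your actual gradient length is not stated here), and the function names are hypothetical. It does not try to predict the .skyd size in bytes, only the chromatogram counts and how much of each chromatogram a +/- 5 minute window would keep.

```python
def chromatogram_count(peptides, chromatograms_per_peptide, files):
    """Return (chromatograms per file, chromatograms over all files)."""
    per_file = peptides * chromatograms_per_peptide
    return per_file, per_file * files


def window_fraction(gradient_min, half_window_min):
    """Fraction of a full-gradient chromatogram kept by a +/- window around the predicted RT."""
    return min(1.0, 2 * half_window_min / gradient_min)


# 208,000 peptides, 10 chromatograms each (3 precursors + 7 fragments), 60 files.
per_file, total = chromatogram_count(208_000, 10, 60)
print(f"{per_file:,} chromatograms per file, {total:,} in total")

# Assumed 120 minute gradient with a +/- 5 minute extraction window.
fraction = window_fraction(gradient_min=120, half_window_min=5)
print(f"windowed extraction keeps about {fraction:.1%} of each chromatogram")
```

Under that assumed gradient, peptides extracted over the full gradient carry roughly twelve times as many points as windowed ones, which is why peptides without iRT values can inflate the file.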

Hope this helps as a start.

--Brendan