Decoys and the Targets perfectly overlap with each other -- Import DIA Peptide Search

norelle wildburger  2022-01-29 11:49
 
Hi All,

I am looking at plasma data from the DIA study by Forchelet et al., 2018, acquired on a SCIEX 5600. The authors did not create their own library, but instead used a publicly available SWATH assay library (Liu et al., 2015; PXD001064 2018), also generated from plasma.

The Liu et al. library lacks my protein of interest, which means it was never identified in the Forchelet dataset. As a result, I searched and found a dataset on ProteomeXchange (Hondius et al., 2021; PXD023199) that also used a QTOF 5600. While these samples were from human brain, the data contained my protein of interest and was an instrument match.

I pulled the DDA data and repeated their database search steps in MetaMorpheus as described in the paper. Using the “Import DIA Peptide Search” wizard, I created a spectral library with the Hondius search results and spectral files. At the add chromatograms step, I uploaded the Forchelet .wiff files.

Please see the attached PPT for full workflow details.

The issue I am having is that the Decoys and the Targets perfectly overlap with each other, so I cannot tell what is real and what is not. It is not the ideal situation, but I have to work with what I have: the library in the Forchelet study had none of my proteins of interest, so they could never be found even if they were there.
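
To make the consequence of this overlap concrete, here is a minimal Python sketch of a simple target-decoy q-value estimate. It is not Skyline's own calculation: the function names and toy scores are made up for illustration, and it assumes the decoys stand in for the incorrect targets after scaling by the target/decoy count ratio. When the two score lists coincide, every target gets a q-value of 1, so nothing passes a 1% cutoff.

```python
from bisect import bisect_left

def q_values(target_scores, decoy_scores):
    """Simple target-decoy q-values, returned in the order of target_scores.

    Assumption: FDR at a threshold = scaled decoys at or above it divided by
    targets at or above it. Hypothetical helper, not Skyline's implementation.
    """
    targets_sorted = sorted(target_scores)
    decoys_sorted = sorted(decoy_scores)
    ratio = len(target_scores) / len(decoy_scores)

    def fdr_at(threshold):
        targets_ge = len(targets_sorted) - bisect_left(targets_sorted, threshold)
        decoys_ge = len(decoys_sorted) - bisect_left(decoys_sorted, threshold)
        return min(1.0, ratio * decoys_ge / targets_ge)

    # The q-value of a score is the lowest FDR among all thresholds that
    # would still accept it, so walk from the loosest threshold upward.
    q_by_score = {}
    best = 1.0
    for score in targets_sorted:
        best = min(best, fdr_at(score))
        q_by_score[score] = best
    return [q_by_score[s] for s in target_scores]

def count_passing(target_scores, decoy_scores, cutoff=0.01):
    """Number of targets whose q-value is at or below the cutoff."""
    return sum(1 for q in q_values(target_scores, decoy_scores) if q <= cutoff)

# Toy scores, invented for illustration only.
separated = count_passing([5.0, 6.0, 7.0, 8.0], [0.0, 1.0, 2.0, 3.0])
overlapping = count_passing([0.0, 1.0, 2.0, 3.0], [0.0, 1.0, 2.0, 3.0])
print(separated, overlapping)
```
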

I could always try the approach by Searle et al., 2020 (Generating high quality libraries for DIA MS with empirically corrected peptide predictions), but the issue would be the same, as I am not generating DDA data with the same gradient, instrument (in GPF), or time span.

Advice much appreciated. Norelle

 
 
Nick Shulman responded:  2022-01-29 14:13
If you send us your Skyline document, we might be able to see whether there is something going on which is causing your trained peak scoring model to not be doing a good job.

In Skyline you can use the menu item:
File > Share
to create a .zip file containing your Skyline document and supporting files including extracted chromatograms.

If that .zip file is less than 50MB you can attach it to this support request.

You can upload larger files here:
https://skyline.ms/files.url

--Nick
 
norelle wildburger responded:  2022-01-30 01:32
Hi Nick,

The file is 1.65 GB, so I uploaded it to https://skyline.ms/files.url

The .zip file is called: DIA test of Hondius Brain data on 5600 with my search.sky

Cheers,
Norelle
 
Nick Shulman responded:  2022-01-30 09:02
Thank you for sending your Skyline document.

One thing that I would recommend is that you go to:
Settings > Transition Settings > Full Scan
and change "Retention Time Filtering" to "Include all matching scans".

After you do that, you can tell Skyline to extract chromatograms again by going to:
Edit > Manage Results > Reimport

The problem with restricting the length of chromatograms when you are training a peak picking model is that there are fewer incorrect candidate peaks for Skyline to look at: all of the peaks that Skyline can consider are by construction within 5 minutes of the predicted retention time. For this reason, the "Retention time difference" score never ends up getting the weighting it deserves.
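
A toy simulation may help picture this. The sketch below is not Skyline code: the 90-minute gradient, the 1-minute spread of true peaks, and the function name are all assumptions chosen for illustration. It estimates how often a wrong candidate peak lies farther from the predicted retention time than a true peak, first with candidates confined to within 5 minutes of the prediction, then with full-length chromatograms.

```python
import random
from bisect import bisect_left

def rt_difference_auc(window_minutes=None, gradient_minutes=90.0, n=2000, seed=0):
    """Probability that a wrong peak is farther from the predicted RT than a true peak.

    window_minutes=None means full-length chromatograms; otherwise wrong
    candidates are limited to +/- window_minutes around the prediction.
    All distributions here are illustrative assumptions.
    """
    rng = random.Random(seed)
    true_diffs, wrong_diffs = [], []
    for _ in range(n):
        predicted = rng.uniform(0.0, gradient_minutes)
        # True peaks: assumed to elute close to the prediction (sd = 1 minute).
        true_diffs.append(abs(rng.gauss(0.0, 1.0)))
        # Wrong peaks: anywhere inside the extracted chromatogram.
        if window_minutes is None:
            lo, hi = 0.0, gradient_minutes
        else:
            lo = max(0.0, predicted - window_minutes)
            hi = min(gradient_minutes, predicted + window_minutes)
        wrong_diffs.append(abs(rng.uniform(lo, hi) - predicted))
    # Count (wrong, true) pairs in which the wrong peak is farther away.
    true_sorted = sorted(true_diffs)
    farther = sum(bisect_left(true_sorted, w) for w in wrong_diffs)
    return farther / (n * n)

restricted = rt_difference_auc(window_minutes=5.0)
full_length = rt_difference_auc(window_minutes=None)
print(f"restricted window: {restricted:.3f}, full length: {full_length:.3f}")
```

A value closer to 1 means the retention time difference separates wrong peaks from true ones more cleanly. In this toy setup the full-length case scores higher, which is the intuition behind the feature earning more weight when the chromatograms are not truncated.
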

I am not sure whether doing this will have a significant effect on how well the peak scoring model is able to separate targets from decoys, but it might help things.
If you send us your document again after you have extracted full-length chromatograms I might be able to give you more advice.

Hope this helps,
-- Nick
 
norelle wildburger responded:  2022-01-31 03:23
Hi Nick,

I keep getting an error when I try to re-upload the file.
Is there another way to send it?

Kind regards,
Norelle
 
Nick Shulman responded:  2022-01-31 06:49
I will send you an email directly so you can send us your files using something like Google Drive or DropBox.
-- Nick
 
Nick Shulman responded:  2022-01-31 13:05
Norelle,

Thank you for sending me your new Skyline document.
If you go to "View > Detections > Histogram", you can see how many precursors were confidently detected in how many replicates.
In your original document, there were only 27 precursors (out of about 20,000) which were confidently detected in all ten replicates.
In the new document that you sent me, where you had changed "Retention Time Filtering" to "Include all matching scans", if you then go to:
Refine > Reintegrate > Edit Current
and train the model again and apply it, then the number of precursors detected in all ten replicates increases to 71.

One thing that I did not realize was that when you change the "Retention time filtering" setting to "Include all matching scans", Skyline no longer thinks that the MS/MS ID times can be used for the "retention time difference" score in the peak scoring model.
A different change that you could have made would be to set retention time filtering to "include scans 90 minutes around MS/MS IDs". This still gives you full-length chromatograms, but the retention time difference score remains available for peak scoring. When I did that, I saw that the number of precursors detected in all replicates increased to 78.
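
For anyone who wants to reproduce that kind of count outside the Detections plot, here is a minimal sketch. It assumes you have exported a detection q value per precursor per replicate; the dictionary layout, the precursor names, the numbers, and the 0.01 cutoff are illustrative assumptions, not values from this document.

```python
from collections import Counter

def detection_histogram(q_by_precursor, cutoff=0.01):
    """Map 'number of replicates with q <= cutoff' to 'number of precursors'."""
    histogram = Counter()
    for q_list in q_by_precursor.values():
        detected = sum(1 for q in q_list if q is not None and q <= cutoff)
        histogram[detected] += 1
    return histogram

def detected_in_all(q_by_precursor, cutoff=0.01):
    """Names of precursors that pass the cutoff in every replicate."""
    return [
        name
        for name, q_list in q_by_precursor.items()
        if q_list and all(q is not None and q <= cutoff for q in q_list)
    ]

# Made-up q values for three precursors across three replicates.
# None stands for a replicate in which no peak was picked.
example = {
    "PEPTIDEA++": [0.001, 0.004, 0.009],
    "PEPTIDEB++": [0.001, 0.200, 0.003],
    "PEPTIDEC+++": [0.500, None, 0.700],
}

print(dict(detection_histogram(example)))
print(detected_in_all(example))
```
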

I hope this helps.
-- Nick
 
norelle wildburger responded:  2022-01-31 23:44
Hi Nick,

I implemented both suggestions, but the one that made a change in the model was:
change retention time filtering to "include scans 90 minutes around MS/MS IDs"

However, I am not seeing these 78 precursors, and nothing is at 1% FDR.
I also think I might need to use or make another spectral library. Any suggestions for trying to interrogate a protein of interest in another group's dataset?

pic attached.

-Norelle
 
Nick Shulman responded:  2022-02-01 07:15
Norelle,

In your screenshot, if you were to push the "Train Model" button, then Skyline would decide on a different set of weightings for the feature scores, and the bar graph would move to be centered around zero.
I am not sure what dataset led to the weightings that you have there now, but they did not come from the dataset that you are looking at now. Whenever Skyline trains a model and chooses a set of weightings, the bar graph is guaranteed to be centered on zero, because that is part of how the weightings are chosen.
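
A small sketch of why that happens, under one assumption: mProphet-style models typically shift and scale the composite score so that the decoys used for training have mean 0 and standard deviation 1. The sketch assumes that convention rather than quoting Skyline's code, and the weights and feature values are invented. Weights normalized on one set of decoys center that set on zero, but not a different set.

```python
from statistics import mean, pstdev

def composite_score(weights, bias, features):
    """Weighted sum of feature scores plus a constant offset."""
    return bias + sum(w * f for w, f in zip(weights, features))

def normalize_on_decoys(weights, decoy_features):
    """Rescale weights so these decoys get mean 0 and standard deviation 1.

    Assumed convention for illustration; hypothetical helper name.
    """
    raw = [composite_score(weights, 0.0, f) for f in decoy_features]
    center, spread = mean(raw), pstdev(raw)
    scaled = [w / spread for w in weights]
    bias = -center / spread
    return scaled, bias

def decoy_mean(weights, bias, decoy_features):
    """Mean composite score of a set of decoys under the given weights."""
    return mean(composite_score(weights, bias, f) for f in decoy_features)

# Invented feature scores for decoys from two different datasets.
decoys_a = [(1.0, 2.0), (2.0, 1.0), (3.0, 4.0), (0.0, 1.0)]
decoys_b = [(4.0, 2.0), (5.0, 1.0), (6.0, 4.0), (3.0, 1.0)]

weights, bias = normalize_on_decoys([0.5, 1.0], decoys_a)
print(decoy_mean(weights, bias, decoys_a))  # centered: trained on these decoys
print(decoy_mean(weights, bias, decoys_b))  # off-center: weights came from elsewhere
```
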

If you still have more questions about this dataset feel free to send me a current copy of your .sky.zip file.

Is there a particular protein in this dataset that you are interested in?
-- Nick