Confusion with Skyline and Proteome Discoverer

Mark Athanason  2020-10-20 07:05
 

Hi all,

First off, thanks for providing the support that you do. I hope I don't bother you all too much.

For right now I'm fine using the Proteome Discoverer (PD) software, but ultimately I want to perform DIA quant on similar data using Skyline. Within PD, I perform a search using Sequest and peptide validation with Percolator at an FDR of 1%. Using the "import DDA search" wizard in the most recent version of Skyline-daily, I use the .msf file with a cutoff score of 0.99. What I get in Skyline doesn't seem to match what I see in PD at all. For example, PD says I have 411 protein groups, but in Skyline, when I check "remove repeated peptides" and "remove duplicate peptides", I'm left with 222. Shouldn't these be the same, since Percolator is filtering for high-confidence peptides? Additionally, for some peptides PD does not correlate an MS1 trace where Skyline does ... in my opinion incorrectly.

I'm uploading the .msf file, "Ibu1_DDA_1.raw" "Ibu1_DDA_2.raw" and "Ibu_DDA_3.raw" to https://skyline.ms/files.url

Any help would be greatly appreciated!
Mark

 
 
Brendan MacLean responded:  2020-10-20 08:40

I think you are misunderstanding "remove duplicate peptides". This will remove all peptides that appear in more than one protein, whereas protein grouping will preserve all peptides and simply form groups of proteins that explain those peptides. If you use only "remove repeated peptides", I would expect you to end up with the same number of unique peptides, since this causes Skyline only to avoid duplicating peptides, leaving each peptide under the first protein associated with it. This is not the same as protein grouping, however. So it doesn't mean that you will end up with the same number of proteins in Skyline as you see protein groups in Proteome Discoverer.
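
To make the difference concrete, here is a minimal Python sketch on invented toy data. It is not Skyline's implementation; the protein names, peptide sequences, and function names are all hypothetical, and it only mimics the behavior of the two options as described above.

```python
# Toy data (hypothetical): proteins in document order with their peptides.
PROTEINS = {
    "ProtA": ["PEPTIDEK", "SHAREDPEPK", "UNIQUEAK"],
    "ProtB": ["SHAREDPEPK", "UNIQUEBK"],
    "ProtC": ["SHAREDPEPK"],
}


def remove_duplicate_peptides(proteins):
    """Drop every peptide that appears in more than one protein."""
    counts = {}
    for peptides in proteins.values():
        for pep in set(peptides):
            counts[pep] = counts.get(pep, 0) + 1
    return {
        name: [p for p in peptides if counts[p] == 1]
        for name, peptides in proteins.items()
    }


def remove_repeated_peptides(proteins):
    """Keep each peptide only under the first protein that lists it."""
    seen = set()
    result = {}
    for name, peptides in proteins.items():
        kept = []
        for pep in peptides:
            if pep not in seen:
                seen.add(pep)
                kept.append(pep)
        result[name] = kept
    return result


def distinct_peptides(proteins):
    """Set of all peptide sequences present across all proteins."""
    return {pep for peptides in proteins.values() for pep in peptides}


if __name__ == "__main__":
    for label, doc in [
        ("original", PROTEINS),
        ("remove duplicate peptides", remove_duplicate_peptides(PROTEINS)),
        ("remove repeated peptides", remove_repeated_peptides(PROTEINS)),
    ]:
        non_empty = [name for name, peps in doc.items() if peps]
        print(label, len(distinct_peptides(doc)), "peptides in", non_empty)
```

On this toy input the original document has 4 distinct peptides. Removing duplicates leaves 3, because the shared peptide is dropped everywhere, while removing repeats keeps all 4 and simply leaves the shared peptide under ProtA. In both cases ProtC ends up with no peptides, which is one way the protein count can drift away from a protein-group count.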

Skyline does not yet support a direct transfer from the world of protein groups to comparable peptide lists associated with multiple proteins. It is just going to be a challenge to check your expectations based on the protein groups you see in PD. Instead, you should look at unique peptides and expect that "remove repeated peptides" will give you the same numbers as PD.

If you want to create "protein groups" in Skyline, then you will need to add peptides (and not proteins or FASTA) with your own customized group names. Skyline will consider these peptide lists, but other researchers have represented multi-protein peptide groups in this way.
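
As a rough illustration of the idea, the Python sketch below names a group after the exact set of proteins each peptide maps to and emits peptide/group-name pairs. Everything here is an assumption for illustration: the input mapping, the "/"-joined naming scheme, and the tab-separated two-column output are hypothetical, and this simple grouping is not the protein inference that PD performs.

```python
from collections import defaultdict


def peptide_groups(peptide_to_proteins):
    """Group peptides by the exact set of proteins they map to.

    The group name is the sorted protein names joined with "/"
    (a hypothetical naming convention, chosen only for this sketch).
    """
    groups = defaultdict(list)
    for peptide, proteins in peptide_to_proteins.items():
        group_name = "/".join(sorted(proteins))
        groups[group_name].append(peptide)
    return dict(groups)


def as_rows(groups):
    """Render groups as tab-separated 'peptide<TAB>group name' lines."""
    return [
        f"{peptide}\t{name}"
        for name in sorted(groups)
        for peptide in sorted(groups[name])
    ]


if __name__ == "__main__":
    # Hypothetical search result: peptide -> proteins containing it.
    mapping = {
        "PEPTIDEK": {"ProtA"},
        "SHAREDPEPK": {"ProtA", "ProtB"},
        "UNIQUEBK": {"ProtB"},
    }
    for row in as_rows(peptide_groups(mapping)):
        print(row)
```

Each peptide appears exactly once in the output, so nothing is lost, and shared peptides land in a list whose name records every protein they could come from.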

Hope this helps clarify. Skyline started its life as a highly targeted tool, where the idea of a collection of peptides which could represent any composition of a set of proteins was not something researchers were doing in a targeted way. As Skyline has become more powerful, we have seen more and more researchers wanting what you are describing, which is simply to take DDA results and apply them repeatedly and reproducibly to DIA data.

Thanks for your feedback.

--Brendan

 
Mark Athanason responded:  2020-10-21 08:16

Hi Brendan

Thanks so much for your clear and thoughtful answer. That makes sense, as I've been doing more research on how these search engines choose the "most likely" protein associated with peptides, and it seems a bit ambiguous and Skyline isn't responsible for this. For my application, I am highly focused on PTMs, so I will follow your advice to keep all peptides.

In the future I plan to do more global proteomic analysis. Right now I'm going by Sequest's default minimum peptide length of 6 amino acids. In your opinion, do you think increasing this length to say 8 aa's would help reduce this ambiguity?

Thanks again,
Mark

 
Brendan MacLean responded:  2020-10-21 10:13

Hi Mark,
Definitely, very few people would advise going lower than 6 AAs. Whether you go to 7 or up to 8, which is the default minimum in Skyline, is up to you. The fact that 8 is the default minimum probably says something about what the MacCoss lab does, but I don't have an opinion other than that the longer peptides probably end up being more useful in quantifying your proteins, both due to uniqueness and also reduced probability of interference with high Q1 and Q3 m/z values.
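
The uniqueness point can be checked on your own FASTA with something like the Python sketch below. It is only a sketch under stated assumptions: the two protein sequences are invented, the digest is fully tryptic with no missed cleavages (cut after K or R unless followed by P), and the function names are hypothetical. Real proteomes will give different numbers.

```python
import re


def tryptic_peptides(sequence):
    """Fully tryptic digest, no missed cleavages.

    Cuts after K or R unless the next residue is P.
    Splitting on a zero-width match requires Python 3.7+.
    """
    return [p for p in re.split(r"(?<=[KR])(?!P)", sequence) if p]


def shared_fraction(proteins, min_length):
    """Fraction of distinct peptides (>= min_length) found in 2+ proteins."""
    owners = {}
    for name, sequence in proteins.items():
        for pep in set(tryptic_peptides(sequence)):
            if len(pep) >= min_length:
                owners.setdefault(pep, set()).add(name)
    if not owners:
        return 0.0
    shared = sum(1 for names in owners.values() if len(names) > 1)
    return shared / len(owners)


# Invented toy sequences: two short shared peptides, one long unique each.
TOY_PROTEOME = {
    "P1": "AGTLVKLLSEEDFGAVRTTPEQK",
    "P2": "AGTLVKMNQDWYHCAGSRTTPEQK",
}

if __name__ == "__main__":
    for min_length in (6, 7, 8):
        print(min_length, shared_fraction(TOY_PROTEOME, min_length))
```

In this toy case the two 6-residue peptides are shared by both proteins, so raising the minimum length from 6 to 8 removes all of the ambiguity. On real data the effect is usually less dramatic, so it is worth running on the actual database before choosing a cutoff.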

Good luck with your research. Thanks for using Skyline.

--Brendan