Creating spectral libraries from Proteome Discoverer result files

johannes voshol

2022-03-25 04:53

Hi
I am trying to create spectral libraries from PD results, using Skyline's 'Import Peptide Search' functionality. When I import from the .pdResult file, the scores are not recognized and are all set to 0. When I use the corresponding .msf file, the scores are imported, even though they are obviously exactly the same and have the same column name (PercolatorqValue). Is that intended behavior or is there something wrong?
A second question is about the SpecIDinFile parameter. This ends up being a strange-looking float like -1602.53037. In all cases the number before the decimal point (here 1602) corresponds to the WorkflowID from PD, but the rest has no obvious relation to the file ID or scan number, as one might expect. Do you have any info on how this parameter is calculated, and on whether and how the actual scan number can be retrieved from it?
Thanks a lot for providing these great tools to the community!
Hans Voshol

Kaipo Tamura responded:	2022-03-29 11:37
Hi Hans, We recently changed how scores from PD are handled, so the behavior may differ depending on which version of Skyline/BiblioSpec you are using. Most recently, we filter on the TargetPeptideGroups.Confidence column if you use a score threshold of 0.01 (corresponding to confidence 3) or 0.05 (confidence 2). In these cases, scores will be unavailable, as you noticed. Otherwise, we check for the following columns, in order of preference: Qvalityqvalue, PercolatorqValue, qValue, ExpectationValue. The SpecIDinFile is based on the workflow ID from PD, as you mentioned, and also on the spectrum ID. I'm not sure how PD assigns spectrum IDs, but you can find them in the MSnSpectrumInfo.SpectrumID or MassSpectrumItems.ID columns. Thanks, Kaipo
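To make the selection order in this reply concrete, here is a minimal Python sketch. It is not the BiblioSpec implementation (that is C++); it only restates the rule as described in the reply. The helper names `table_columns` and `choose_score_source` are hypothetical, and the sketch assumes the result file can be opened as an SQLite database.

```python
import sqlite3

# Column preference order as listed in the reply above.
SCORE_COLUMNS = ["Qvalityqvalue", "PercolatorqValue", "qValue", "ExpectationValue"]

# Score thresholds that switch to confidence-based filtering, per the reply above.
CONFIDENCE_FOR_THRESHOLD = {0.01: 3, 0.05: 2}


def table_columns(conn, table):
    """Return the column names of `table` in an open SQLite connection."""
    # PRAGMA table_info yields one row per column; index 1 is the column name.
    return [row[1] for row in conn.execute(f'PRAGMA table_info("{table}")')]


def choose_score_source(threshold, columns):
    """Hypothetical helper: describe where the scores would come from."""
    if threshold in CONFIDENCE_FOR_THRESHOLD:
        # Confidence filtering: per-spectrum scores are unavailable in this case.
        return ("confidence", CONFIDENCE_FOR_THRESHOLD[threshold])
    for name in SCORE_COLUMNS:
        if name in columns:
            return ("column", name)
    return ("none", None)
```

To check a file of your own, you could open it with `sqlite3.connect(path)` and pass `table_columns(conn, "TargetPsms")` together with your score threshold to `choose_score_source`. The table name here is an assumption and may differ between PD versions.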

johannes voshol responded:	2023-03-24 13:08
Hi Kaipo, hi Skyline Team, Thanks again for your feedback. Sorry it took me a while to put a couple of screenshots together to illustrate the problem in more detail; it still exists in my hands. I am now using Skyline v 22.2.0.312, which I believe is the latest version. In principle, the TargetPeptideGroups table from PD does not seem the most logical starting point for creating spectral libraries. As the name says, this table consolidates all the PSMs for a given peptide. In contrast, the 'TargetPsms' table contains all the information on the individual spectra. In fact, I think that is what is being used in the background, because the non-redundant libraries do contain all the spectra. The remaining issue is the scoring of the spectra and, linked to that, the selection of the 'best' spectra for the library. Even though the import window of Skyline says it will use the Percolator q-value (see my screenshots), in fact it does not. In my library all scores are set to 0 even though the q-values are present (see screenshots). As a consequence, Skyline randomly picks one spectrum among several for a given peptide instead of the best one. See the details in my screenshots. Simply using the q-value would solve all that. Best wishes, Hans
Import Peptide Search.pdf
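For illustration, the selection asked for here (keep the spectrum with the lowest q-value per peptide) could be sketched in Python as follows. This is not how BiblioSpec selects spectra. The field names and the example values are made up, and grouping by sequence plus charge is an assumption.

```python
def best_psm_per_peptide(psms):
    """Keep the PSM with the lowest q-value for each (sequence, charge) key."""
    best = {}
    for psm in psms:
        key = (psm["sequence"], psm["charge"])
        current = best.get(key)
        # Lower q-values are better; on ties the first PSM seen is kept.
        if current is None or psm["q_value"] < current["q_value"]:
            best[key] = psm
    return best


# Made-up example data, not taken from any real result file.
psms = [
    {"sequence": "PEPTIDEK", "charge": 2, "spectrum_id": 101, "q_value": 0.008},
    {"sequence": "PEPTIDEK", "charge": 2, "spectrum_id": 205, "q_value": 0.001},
    {"sequence": "SAMPLER", "charge": 3, "spectrum_id": 318, "q_value": 0.004},
]
best = best_psm_per_peptide(psms)
print(best[("PEPTIDEK", 2)]["spectrum_id"])  # 205
```

If all scores are 0, as in the library described above, every comparison is a tie and the choice depends only on input order, so which spectrum is kept becomes arbitrary.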

Nick Shulman responded:	2023-03-24 14:21
The "Percolator q-value" that you see in the "Score Type" column in the Import Peptide Search wizard does not actually tell you where BiblioSpec is going to read the scores from in the .pdresult file. All that "Percolator q-value" is supposed to mean is that the scores are going to behave like Percolator q-values in terms of whether lower scores are better than higher scores, and what value corresponds to an X% FDR. (I am not sure what the difference is supposed to be between "Percolator q-value" and the generic "q-value" that you might see instead for other types of search results.) Most of the scores that BiblioSpec might read from a .pdresult file behave like Percolator q-values. However, if BiblioSpec decides to use the scores from the "ExpectationValue" column in the "TargetPsms" table, then the score type would say "Mascot expectation". I am not the expert on how BiblioSpec reads .pdresult files, but here is the code that checks for the presence of different columns and decides which one to use for the score: https://github.com/ProteoWizard/pwiz/blob/master/pwiz_tools/BiblioSpec/src/MSFReader.cpp#L483 If you send us your files we might be able to give you more information about what BiblioSpec is doing. If your files are less than 50MB you can attach them to this support request. If they are larger than that, you can zip them up and upload them here: https://skyline.ms/files.url You should probably include your .pdresult and .msf files. -- Nick

johannes voshol responded:	2023-03-25 13:56
Thanks so much for the quick response, Nick. I will dig through the code and see whether my understanding of Python is sufficient to figure out what is going on (I did realize it is not written in Python :-). Happy to provide some files, but I would rather generate something a bit smaller; this specific pdresult was on the order of 20 GB. Best, Hans