build spectral library

build spectral library yuderuii  2017-06-11 04:11
 
hi,
I run BiblioSpec on Windows, but I don't know how to set the number of peaks in the consensus spectrum. Can you help me?
 
 
Brendan MacLean responded:  2017-06-11 08:37
BiblioSpec does not create consensus spectra, but instead uses a "best" true spectrum approach, though the way we choose that best spectrum has changed since the original paper. We now take best scoring spectra where possible, and then highest total intensity among spectra that cannot be compared by score (e.g. equal score or from different scoring engines).
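
As a rough illustration of that selection rule (not BiblioSpec's actual code), here is a minimal Python sketch. It assumes scores are only comparable within a single scoring engine, keeps the top scorer(s) per engine, and lets total intensity decide among whatever remains. The class, its field names, and the function name are all hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class SpectrumMatch:
    """One identified spectrum for a peptide (all field names are hypothetical)."""
    score_type: str          # which scoring engine produced the score
    score: float
    higher_is_better: bool   # True for probabilities, False for q-values and similar
    intensities: List[float]

    @property
    def total_intensity(self) -> float:
        return sum(self.intensities)


def pick_best_spectrum(matches: List[SpectrumMatch]) -> SpectrumMatch:
    """Best score first; total intensity decides whatever score cannot."""
    if not matches:
        raise ValueError("no spectra to choose from")

    # Assumption: scores are only comparable within one scoring engine.
    by_type: Dict[str, List[SpectrumMatch]] = {}
    for m in matches:
        by_type.setdefault(m.score_type, []).append(m)

    # Keep the top scorer(s) of each engine; exact ties stay in as finalists.
    finalists: List[SpectrumMatch] = []
    for group in by_type.values():
        sign = 1.0 if group[0].higher_is_better else -1.0
        best = max(sign * m.score for m in group)
        finalists.extend(m for m in group if sign * m.score == best)

    # Finalists are tied or not comparable by score, so use total intensity.
    return max(finalists, key=lambda m: m.total_intensity)
```

With two spectra from the same engine, the higher-scoring one wins even if it is less intense; intensity only matters for equal scores or for spectra scored by different engines.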

This was found to perform much better on PRM data. There, the original all-vs-all dot-product method described in the BiblioSpec paper would most often choose a spectrum with a lot of noise and low intensity: because so many more of the identified spectra were of that type, they tended to match each other better than they matched the true best example of a spectrum match to the peptide.

Our own investigation of the consensus spectrum algorithm used in SpectraST (which we implemented and tested in BiblioSpec) did not help solve the problem with PRM data. Originally, we thought that using it to produce a consensus spectrum and then scoring the true spectra against that consensus might help us pick the spectrum closest to the peak apex in PRM data. But it had problems similar to those of the original BiblioSpec best-spectrum algorithm: when there were enough noisy spectrum matches, the consensus itself became very noisy.

So, BiblioSpec still does not produce consensus spectra, nor does it allow you to limit the number of peaks it stores. With BiblioSpec, you can still be sure that the spectra it contains in its library were actually measured.

Thanks for posting your question to the Skyline support board.

--Brendan
 
yuderuii responded:  2017-06-11 18:59
Thanks a lot! Is there a paper on the new BiblioSpec?
 
Brendan MacLean responded:  2017-06-11 21:54
Unfortunately, it didn't seem groundbreaking enough for us to write a paper on just new criteria for choosing a best spectrum. But as I hear people citing papers that prove consensus spectra are better than best spectra, I sometimes wonder whether the comparison is still fair, since we no longer use the all-vs-all best-spectrum algorithm those papers compared against. Maybe we could publish a comparison in that line of research.

Our own need to change was motivated by building BiblioSpec libraries from PRM data, where we could see all-vs-all was picking clearly inferior spectra.

So, still no paper on this. Sorry.
 
yuderuii responded:  2017-06-12 06:41
Thanks a lot! Best wishes to you!
 
yuderuii responded:  2017-06-14 19:19
hi,

I have another problem building a library with BiblioSpec. Several options are needed when running BlibBuild, and '-c' is used to specify the cutoff score (0-1) below which peptide-spectrum matches will be excluded from the library. What is its default value?

Thanks for your reply!
 
Nick Shulman responded:  2017-06-15 16:37
I believe the default value of the cutoff actually depends on the type of peptide search results.

You can find the source code for BlibBuild.exe here:
https://svn.code.sf.net/p/proteowizard/code/trunk/pwiz/pwiz_tools/BiblioSpec/src/BlibBuild.cpp

In there, the score cutoffs are initialized to:
    scoreThresholds[SQT] = 0.01; // 1% FDR
    scoreThresholds[PEPXML] = 0.95; // peptide prophet probability
    scoreThresholds[IDPXML] = 0; // use all results
    scoreThresholds[MASCOT] = 0.05; // expectation value
    scoreThresholds[TANDEM] = 0.1; // expect score
    scoreThresholds[PROT_PILOT] = 0.95; // 95% confidence
    scoreThresholds[SCAFFOLD] = 0.95; // Scaffold: Peptide Probability
    scoreThresholds[MSE] = 6; // Waters MSe peptide score
    scoreThresholds[OMSSA] = 0.00001; // Max OMSSA expect score
    scoreThresholds[PROT_PROSPECT] = 0.001; // expect score
    scoreThresholds[MAXQUANT] = 0.05; // MaxQuant PEP
    scoreThresholds[MORPHEUS] = 0.01; // Morpheus PSM q-value
    scoreThresholds[MSGF] = 0.01; // MSGF+ PSM q-value
    scoreThresholds[PEAKS] = 0.05; // PEAKS p-value
    scoreThresholds[BYONIC] = 0.05; // ByOnic PEP
    scoreThresholds[PEPTIDE_SHAKER] = 0.95; // PeptideShaker PSM confidence

Later on in the file, you can see what happens to those values if an explicit cutoff is specified on the command line with the "-c" parameter:

        scoreThresholds[PEPXML] = explicitCutoff;
        scoreThresholds[PROT_PILOT] = explicitCutoff;
        scoreThresholds[SQT] = 1 - explicitCutoff;
        scoreThresholds[MASCOT] = 1 - explicitCutoff;
        scoreThresholds[TANDEM] = 1 - explicitCutoff;
        scoreThresholds[SCAFFOLD] = explicitCutoff;
        scoreThresholds[OMSSA] = 1 - explicitCutoff;
        scoreThresholds[PROT_PROSPECT] = 1 - explicitCutoff;
        scoreThresholds[MAXQUANT] = 1 - explicitCutoff;
        scoreThresholds[MORPHEUS] = 1 - explicitCutoff;
        scoreThresholds[MSGF] = 1 - explicitCutoff;
        scoreThresholds[PEAKS] = 1 - explicitCutoff;
        scoreThresholds[BYONIC] = 1 - explicitCutoff;
        scoreThresholds[PEPTIDE_SHAKER] = explicitCutoff;

So, for SQT files, the cutoff starts off at .01, and if you specify a value it gets changed to "1 - explicitCutoff". That means that for SQT files, the default is equivalent to an explicitCutoff of .99.
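
To make the arithmetic concrete, here is a small Python sketch that mirrors only the two excerpts quoted above. The function names are hypothetical and this is not BlibBuild's actual code. IDPXML and MSE do not appear in the second excerpt, so the sketch assumes they keep their defaults; the real source file may handle them differently.

```python
from typing import Optional

# Defaults copied from the first excerpt above.
DEFAULT_THRESHOLDS = {
    "SQT": 0.01, "PEPXML": 0.95, "IDPXML": 0, "MASCOT": 0.05,
    "TANDEM": 0.1, "PROT_PILOT": 0.95, "SCAFFOLD": 0.95, "MSE": 6,
    "OMSSA": 0.00001, "PROT_PROSPECT": 0.001, "MAXQUANT": 0.05,
    "MORPHEUS": 0.01, "MSGF": 0.01, "PEAKS": 0.05, "BYONIC": 0.05,
    "PEPTIDE_SHAKER": 0.95,
}

# From the second excerpt: types assigned explicitCutoff unchanged ...
USED_AS_IS = {"PEPXML", "PROT_PILOT", "SCAFFOLD", "PEPTIDE_SHAKER"}
# ... and types assigned 1 - explicitCutoff.
INVERTED = {
    "SQT", "MASCOT", "TANDEM", "OMSSA", "PROT_PROSPECT",
    "MAXQUANT", "MORPHEUS", "MSGF", "PEAKS", "BYONIC",
}


def effective_threshold(result_type: str,
                        explicit_cutoff: Optional[float] = None) -> float:
    """Threshold applied to one result type, with or without -c."""
    if explicit_cutoff is not None:
        if result_type in USED_AS_IS:
            return explicit_cutoff
        if result_type in INVERTED:
            return 1 - explicit_cutoff
    # No -c given, or a type the second excerpt does not touch (assumption).
    return DEFAULT_THRESHOLDS[result_type]


def equivalent_explicit_cutoff(result_type: str) -> Optional[float]:
    """The -c value that would reproduce the default, if the excerpt defines one."""
    default = DEFAULT_THRESHOLDS[result_type]
    if result_type in USED_AS_IS:
        return default
    if result_type in INVERTED:
        return 1 - default
    return None


if __name__ == "__main__":
    for name in ("SQT", "PEPXML", "MASCOT", "TANDEM"):
        print(name, equivalent_explicit_cutoff(name))
```

Read this way, there is no single default for "-c": the equivalent value is about 0.99 for SQT, 0.95 for pepXML and Mascot, and 0.9 for X! Tandem.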

I hope this makes sense and is helpful.
-- Nick
 
yuderuii responded:  2017-06-15 22:32

Thanks, this is extremely useful for me. Best wishes!

I am sorry to disturb you again. I built a library from pepXML with BlibBuild, but I am confused: some peptides in the library appear at a different charge state than in the input file.

Have you ever seen this situation?

Thanks a lot!
 
Kaipo Tamura responded:  2017-06-16 11:42
Hi,
We haven't encountered this before, but if you can share your pepXML at http://skyline.ms/files.url, along with some peptides that have different charge states than you expected, I'll take a look at it.

Thanks,
Kaipo
 
Brendan MacLean responded:  2017-06-16 13:01
Do consider that the input file may contain multiple charge states and you are just missing the one that appears in the library, or seeing another one first. What I could imagine being especially confusing, say, is:

1. You have multiple pepXML files.
2. One you look in has only the peptide sequence in question for charge state 2.
3. One of the others you haven't yet considered has it for charge state 3.
4. The probability value for #2 is below the cut-off you have specified (e.g. 0.7 when you used 0.95)
5. The probability value for #3 is above the cut-off (e.g. 0.98 when you used 0.95)

This set of conditions would result in a library that contains only the sequence at charge state 3, but your eyes are telling you the search engine found it only at charge state 2 (in the one input file you have checked - but that instance didn't meet the score cut-off).
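
The scenario above can be reproduced with a toy example. This is only a sketch: the file names, the peptide sequence, and the helper function are hypothetical, and it assumes a PSM is kept when its probability is at or above the cut-off.

```python
from typing import List, NamedTuple, Set, Tuple


class Psm(NamedTuple):
    source_file: str    # hypothetical file names
    sequence: str
    charge: int
    probability: float


def library_entries(psms: List[Psm], cutoff: float) -> Set[Tuple[str, int]]:
    """(sequence, charge) pairs that survive the score cut-off."""
    return {(p.sequence, p.charge) for p in psms if p.probability >= cutoff}


psms = [
    Psm("run1.pep.xml", "PEPTIDEK", 2, 0.70),  # the file you looked at
    Psm("run2.pep.xml", "PEPTIDEK", 3, 0.98),  # the file you have not checked yet
]

print(library_entries(psms, cutoff=0.95))  # {('PEPTIDEK', 3)}
```

Only the charge 3 entry passes the 0.95 cut-off, even though the first file shows the peptide only at charge 2.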

I recently spent a good bit of time digging through two pepXML files from which I built a library, using a cut-off score reported in a paper, to understand why certain reported peptides were not ending up in the library. I can report that I made mistakes and got myself confused at times about which PSM had the best score and in which file that score appeared.

The problem ended up being that the cut-off reported in the paper was used with SpectraST, and it was relying on the PeptideProphet probability, while Skyline was relying on the iProphet probability, which meant the probability cut-off was not directly transferable.

But, I wrote this more to communicate that it can be very confusing digging into the details of the input files, but resolving that confusion may also only require extra effort at understanding what the tool did for you.

Have a close look. Maybe it is a bug, but more likely it is just missing an important detail.

--Brendan
 
yuderuii responded:  2017-06-20 18:57
hi,

thanks a lot! I'll check it again, and then post it here


yuderui