There are peptides with probability lower than the threshold in the blib library

fcyu  2021-12-30
 

This question was raised by a FragPipe user. To save some typing, please read the thread here (https://github.com/Nesvilab/FragPipe/issues/570#issuecomment-1003194356).

Thanks,

Fengchao

 
 
Nick Shulman responded:  2021-12-30
Can you give us an example of a peptide that is not supposed to appear in the .blib file?

I am not sure I understand the Excel spreadsheet that you linked to (which I am also attaching here). Are you saying that the lines where there's a "FALSE" in the "is_in_FragPipe_lib" represent peptides that are included in the .blib file but should not be?

The spreadsheet seems to be saying that the peptide "DNECEEESGGIR" from "interact-CPTAC_CCRCC_W_JHU_20190112_LUMOS_C3L-00004_NAT_rank1.pep.xml" is not supposed to be included.
When I look in the .pep.xml file for that peptide it says:
<search_hit peptide="DNECEEESGGIR"
...
<search_score name="expect" value="5.275938e-04"/>

That is a really good score, which is why that peptide ends up being included.

Is there something else in the pep.xml file that BiblioSpec should be looking at instead in order to decide that that is a bad peptide?
-- Nick
 
fcyu responded:  2021-12-30

Hi Nick,

Thanks for your prompt reply.

I set 0.727174 as the cutoff threshold in Skyline, so all peptides with a probability lower than that value should be filtered out (am I understanding this correctly?). In Book2.xlsx, after sorting by the probability_from_FragPipe column, you will see a lot of peptides that are supposed to be filtered out. Taking GPPGPMGPPGLAGPPGESGR as an example, the entry in the pep.xml is

<spectrum_query start_scan="6556" assumed_charge="3" spectrum="CPTAC_CCRCC_W_JHU_20190112_LUMOS_C3L-00004_NAT.06556.06556.0" spectrumNativeID="6556" end_scan="6556" index="6341" precursor_neutral_mass="1783.8853" retention_time_sec="951.561812064">
<search_result>
<search_hit peptide="GPPGPMGPPGLAGPPGESGR" massdiff="0.017822265625" calc_neutral_pep_mass="1783.8674" peptide_next_aa="E" num_missed_cleavages="0" num_tol_term="2" protein_descr="Collagen alpha-1(I) chain OS=Homo sapiens OX=9606 GN=COL1A1 PE=1 SV=5" num_tot_proteins="1" tot_num_ions="38" hit_rank="1" num_matched_ions="9" protein="sp|P02452|CO1A1_HUMAN" peptide_prev_aa="R" is_rejected="0">
<search_score name="hyperscore" value="20.06153"/>
<search_score name="nextscore" value="15.480862"/>
<search_score name="expect" value="6.335863e-03"/>
<analysis_result analysis="peptideprophet">
<peptideprophet_result probability="0.616103" all_ntt_prob="(0.616103,0.616103,0.616103)">
<search_score_summary>
<parameter name="fval" value="-0.159561"/>
<parameter name="ntt" value="2"/>
<parameter name="nmc" value="0"/>
<parameter name="massd" value="9.990802"/>
<parameter name="isomassd" value="0"/>
</search_score_summary>
</peptideprophet_result>
</analysis_result>
</search_hit>
</search_result>
</spectrum_query>

The probability 0.616103 is lower than the threshold.
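
To check other entries the same way, here is a minimal Python sketch that lists the search hits whose PeptideProphet probability falls below a threshold. It only illustrates the comparison described above and is not part of FragPipe or BiblioSpec. The function names and the trimmed sample document are made up for illustration; the element and attribute names are the ones visible in the entry above.

```python
import io
import xml.etree.ElementTree as ET


def local_name(tag):
    """Drop an XML namespace prefix such as '{uri}search_hit'."""
    return tag.rsplit("}", 1)[-1]


def hits_below_threshold(source, threshold):
    """Return (peptide, probability) pairs with probability < threshold.

    `source` is a file name or a binary file object holding pepXML.
    """
    results = []
    for _, elem in ET.iterparse(source, events=("end",)):
        if local_name(elem.tag) != "search_hit":
            continue
        for child in elem.iter():
            if local_name(child.tag) == "peptideprophet_result":
                prob = float(child.get("probability"))
                if prob < threshold:
                    results.append((elem.get("peptide"), prob))
        elem.clear()  # free memory; real pep.xml files can be large
    return results


# Trimmed version of the entry quoted above (most attributes omitted).
SAMPLE = b"""<msms_pipeline_analysis>
<spectrum_query spectrumNativeID="6556">
<search_result>
<search_hit peptide="GPPGPMGPPGLAGPPGESGR" hit_rank="1">
<search_score name="expect" value="6.335863e-03"/>
<analysis_result analysis="peptideprophet">
<peptideprophet_result probability="0.616103"/>
</analysis_result>
</search_hit>
</search_result>
</spectrum_query>
</msms_pipeline_analysis>"""

print(hits_below_threshold(io.BytesIO(SAMPLE), 0.727174))
# [('GPPGPMGPPGLAGPPGESGR', 0.616103)]
```

For a real file, pass the path of the pep.xml instead of the `io.BytesIO` object.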

Best,

Fengchao

 
Nick Shulman responded:  2021-12-30
When BlibBuild.exe reads these search results, the only scores that it looks at are the elements of the form:
<search_score name="expect" value="#####"/>

It sounds like you are thinking that BlibBuild should be looking at the "peptideprophet_result" element instead.
BlibBuild will only use the "peptideprophet_result" values if it finds a "peptideprophet_summary" element towards the beginning of the file. There are no "peptideprophet_summary" elements anywhere in that pep.xml file.
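
A quick way to test a pep.xml for this is sketched below in Python. This is not BiblioSpec's code; it only mirrors the rule as described here. The helper name is made up, the two sample documents are minimal hypothetical stand-ins, and stopping at the first "spectrum_query" is an assumption about what "towards the beginning of the file" means.

```python
import io
import xml.etree.ElementTree as ET


def local_name(tag):
    """Drop an XML namespace prefix such as '{uri}spectrum_query'."""
    return tag.rsplit("}", 1)[-1]


def has_peptideprophet_summary(source):
    """Return True if a peptideprophet_summary element appears
    before the first spectrum_query element."""
    for _, elem in ET.iterparse(source, events=("start",)):
        name = local_name(elem.tag)
        if name == "peptideprophet_summary":
            return True
        if name == "spectrum_query":
            # Assumption: the summary, if present, precedes the results.
            return False
    return False


# Hypothetical minimal documents, not real FragPipe output.
WITHOUT_SUMMARY = (
    b"<msms_pipeline_analysis><msms_run_summary>"
    b"<spectrum_query/></msms_run_summary></msms_pipeline_analysis>"
)
WITH_SUMMARY = (
    b"<msms_pipeline_analysis>"
    b'<analysis_summary analysis="peptideprophet">'
    b"<peptideprophet_summary/></analysis_summary>"
    b"<msms_run_summary><spectrum_query/></msms_run_summary>"
    b"</msms_pipeline_analysis>"
)

print(has_peptideprophet_summary(io.BytesIO(WITHOUT_SUMMARY)))
print(has_peptideprophet_summary(io.BytesIO(WITH_SUMMARY)))
```

If the check fails for a file, the PeptideProphet probabilities in it would presumably be ignored in favor of the "expect" scores, as described above.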

I am not the expert on BiblioSpec, so I cannot tell you whether this behavior is what it is supposed to be, but I can tell you why BiblioSpec is making the decisions that it is making.

Do you think BiblioSpec should be doing something different?
-- Nick
 
fcyu responded:  2021-12-30

Hi Nick,

Thanks for your reply. It answers everything.

Could you please tell me which tags BiblioSpec looks for in the peptideprophet_summary? We can implement it in FragPipe.

Also, if using expect, where can I set the threshold in Skyline?

Thanks,

Fengchao

 
Nick Shulman responded:  2021-12-30
Here is the place in the BiblioSpec code where, if it finds a "<peptideprophet_summary>" element, it decides to use the peptideprophet_result values:
https://github.com/ProteoWizard/pwiz/blob/master/pwiz_tools/BiblioSpec/src/PepXMLreader.cpp#L122
If you just put a "<peptideprophet_summary />" tag in the file (it probably has to be inside of the "<analysis_summary>" tag), I think it would work.

You can look in the .blib file and see which score type was used. ".blib" files are SQLite databases. There is a table "ScoreTypes" which lists the mapping between integer IDs and score types. Typically, "2" means "peptide prophet" and "6" means "X Tandem". If you look at the "scoreType" column for any of the rows in the "RefSpectra" table you can see which score type was used.
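
That lookup can be scripted with Python's built-in sqlite3 module. The sketch below uses only the table and column names mentioned above ("ScoreTypes", "RefSpectra", "scoreType"); the function name and the "library.blib" path in the usage comment are placeholders.

```python
import sqlite3


def score_types_used(con):
    """Return (mapping, counts) for an open connection to a .blib file.

    mapping: all rows of the ScoreTypes table, as stored.
    counts:  (scoreType, number of spectra) pairs from RefSpectra.
    """
    mapping = con.execute("SELECT * FROM ScoreTypes").fetchall()
    counts = con.execute(
        "SELECT scoreType, COUNT(*) FROM RefSpectra "
        "GROUP BY scoreType ORDER BY scoreType"
    ).fetchall()
    return mapping, counts


# Usage (placeholder path; sqlite3.connect creates an empty database
# if the file does not exist, so check the path first):
#
#   con = sqlite3.connect("library.blib")
#   mapping, counts = score_types_used(con)
#   print(mapping)
#   print(counts)
#   con.close()
```

If the counts show only one score type, every spectrum in the library was filtered on that score.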

The probability cutoff score that you specify in Skyline ends up being applied to whatever score values BiblioSpec ends up reading from the peptide spectrum matches. So, if you wanted that <search_score name="expect" value="5.275938e-04"/> to be excluded, I think you could set the "cut-off score" in the Build Library dialog to something like "0.9995".
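
The arithmetic behind that suggestion can be sketched in Python as follows. This assumes that an expect value e is compared against the cut-off as 1 - e, which is what the "0.9995" example implies; I have not confirmed that this is exactly how BiblioSpec does it, and whether the boundary is inclusive is a guess. The function name is made up.

```python
def passes_cutoff(expect, cutoff):
    """Return True if a match with this expect value would be kept.

    Assumption: the match is kept when (1 - expect) >= cutoff.
    """
    return (1.0 - expect) >= cutoff


# The two expect values quoted in this thread, against the cut-off
# that was originally used and the stricter one suggested above.
for expect in (5.275938e-04, 6.335863e-03):
    for cutoff in (0.727174, 0.9995):
        print(expect, cutoff, passes_cutoff(expect, cutoff))
```

Under this assumption both peptides pass a cut-off of 0.727174 and both are excluded at 0.9995, which matches what was observed in the library.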
-- Nick
 
fcyu responded:  2021-12-30

Hi Nick,

Thank you very much for the information. After adding peptideprophet_summary, Skyline works as expected now.

Best,

Fengchao