There are peptides with probability lower than the threshold in the blib library

Fengchao 2021-12-30 14:08

This issue was raised by a FragPipe user. To save some typing, please read the thread here (https://github.com/Nesvilab/FragPipe/issues/570#issuecomment-1003194356).

Thanks,

Fengchao

Nick Shulman responded:	2021-12-30 15:41
Can you give us an example of a peptide that is not supposed to appear in the .blib file?

I am not sure I understand the Excel spreadsheet that you linked to (which I am also attaching here). Are you saying that the rows with "FALSE" in the "is_in_FragPipe_lib" column represent peptides that are included in the .blib file but should not be?

The spreadsheet seems to be saying that the peptide "DNECEEESGGIR" from "interact-CPTAC_CCRCC_W_JHU_20190112_LUMOS_C3L-00004_NAT_rank1.pep.xml" is not supposed to be included. When I look in the .pep.xml file for that peptide it says:

    <search_hit peptide="DNECEEESGGIR" ...
    <search_score name="expect" value="5.275938e-04"/>

That is a really good score, which is why that peptide ends up being included. Is there something else in the pep.xml file that BiblioSpec should be looking at instead in order to decide that this is a bad peptide?

-- Nick
Book2.xlsx

Fengchao responded:	2021-12-30 16:22
Hi Nick,

Thanks for your prompt reply. I set 0.727174 as the cutoff threshold in Skyline, so all peptides with probability lower than that value should be filtered out (am I understanding it correctly?). In Book2.xlsx, after sorting the `probability_from_FragPipe` column, you will see a lot of peptides that are supposed to be filtered out. Taking `GPPGPMGPPGLAGPPGESGR` as an example, the entry in pep.xml is

    <spectrum_query start_scan="6556" assumed_charge="3" spectrum="CPTAC_CCRCC_W_JHU_20190112_LUMOS_C3L-00004_NAT.06556.06556.0" spectrumNativeID="6556" end_scan="6556" index="6341" precursor_neutral_mass="1783.8853" retention_time_sec="951.561812064">
      <search_result>
        <search_hit peptide="GPPGPMGPPGLAGPPGESGR" massdiff="0.017822265625" calc_neutral_pep_mass="1783.8674" peptide_next_aa="E" num_missed_cleavages="0" num_tol_term="2" protein_descr="Collagen alpha-1(I) chain OS=Homo sapiens OX=9606 GN=COL1A1 PE=1 SV=5" num_tot_proteins="1" tot_num_ions="38" hit_rank="1" num_matched_ions="9" protein="sp|P02452|CO1A1_HUMAN" peptide_prev_aa="R" is_rejected="0">
          <search_score name="hyperscore" value="20.06153"/>
          <search_score name="nextscore" value="15.480862"/>
          <search_score name="expect" value="6.335863e-03"/>
          <analysis_result analysis="peptideprophet">
            <peptideprophet_result probability="0.616103" all_ntt_prob="(0.616103,0.616103,0.616103)">
              <search_score_summary>
                <parameter name="fval" value="-0.159561"/>
                <parameter name="ntt" value="2"/>
                <parameter name="nmc" value="0"/>
                <parameter name="massd" value="9.990802"/>
                <parameter name="isomassd" value="0"/>
              </search_score_summary>
            </peptideprophet_result>
          </analysis_result>
        </search_hit>
      </search_result>
    </spectrum_query>

The probability 0.616103 is lower than the threshold.

Best,
Fengchao
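For anyone who wants to reproduce this check without the spreadsheet, here is a minimal Python sketch that lists search hits whose PeptideProphet probability falls below a cutoff. It assumes the pep.xml layout shown in the entry above (a `peptideprophet_result` element with a `probability` attribute nested inside each `search_hit`). The function names `local_name` and `hits_below_cutoff` are made up for this illustration and are not part of FragPipe, Skyline, or BiblioSpec.

```python
import xml.etree.ElementTree as ET


def local_name(tag):
    """Drop an XML namespace prefix: '{ns}search_hit' -> 'search_hit'."""
    return tag.rsplit("}", 1)[-1]


def hits_below_cutoff(pepxml_source, cutoff):
    """Return (peptide, probability) pairs with PeptideProphet probability < cutoff.

    pepxml_source may be a file path or a binary file-like object.
    Assumes the element and attribute names seen in the entry above.
    """
    found = []
    # Stream the file so that large pep.xml files do not need to fit in memory.
    for _, elem in ET.iterparse(pepxml_source, events=("end",)):
        if local_name(elem.tag) != "search_hit":
            continue
        for child in elem.iter():
            if local_name(child.tag) == "peptideprophet_result":
                probability = float(child.get("probability"))
                if probability < cutoff:
                    found.append((elem.get("peptide"), probability))
        elem.clear()  # release the children of the hit we just processed
    return found
```

Calling `hits_below_cutoff("some_file.pep.xml", 0.727174)` (the path is a placeholder) should report the `GPPGPMGPPGLAGPPGESGR` hit above with probability 0.616103, provided the file really has this layout.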

Nick Shulman responded:	2021-12-30 16:53
When BlibBuild.exe reads these search results, the only scores it looks at are the ones that say:

    <search_score name="expect" value="#####"/>

It sounds like you are thinking that BlibBuild should be looking at the "peptideprophet_result" element instead. BlibBuild will only use the "peptideprophet_result" values if it finds a "peptideprophet_summary" element towards the beginning of the file. There are no "peptideprophet_summary" elements anywhere in that pep.xml file.

I am not the expert on BiblioSpec, so I cannot tell you whether this behavior is what it is supposed to be, but I can tell you why BiblioSpec is making the decisions that it is making. Do you think BiblioSpec should be doing something different?

-- Nick
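A quick way to check whether a given pep.xml file contains the element described above is sketched below in Python. This is not BiblioSpec's own logic (that lives in its C++ reader); it only looks for an element named `peptideprophet_summary`. As a labeled assumption, the scan stops at the first `spectrum_query`, on the reasoning that summary elements are expected before the search results. The function name `has_peptideprophet_summary` is hypothetical.

```python
import xml.etree.ElementTree as ET


def has_peptideprophet_summary(pepxml_source):
    """Return True if a peptideprophet_summary element appears before the
    first spectrum_query element.

    pepxml_source may be a file path or a binary file-like object.
    Stopping at the first spectrum_query is an assumption of this sketch.
    """
    for _, elem in ET.iterparse(pepxml_source, events=("start",)):
        name = elem.tag.rsplit("}", 1)[-1]  # ignore any XML namespace
        if name == "peptideprophet_summary":
            return True
        if name == "spectrum_query":
            return False  # reached the search results without finding it
    return False
```

If this returns `False` for a file, the explanation above suggests BlibBuild would fall back to the `expect` scores for that file.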

Fengchao responded:	2021-12-30 17:03
Hi Nick, Thanks for your reply. That answers everything. Could you please tell me which tags BiblioSpec looks for in the `peptideprophet_summary`? We can implement it in FragPipe. Also, if `expect` is used, where can I set the threshold in Skyline? Thanks, Fengchao

Nick Shulman responded:	2021-12-30 17:42
Here is the place in the BiblioSpec code where, if it finds a "<peptideprophet_summary>" element, it decides to use the peptideprophet_result values:
https://github.com/ProteoWizard/pwiz/blob/master/pwiz_tools/BiblioSpec/src/PepXMLreader.cpp#L122

If you just put a "<peptideprophet_summary />" tag in the file (it probably has to be inside the "<analysis_summary>" tag), I think it would work.

You can look in the .blib file and see which score type was used. ".blib" files are SQLite databases. There is a table "ScoreTypes" which lists the mapping between integers and score types. Typically, "2" means "peptide prophet" and "6" means "X Tandem". If you look at the "scoreType" column for any of the rows in the "RefSpectra" table you can see which score type was used.

The probability cutoff score that you specify in Skyline ends up being applied to whatever score values BiblioSpec ends up reading from the peptide spectrum matches. So, if you wanted that <search_score name="expect" value="5.275938e-04"/> to be excluded, I think you could set the "cut-off score" in the Build Library dialog to something like "0.9995".

-- Nick
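Since a .blib file is a SQLite database, the score-type check described above can be done with Python's built-in `sqlite3` module. The sketch below relies only on the names mentioned in this thread (the "ScoreTypes" table, the "RefSpectra" table, and its "scoreType" column). The columns of "ScoreTypes" are not spelled out here, so the sketch reads them with `SELECT *` rather than guessing. The function name `score_types_used` is made up for this example.

```python
import sqlite3


def score_types_used(blib_path):
    """Return (mapping, counts) for a .blib file.

    mapping: every row of the ScoreTypes table, as stored.
    counts:  (scoreType, number of spectra) pairs from RefSpectra.
    """
    con = sqlite3.connect(blib_path)
    try:
        # Column layout of ScoreTypes is not assumed; rows are returned as-is.
        mapping = con.execute("SELECT * FROM ScoreTypes").fetchall()
        counts = con.execute(
            "SELECT scoreType, COUNT(*) FROM RefSpectra "
            "GROUP BY scoreType ORDER BY scoreType"
        ).fetchall()
    finally:
        con.close()
    return mapping, counts
```

Comparing the `scoreType` values in `counts` against the rows in `mapping` shows which score BiblioSpec actually read, for example whether the library was built from PeptideProphet probabilities or from `expect` values.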

Fengchao responded:	2021-12-30 19:33
Hi Nick, Thank you very much for your information. After adding `peptideprophet_summary`, Skyline works as expected now. Best, Fengchao

MacCoss Lab Software
