specify score type used for blib library from pepXML

specify score type used for blib library from pepXML gabe  2017-09-26 10:42
 
Hi,
I'm creating .blib libraries from pep.xml files. BiblioSpec is using the PeptideProphet probability scores to choose the best spectra, but I would prefer (I think) for it to use XCorrs, which are also in the pepXML files:

   <search_result search_id="1">
    <search_hit hit_rank="1" peptide="SVCHVPGLKK" peptide_prev_aa="R" peptide_next_aa="M" protein="sp|P19474|RO52_HUMAN" num_tot_proteins="1" num_matched_ions="13" calc_neutral_pep_mass="1390.8117335332" massdiff="-0.001656" num_missed_cleavages="1" is_rejected="0">
     <modification_info>
      <mod_aminoacid_mass position="3" mass="427.225184479"/>
     </modification_info>
     <search_score name="peptide_xcorr" value="2.7737"/>
     <search_score name="peptide_deltacorr" value="0.5253"/>
     <search_score name="peptide_diff_seq_deltacorr" value="0.5253"/>
     <search_score name="peptide_initial_score" value="251.5"/>
     <search_score name="peptide_initial_rank" value="1"/>
     <search_score name="expect" value="0"/>
     <analysis_result analysis="peptideprophet">
      <peptideprophet_result lda_probability="0.0022943213649662" probability="0.99770567863503"/>
     </analysis_result>
    </search_hit>
   </search_result>


Is there any way to force BiblioSpec to use XCorrs instead of PeptideProphet probabilities? (Score type 12 rather than 2)

Thanks

Gabe
 
 
Brian Pratt responded:  2017-09-26 11:20
You could try editing the file (using sed or somesuch) to change "analysis_result" to "analysis_result_" - that probably will do the trick.

- Brian
 
gabe responded:  2017-09-26 11:23
Sorry, are you suggesting that if it can't find a tag called 'analysis_result' it will look for the peptide_xcorr score, instead?
 
Brian Pratt responded:  2017-09-26 11:28
It would remove the ambiguity, from BiblioSpec's perspective. Not suggesting it as a long term fix, but worth trying first. I don't think there's any way to control that from the command line.
 
Brendan MacLean responded:  2017-09-26 11:30
Please hold off on hacking the pepXML itself. I will discuss with Kaipo when he is here.

I wouldn't expect best-spectrum selection based on XCorrs to be much different from selection based on pepXML probabilities. They are sorted in the same order, and I think we may use the XCorr as a tie-breaker when probabilities are equal, effectively making this equivalent to using XCorr in the first place.

Can you give more detail on what you are seeing and why you think it would change for what you are requesting?

Thanks for your feedback.

--Brendan
 
gabe responded:  2017-09-26 11:47
Hi Brendan,
If I plot the probability scores against XCorrs, they correlate, but not perfectly. See the attached plot, where the x-axis is XCorr and the y-axis is the probability score (note that I invert the scores to 1-score so that they're correctly interpreted as PeptideProphet scores, where 1=good). They look like they might correlate pretty well, considering differences in XCorr due to charge state, etc., but I wasn't 100% sure. My instinct was that, given a dozen redundant spectra from different experiments, the one with the highest XCorr would be the one I'd like to use, whereas the one with the best probability score might depend somewhat on the rest of the data from that search (since decoy search statistics depend on the true/decoy hits from that experiment). XCorr is *absolute*, whereas probability scores seem dependent on experimental parameters. You guys have probably given this a lot of thought, so if I'm not getting this quite right, please advise.
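For anyone who wants to check on their own data how often the two scores would actually pick a different spectrum, here is a minimal Python sketch. It is not BiblioSpec's selection logic, just a side-by-side comparison. It assumes each search_hit carries a peptide_xcorr search_score and a peptideprophet_result probability, as in the fragment above. The function names (read_hits, disagreements) are made up for illustration, and grouping by bare peptide sequence is a simplification: a real comparison would probably also key on charge state and modifications.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict


def local(tag):
    """Strip any XML namespace: '{ns}search_hit' -> 'search_hit'."""
    return tag.rsplit('}', 1)[-1]


def read_hits(pepxml_text):
    """Yield (peptide, xcorr, probability) for each search_hit that has both scores."""
    root = ET.fromstring(pepxml_text)
    for hit in root.iter():
        if local(hit.tag) != 'search_hit':
            continue
        xcorr = prob = None
        for el in hit.iter():
            name = local(el.tag)
            if name == 'search_score' and el.get('name') == 'peptide_xcorr':
                xcorr = float(el.get('value'))
            elif name == 'peptideprophet_result':
                prob = float(el.get('probability'))
        if xcorr is not None and prob is not None:
            yield hit.get('peptide'), xcorr, prob


def disagreements(hits):
    """Map peptide -> (index of best-by-XCorr PSM, index of best-by-probability PSM)
    for peptides where the two scores pick different PSMs.
    Ties are resolved by first occurrence (an arbitrary choice for this sketch)."""
    by_peptide = defaultdict(list)
    for i, (peptide, xcorr, prob) in enumerate(hits):
        by_peptide[peptide].append((i, xcorr, prob))
    out = {}
    for peptide, psms in by_peptide.items():
        best_by_xcorr = max(psms, key=lambda p: p[1])[0]
        best_by_prob = max(psms, key=lambda p: p[2])[0]
        if best_by_xcorr != best_by_prob:
            out[peptide] = (best_by_xcorr, best_by_prob)
    return out
```

To use it, read a pep.xml file into a string, pass it to read_hits, collect the result into a list, and hand that list to disagreements. For very large files, ET.iterparse would be a better fit than loading the whole document at once.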

The various questions I've been asking lately relate to creating spectral libraries from millions of PSMs distributed across many experiments. If I could manually choose the best reference spectrum for a given peptide, this would be less critical, but since I have to go with a single big library (or a few medium-sized ones), I want to make sure that the best spectrum is actually getting chosen.

Thanks

Gabe

PS - Brian - changing to 'analysis_result_' had no effect. The probability score was still used. I could try deleting the whole 'analysis_result' section, but I'd like to hear Brendan's thoughts before experimenting further.