Please hold off on hacking the pepXML itself. I will discuss with Kaipo when he is here.
I wouldn't expect best-spectrum matching based on XCorrs to be very different from matching based on pepXML probabilities. They sort in the same order, and I think we may use the XCorr as a tie-breaker when probabilities are equal, which effectively makes this equivalent to using XCorr in the first place.
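The ranking described above (probability first, XCorr only as a tie-breaker) can be sketched roughly as follows. This is a minimal Python illustration under stated assumptions, not the actual library-building code: the `PSM` record, its field names, the spectrum names and the scores are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class PSM:
    # Hypothetical minimal record; field names are illustrative, not pepXML attribute names.
    spectrum: str
    peptide: str
    probability: float  # PeptideProphet-style probability, 1 = good
    xcorr: float


def rank_key(psm: PSM) -> tuple:
    # Primary criterion: probability. XCorr only matters when probabilities are equal.
    return (psm.probability, psm.xcorr)


def best_spectrum(psms: list) -> PSM:
    # Highest probability wins; ties on probability go to the higher XCorr.
    return max(psms, key=rank_key)


# Made-up example values for one peptide seen in three runs.
psms = [
    PSM("run1.0100.0100.2", "PEPTIDEK", 0.99, 3.1),
    PSM("run2.0200.0200.2", "PEPTIDEK", 0.99, 3.8),
    PSM("run3.0300.0300.3", "PEPTIDEK", 0.95, 4.5),
]
print(best_spectrum(psms).spectrum)  # run2.0200.0200.2
```

The tie-breaker only comes into play when the top probabilities are exactly equal; in this toy input the spectrum with the highest XCorr (4.5) is not chosen because its probability is lower.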
Can you give more detail on what you are seeing and why you think it would change for what you are requesting?
Thanks for your feedback.
--Brendan
Hi Brendan,
If I plot the probability scores against XCorrs, they correlate, but not perfectly. See the attached plot, where the x-axis is XCorr and the y-axis is the probability score (note that I invert the scores to 1 - score so that they're correctly interpreted as PeptideProphet scores, where 1 = good). They look like they might correlate pretty well, considering differences in XCorr due to charge state, etc., but I wasn't 100% sure. My instinct was that, given a dozen different redundant spectra from different experiments, the one with the highest XCorr would be the one I'd like to use, whereas the one with the best probability score might depend somewhat on the rest of the data from that search (since decoy search statistics depend on the true/decoy hits from that experiment). XCorr is *absolute*, whereas probability scores seem to depend on experimental parameters. You guys have probably given this a lot of thought, so if I'm not getting this quite right, please advise.
The various questions I've been asking lately relate to creating spectral libraries from millions of PSMs distributed across many experiments. If I could manually choose the best reference spectrum for a given peptide, this would be less critical, but since I have to go with a single big library (or a few medium-sized ones), I want to make sure that the best spectrum is actually being chosen.
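One way to check how often the two criteria actually disagree across experiments is to pick a best spectrum per peptide ion under each rule and list the mismatches. A minimal Python sketch, assuming hypothetical PSM records already parsed from the pepXML files; the record layout and example values are made up, and grouping by (peptide, charge) is an assumption made here so that XCorr values are only compared within one charge state.

```python
from typing import NamedTuple


class PSM(NamedTuple):
    # Hypothetical record; fields are illustrative only.
    spectrum: str
    peptide: str
    charge: int
    probability: float
    xcorr: float


def best_per_ion(psms, key):
    # Keep the highest-scoring PSM for each (peptide, charge) under the given key.
    best = {}
    for psm in psms:
        ion = (psm.peptide, psm.charge)
        if ion not in best or key(psm) > key(best[ion]):
            best[ion] = psm
    return best


def disagreements(psms):
    # Map each ion whose pick differs to (spectrum chosen by probability, spectrum chosen by XCorr).
    by_prob = best_per_ion(psms, key=lambda p: (p.probability, p.xcorr))
    by_xcorr = best_per_ion(psms, key=lambda p: p.xcorr)
    return {
        ion: (by_prob[ion].spectrum, by_xcorr[ion].spectrum)
        for ion in by_prob
        if by_prob[ion].spectrum != by_xcorr[ion].spectrum
    }


psms = [
    PSM("expA.0100.0100.2", "PEPTIDEK", 2, 0.99, 3.1),
    PSM("expB.0200.0200.2", "PEPTIDEK", 2, 0.97, 3.9),
    PSM("expC.0300.0300.2", "ELVISLIVESK", 2, 0.99, 4.2),
    PSM("expD.0400.0400.2", "ELVISLIVESK", 2, 0.90, 3.0),
]
print(disagreements(psms))
# {('PEPTIDEK', 2): ('expA.0100.0100.2', 'expB.0200.0200.2')}
```

Running something like this over the full PSM set would show whether the mismatches are rare edge cases or frequent enough to matter for the library.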
Thanks
Gabe
PS: Brian, changing to 'analysis_result_' had no effect. The probability score was still used. I could try deleting the whole 'analysis_result' section, but I'd like to hear Brendan's thoughts before experimenting further.