final library match issues

final library match issues Tobi  2020-04-01

Dear Skyline Team,

Please find attached files illustrating the remaining issues with the library match window fix and the related Prosit mirror dotp. In short, the mirror-plot dotp seems to wrongly include removed fragments (y1, y2, etc.), which lowers the dotp value, while the target list and document grid get it right.

With best wishes,

Brendan MacLean responded:  2020-04-01

Hi Tobi,
The dotp in that mirror plot is the dotp between the two spectra in the plot, and it is currently showing what we expect. The dotp in the document grid, by contrast, is the dotp between your measured peak areas and the matching library spectrum peaks.

Originally, Tobi R. implemented this as it seems you are requesting and only calculated a dotp between peaks that had signal in both spectra in the mirror plot. I didn't feel that was such a great idea, because it can give great dotp values between measured and Prosit predicted spectra where Prosit predicts very intense peaks which are entirely missing in the measured spectrum.

We could, of course, put this back as an option, but I disagree that this is the desired default behavior. I guess you might want 3 options?

  1. Full spectrum dotp (what you get today)
  2. Peaks with signal in both spectra dotp (what Tobi R. originally implemented)
  3. Peaks for only targeted transitions dotp
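
The three options above can be sketched as follows. This is only an illustration, not Skyline's code: it assumes the dotp is a plain cosine similarity (Skyline's actual score may be normalized differently), and the function names, the ion labels, and the intensities are all made up for the example.

```python
import math

def cosine(a, b):
    """Plain cosine similarity between two equal-length intensity vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def dotp(measured, predicted, mode="full", targets=None):
    """Hypothetical helper: spectra are dicts mapping ion label -> intensity."""
    ions = sorted(set(measured) | set(predicted))
    if mode == "both":
        # Option 2: keep only ions with signal in both spectra.
        ions = [i for i in ions
                if measured.get(i, 0.0) > 0 and predicted.get(i, 0.0) > 0]
    elif mode == "targets":
        # Option 3: keep only the targeted transitions.
        ions = [i for i in ions if i in (targets or ())]
    # Option 1 ("full") keeps every ion; a missing peak counts as zero.
    return cosine([measured.get(i, 0.0) for i in ions],
                  [predicted.get(i, 0.0) for i in ions])

# Made-up spectra: the prediction has an intense y1 that was never measured.
measured = {"y3": 50.0, "y4": 100.0, "y5": 80.0}
predicted = {"y1": 90.0, "y3": 50.0, "y4": 100.0, "y5": 80.0}

full = dotp(measured, predicted, "full")
both = dotp(measured, predicted, "both")
targeted = dotp(measured, predicted, "targets", targets={"y4", "y5"})
print(round(full, 3), round(both, 3), round(targeted, 3))
```

With these made-up numbers, the full-spectrum value is penalized for the unmeasured y1, while the other two modes ignore it entirely.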

This #3 may actually be what you are asking for and best matches how the Library Dot Product gets calculated in the document grid. The value in the mirror plot, however, was originally designed to give us insight into how well Prosit was doing at predicting compared to measured spectra, since that would inform how your targeted transitions are chosen, and not how well ions match once your targeted transitions are already chosen, which as I have explained might easily ignore all of the ions Prosit predicts as the most intense.

That is, does it really matter that Prosit matches relative intensities well between a measured spectrum and its 10-15 most intense peaks, if it gets 1-5 entirely wrong?

I understand that in your manufactured case, this is what you want to see, but I think it is less likely to be generally useful, and it is certainly not a case where you can prove that Skyline is just getting this calculation wrong. It is the right calculation for what it intends to show, even if it disagrees with what the document grid is showing, which has a totally different intent.

Do you want us to expose new options? Do you agree with the possible options I have listed above?

Thanks for your feedback.


Tobi responded:  2020-04-02

Dear Brendan,

Thank you for the fast and detailed reply. I can follow your ideas, except for one point: did you design it with SRM in mind?

I was just thinking: if you want to investigate Prosit performance, you have to distinguish cases where the measured spectrum peaks are correct and Prosit is wrong from cases where Prosit is right but the spectrum measurement is flawed (which requires high-quality or perfect reference measurements). For low-intensity fragments there might not be a single best solution at all, but including e.g. y1 in the dotp does not make sense, as the error probability is just too high except when measuring purified standards. In complex proteomes, we find these low-ordinal fragments not very helpful for either identification or quantification, and they are excluded in any case.

I believe many libraries, especially those built with Skyline, exclude some fragments (measurement range, low-ordinal fragments, DIA precursor window exclusion, manual or automatic curation, TopX intensity filtering). All those excluded fragments will lead to a mismatch and a reduced dotp in the mirror plot, as they are treated as zero on one side instead of being excluded on the other side as well. While the result can be independent of the target list and intensities, I find it very important to have the same filter rules applied to both spectra in the mirror plot. This leaves it entirely up to the user which fragments are incorporated and which are hidden.
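
To illustrate the effect described above, here is a minimal sketch. It assumes a plain cosine similarity as the dotp and uses a hypothetical "minimum ordinal" filter rule; the names and intensities are made up and nothing here is taken from Skyline's code.

```python
import math

def cosine_dotp(a, b):
    """Cosine similarity of two spectra (dicts: ion label -> intensity).
    An ion missing from one spectrum counts as zero on that side."""
    ions = sorted(set(a) | set(b))
    va = [a.get(i, 0.0) for i in ions]
    vb = [b.get(i, 0.0) for i in ions]
    dot = sum(x * y for x, y in zip(va, vb))
    norm = math.sqrt(sum(x * x for x in va)) * math.sqrt(sum(y * y for y in vb))
    return dot / norm if norm else 0.0

def keep(ion, min_ordinal=3):
    """Hypothetical filter rule: drop low-ordinal fragments such as y1 and y2."""
    return int(ion[1:]) >= min_ordinal

def filter_spectrum(spectrum, rule):
    """Return only the peaks that pass the filter rule."""
    return {ion: inten for ion, inten in spectrum.items() if rule(ion)}

# Made-up data: the library was built without y1/y2, the prediction has them.
library = {"y3": 40.0, "y4": 100.0, "y5": 70.0}
prosit = {"y1": 80.0, "y2": 30.0, "y3": 40.0, "y4": 100.0, "y5": 70.0}

# Filter effectively applied on one side only: y1/y2 count as zero in the library.
one_sided = cosine_dotp(library, prosit)
# Same filter rule applied to both spectra before comparing.
two_sided = cosine_dotp(filter_spectrum(library, keep),
                        filter_spectrum(prosit, keep))
print(round(one_sided, 3), round(two_sided, 3))
```

In this made-up case the remaining fragments agree perfectly, so the lower one-sided value comes entirely from the excluded fragments.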

For now, if you expect a dotp value that differs from the one in the document grid, I would at least give it a different name. Having the same name "dotp" for two different values in one document is highly confusing. While it is a nice graph, you would also need to manually cover the dotp value every time you make a figure from it, as it is generally low and differs from your actual data.

I would take the library spectrum, apply the fragment filter rules, and then build the Prosit mirror for all leftover fragments (including those with a measured intensity of 0, in case libraries contain that information). It would give you a clean, fully flexible and customizable view, would make nice figures, and would help with all kinds of quick prediction tests. Do you see issues with this variant? In case libraries do not differentiate excluded fragments from 0-intensity fragments, an alternative would be to take the Prosit mirror first, apply the filter rules, and then add spectral intensities for the leftover fragments. That would give you your desired information for selecting transitions. Do you think having these 2 options available makes sense?
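
The two variants differ only in which spectrum defines the fragment list. A minimal sketch, with hypothetical names and made-up intensities (the filter rule here is an assumed "ordinal of at least 3" rule, chosen only for the example):

```python
def keep(ion, min_ordinal=3):
    """Hypothetical filter rule: drop low-ordinal fragments such as y1 and y2."""
    return int(ion[1:]) >= min_ordinal

def mirror_pairs(reference, other, rule):
    """Ions come from `reference` after filtering; `other` contributes 0.0
    wherever it has no peak for a retained ion."""
    ions = sorted(ion for ion in reference if rule(ion))
    return [(ion, reference[ion], other.get(ion, 0.0)) for ion in ions]

# Made-up data: the library records y3 explicitly with intensity 0.
library = {"y3": 0.0, "y4": 100.0, "y5": 70.0}
prosit = {"y1": 80.0, "y2": 30.0, "y3": 40.0,
          "y4": 100.0, "y5": 70.0, "y6": 20.0}

# Variant 1: start from the library, filter, then attach Prosit intensities.
variant_1 = mirror_pairs(library, prosit, keep)  # (ion, library, prosit)
# Variant 2: start from Prosit, filter, then attach library intensities.
variant_2 = mirror_pairs(prosit, library, keep)  # (ion, prosit, library)

print([ion for ion, _, _ in variant_1])
print([ion for ion, _, _ in variant_2])
```

In the first variant, y6 is hidden because the library never contained it. In the second, it appears with a library intensity of 0, which is the information needed when choosing transitions.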

With best wishes,