What does "Identified Count" mProphet score mean

jmeyer  2016-07-04 15:16
 
When running the mProphet model, in some cases, I get a very large contribution from the "Identified Count" score. What does this mean? Does it have to do with the ID lines and the integration borders? Based on the exported mProphet features, it seems to take only the values 0 or 1. The peak picking PDF says "Number of identified peptides within the peak boundaries," which is not clear to me or Birgit.

Thanks,
Jesse
 
 
Brendan MacLean responded:  2016-07-04 17:56
Hi Jesse,
First, I should note that the Skyline mProphet implementation, and mProphet itself, have not been all that well tested on data with identifications (mostly DDA data). It seems that "Identified Count" is what Skyline uses internally to differentiate peaks that contain an ID or aligned ID from those that do not, and it is the primary score in the default scoring for MS1 filtering of DDA data. It gets a coefficient large enough that no other score can override it.
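
To make the "no other score can override it" point concrete, here is a minimal sketch of a weighted linear composite score of the kind an mProphet-style model produces. The feature names, the weights, and the feature values are all hypothetical and chosen only for illustration; they are not taken from Skyline or from any trained model.

```python
# Hypothetical weights for a linear composite score.
# These are illustrative values, not coefficients from a real Skyline model.
WEIGHTS = {
    "identified_count": 10.0,  # binary feature: 1 if the peak contains an ID, else 0
    "intensity": 0.8,
    "coelution": 0.6,
    "rt_difference": -0.5,     # larger retention time error lowers the score
}


def composite_score(features, weights=WEIGHTS):
    """Return the weighted sum of the feature values for one candidate peak."""
    return sum(weights[name] * features.get(name, 0.0) for name in weights)


# A peak with an ID but poor values on every other feature (assumed scale).
weak_peak_with_id = {
    "identified_count": 1, "intensity": -2.0, "coelution": -2.0, "rt_difference": 2.0,
}
# A peak without an ID but good values on every other feature.
strong_peak_without_id = {
    "identified_count": 0, "intensity": 2.0, "coelution": 2.0, "rt_difference": -2.0,
}

# The candidate with the highest composite score is the one that gets picked.
best = max([weak_peak_with_id, strong_peak_without_id], key=composite_score)
print(best is weak_peak_with_id)
```

With these assumed numbers the peak containing the ID is picked even though every other feature favors the other candidate, which is the behavior a dominant binary coefficient would produce.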

Clearly, the description is a little off, if it is always 0 or 1.

Anyway, I can't personally vouch for the mProphet models produced for MS1 filtering of DDA data. The publications validating the mProphet approach cover only SRM and DIA data, from which we might guess that it should also work on PRM data, since all of these approaches yield MS/MS chromatogram peaks.

You could be the first, of course, to prove it works, but you will probably need to justify its validity if you plan on using an mProphet model for DDA MS1 in a publication.

Hope this helps. Thanks for posting to the support page.

--Brendan
 
jmeyer responded:  2016-07-05 11:44
Hi Brendan,

Thanks for the quick reply on a holiday.

This is actually extracted signal from SWATH data based on IDs from DIA-Umpire signal extraction. It is interesting to note that the score is absent in the similar file using IDs from DDA to extract SWATH signal, probably because the score is not present for all (or maybe any) of those IDs.

Can you comment on the use of this score for extracting signal from a SWATH run based on identifications from DIA-Umpire signal extraction from the same set of SWATH runs?

We might just decide to disable this score unless we can convince ourselves that this is a fair way to segregate correct peak picking. It is interesting to compare the composite models (attached) with and without the "Identified Count" score enabled.
 
Brendan MacLean responded:  2016-07-05 12:08
Hi Jesse,
Ah, yes. I forgot about that case. This score only applies when the IDs come from the same runs you are extracting chromatograms from, which explains why you would never see this score for DIA data analyzed with a library from DDA runs. The library and chromatogram runs will never coincide in that case.
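
As a rough illustration of that rule, the sketch below computes a 0/1 flag only when at least one ID comes from the same run as the chromatogram, and otherwise reports the score as unavailable. The function name, the `(run_name, retention_time)` layout, and the run names are hypothetical, and the sketch ignores aligned IDs; it is not how Skyline itself is implemented.

```python
from typing import Optional, Sequence, Tuple

# Hypothetical layout: each ID is (run_name, retention_time_in_minutes).
Identification = Tuple[str, float]


def identified_flag(chrom_run: str, peak_start: float, peak_end: float,
                    ids: Sequence[Identification]) -> Optional[float]:
    """Return 1.0 or 0.0 when the run has its own IDs, or None when it has none."""
    # Keep only IDs that were made in the run the chromatogram was extracted from.
    same_run_times = [rt for run, rt in ids if run == chrom_run]
    if not same_run_times:
        # Library and chromatogram runs never coincide: the score does not apply.
        return None
    inside = any(peak_start <= rt <= peak_end for rt in same_run_times)
    return 1.0 if inside else 0.0


umpire_ids = [("swath_01", 24.6)]  # IDs made from the SWATH run itself
dda_ids = [("dda_01", 24.6)]       # IDs from a separate DDA library run

print(identified_flag("swath_01", 24.2, 25.0, umpire_ids))  # 1.0
print(identified_flag("swath_01", 30.0, 30.8, umpire_ids))  # 0.0
print(identified_flag("swath_01", 24.2, 25.0, dda_ids))     # None
```

Under these assumptions, the DDA-library case never yields a value for any peak, which would match the score being absent from that file.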

But, yes, if you used DIA-Umpire then you could have IDs in your DIA runs, and similar to DDA data processing, you might consider the IDs the most important score that validates your peak selection. It is not clear that you actually need to subject the data to the mProphet model in the same way you do when you are attempting to make peak identifications based on a library built from DDA data, which is the case that most of the mProphet literature covers.

Though, as you have shown in your models, the mProphet models without extra weighting of DIA-Umpire IDs do quite well anyway, since retention time prediction and relative ion abundances will mostly be based on the DIA chromatogram peaks, and these will take over the scoring weight lost from removing IDs.

I think you could easily argue either way: (1) peak selection with DIA-Umpire in Skyline is very similar to the state of the art for DDA; (2) peak selection with DIA-Umpire results and mProphet without IDs is closest to what has been proven to work for mProphet with DIA.

Hope that helps. Thanks for taking the time to share your thoughts on the Skyline support board.

--Brendan