Issue 645: MaxQuant mod parser will fail when first two letters are shared between 2 different mods

Status:open
Assigned To:matt.chambers42@gmail.com
Type:Defect
Area:Skyline
Priority:3
Milestone: 
Opened:2019-04-25 by matt.chambers42
Changed:2019-06-21 by matt.chambers42
Resolved:
Resolution:
Closed:
2019-04-25 matt.chambers42
Title»MaxQuant mod parser will fail when first two letters are shared between 2 different mods
Assigned ToGuest»matt.chambers42@gmail.com
Notify»Brendan MacLean;Nick Shulman
Type»Defect
Area»Skyline
Priority»3
Milestone»4.3
For example:
EFISQLCLQEKIR    13    1    Trimethyl (K),Trioxidation (C)    _EFISQLC(tr)LQEK(tr)IR_   

MaxQuantReader will match both mods to Trimethyl because it comes first in the list of mod names. In this case we could fix it by adding AA specificity to the lookup, but if two mods share both the first 2 letters and the specificity (Sulfation and Sulfo share Y, Cysteinyl and "Cysteinyl - carbamidomethylation" share C, etc.), then matching mods to positions by their two-letter abbreviation cannot work at all. The only way I can see to do it properly is to look at the extra probabilities column for each non-terminal mod. For example:
The "Trimethyl (K) probabilities" column has:
EFISQLCLQEK(1)IR

And the "Trioxidation (C) probabilities" column has:
EFISQLC(1)LQEKIR

What a mess!
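
A minimal sketch of the probabilities-column approach, written in Python for illustration only; it is not MaxQuantReader's code. The function names parse_probabilities and localize_mods are hypothetical. It assumes that residues are single uppercase letters, that each parenthesised number applies to the residue immediately before it, and that each mod occurs once unless a count is supplied.

```python
import re

# Matches either a single residue letter or a parenthesised probability.
_TOKEN = re.compile(r"([A-Z])|\(([^)]*)\)")


def parse_probabilities(annotated):
    """Return {0-based residue index: probability} for one probabilities cell.

    Hypothetical helper: assumes each "(p)" refers to the residue just before it.
    """
    sites = {}
    residues_seen = 0
    for match in _TOKEN.finditer(annotated):
        if match.group(1) is not None:
            residues_seen += 1
        else:
            if residues_seen == 0:
                raise ValueError("probability appears before any residue")
            sites[residues_seen - 1] = float(match.group(2))
    return sites


def localize_mods(probability_columns, counts=None):
    """Map residue index -> mod name using the per-mod probabilities cells.

    probability_columns: {mod name: cell text from that mod's probabilities column}
    counts: optional {mod name: number of occurrences}; assumed to be 1 if absent.
    """
    counts = counts or {}
    placed = {}
    for mod_name, annotated in probability_columns.items():
        sites = parse_probabilities(annotated)
        n = counts.get(mod_name, 1)
        # Keep the n most probable sites; ties go to the earlier residue.
        best = sorted(sites, key=lambda i: (-sites[i], i))[:n]
        for index in best:
            if index in placed:
                raise ValueError("two mods claim residue index %d" % index)
            placed[index] = mod_name
    return placed


columns = {
    "Trimethyl (K)": "EFISQLCLQEK(1)IR",
    "Trioxidation (C)": "EFISQLC(1)LQEKIR",
}
print(sorted(localize_mods(columns).items()))
# prints [(6, 'Trioxidation (C)'), (10, 'Trimethyl (K)')]
```

Because each mod is read from its own column, the shared "(tr)" tag in the modified sequence never has to be interpreted. Mods with the same specificity, such as Sulfation and Sulfo on Y, would be separated the same way, provided each has its own probabilities column.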

2019-06-21 matt.chambers42
Milestone4.3»