Issue 654: Add support for PEFF extended FASTA format (http://www.psidev.info/peff)

Status:open
Assigned To:Brian Pratt
Type:Todo
Area:Skyline
Priority:3
Milestone:19.2
Opened:2019-06-10 by Brian Pratt
Changed:2019-08-01 by Brian Pratt
Resolved:
Resolution:
Closed:
2019-06-10 Brian Pratt
Title » Add support for PEFF extended FASTA format (http://www.psidev.info/peff)
Assigned To: Guest » Brian Pratt
Type » Todo
Area » Skyline
Priority » 3
Milestone » 19.2
PEFF looks like normal FASTA, except for some "#" comment lines and a more structured ">" header line, which we should be able to parse into gene, species, etc.
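Here is a minimal Python sketch of parsing the structured ">" header into key/value pairs. It is illustrative only and is not Skyline code. It makes three assumptions:

* Each field has the form `\Key=Value`.
* A value never contains whitespace followed by `\Word=`.
* List-valued fields are written as non-nested `(a|b|c)(d|e)` groups, as in the neXtProt insulin example below.

`parse_peff_header` and `split_parenthesized` are hypothetical names.

```python
import re

# One "\Key=Value" pair. The value runs lazily until whitespace followed by
# the next "\Key=", or the end of the line. Assumption: values never contain
# " \Word=" themselves (true for the example header).
_KEY_VALUE = re.compile(r"\\(\w+)=(.*?)(?=\s+\\\w+=|$)")

# One "(...)" group. Assumption: groups are not nested.
_GROUP = re.compile(r"\(([^()]*)\)")


def parse_peff_header(line):
    """Hypothetical helper: split a PEFF '>' line into (identifier, fields)."""
    if not line.startswith(">"):
        raise ValueError("not a FASTA/PEFF header line")
    body = line[1:].strip()
    # The identifier (e.g. "nxp:NX_P01308-1") is everything before the first space.
    ident, _, rest = body.partition(" ")
    fields = {m.group(1): m.group(2).strip() for m in _KEY_VALUE.finditer(rest)}
    return ident, fields


def split_parenthesized(value):
    """Hypothetical helper: '(a|b)(c||d)' -> [['a', 'b'], ['c', '', 'd']]."""
    return [group.split("|") for group in _GROUP.findall(value)]
```

Run on the insulin header in the example below, `parse_peff_header` should produce the following:

* identifier `nxp:NX_P01308-1`
* `GName` of `INS`
* `TaxName` of `Homo sapiens`

Fields such as `\ModResPsi` or `\Proteoform` can then be passed through `split_parenthesized`.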

2019-08-01 Brian Pratt
At a minimum, we should quietly ignore comment lines, as in this example from https://github.com/HUPO-PSI/PEFF/blob/master/Examples/PEFF_AnnotID_Insulin_Valid.peff:

# PEFF 1.0
# GeneralComment=This is an example of using the AnnotationIdentifiers=true flag with Insulin as a example
# //
# DbName=AnnotationIdentifiers Insulin valid example
# Prefix=nxp
# DbDescription=Insulin extracted from neXtProt and extended to include annotation identifiers
# Decoy=false
# DbSource=www.nextprot.org
# DbVersion=1.0
# DbDate=20180207
# NumberOfEntries=1
# HasAnnotationIdentifiers=true
# SequenceType=AA
# //
>nxp:NX_P01308-1 \PName=Insulin isoform Iso 1 \GName=INS \NcbiTaxId=9606 \TaxName=Homo sapiens \Length=110 \SV=1 \EV=228 \PE=1 \ModResPsi=(0:53|MOD:00087|N6-myristoyl-L-lysine)(1:31|MOD:00798|half cystine)(2:96|MOD:00798|half cystine)(3:43|MOD:00798|half cystine)(4:109|MOD:00798|half cystine)(5:95|MOD:00798|half cystine)(6:100|MOD:00798|half cystine) \VariantSimple=(7:2|T)(8:6|C)(9:6|G)(10:6|H)(11:8|Q)(12:9|S)(13:12|V)(14:18|R)(15:21|L)(16:22|V)(17:23|S)(18:23|T)(19:24|D)(20:24|V)(21:29|D)(22:29|P)(23:32|R)(24:32|S)(25:34|D)(26:35|P)(27:38|V)(28:42|A)(29:43|G)(30:44|R)(31:45|K)(32:46|Q)(33:47|V)(34:48|C)(35:48|S)(36:49|L)(37:51|I)(38:52|R)(39:53|E)(40:53|T)(41:55|C)(42:55|H)(43:56|W)(44:58|V)(45:63|A)(46:63|L)(47:64|W)(48:65|L)(49:68|M)(50:70|R)(51:71|V)(52:73|C)(53:75|D)(54:76|N)(55:76|R)(56:79|L)(57:81|V)(58:83|K)(59:84|R)(60:85|Y)(61:89|C)(62:89|H)(63:89|L)(64:89|P)(65:90|C)(66:90|D)(67:92|L)(68:93|K)(69:94|K)(70:96|S)(71:96|Y)(72:98|R)(73:101|C)(74:103|C)(75:106|D)(76:108|C) \Processed=(77:1|24|PEFF:0001021|signal peptide)(78:25|54|PEFF:0001020|mature protein)(79:57|87|PEFF:0001022|transit peptide)(80:90|110|PEFF:0001020|mature protein) \DisulfideBond=(81:1,2|between chains)(82:3,4|between chains)(83:5,6|A chain only) \Proteoform=(NX_P01308-1-pf1|1-110||preproinsulin)(NX_P01308-1-pf2|25-110||proinsulin)(NX_P01308-1-pf3|25-110|1,2,3,4,5,6|proinsulin with disulfide mods)(NX_P01308-1-pf4|90-110||Insulin A chain cleaved)(NX_P01308-1-pf5|90-110|3,4,5,6|Insulin A chain modified)(NX_P01308-1-pf6|25-54||Insulin B chain cleaved)(NX_P01308-1-pf7|25-54|5,6|Insulin B chain cleaved)(NX_P01308-1-pf8|25-53|0,1,3|B chain in an extracellular region)(NX_P01308-1-pf9|57-87||C peptide cleaved)(NX_P01308-1-pf10|57-87||C peptide cleaved)(NX_P01308-1-pf11|90-110,25-54|81,82,83|Insulin: chains A and B joined)
MALWMRLLPLLALLALWGPDPAAAFVNQHLCGSHLVEALYLVCGERGFFYTPKTRREAED
LQVGQVELGGGPGAGSLQPLALEGSLQKRGIVEQCCTSICSLYQLENYCN
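A minimal Python sketch of the "at a minimum" behavior is a FASTA reader that skips "#" comment lines. It also collects any `# Key=Value` pairs as optional metadata. `read_peff` is a hypothetical helper, not Skyline code. Keeping repeated keys in lists is an assumption, made to cope with files that contain several `# //`-delimited database blocks.

```python
def read_peff(lines):
    """Hypothetical helper: parse PEFF/FASTA text lines.

    Returns (metadata, records): metadata maps '# Key=Value' keys to lists
    of values; records is a list of (header_line, sequence) tuples.
    """
    metadata = {}
    records = []
    header = None
    chunks = []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        if line.startswith("#"):
            # Comment line: never part of a sequence. Keep "Key=Value" pairs;
            # lines without '=' (e.g. "# PEFF 1.0", "# //") are just skipped.
            key, sep, value = line[1:].strip().partition("=")
            if sep:
                metadata.setdefault(key.strip(), []).append(value.strip())
            continue
        if line.startswith(">"):
            # New entry: flush the previous one, if any.
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line, []
        else:
            # Sequence line: accumulate until the next header.
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return metadata, records
```

For the example above, the two sequence lines join into a 110-residue sequence, which matches the header's `\Length=110`.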