parse rules MS Amanda kguehrs  2021-01-05 02:25
 

Hi Skyline team,

First of all: Happy New Year! I wish you good health and continued success in your personal and professional life.

I have tried to generate spectral libraries from DDA runs with MS Amanda, as integrated in the newest Skyline version. In my first approach, I used a limited but refined database (adapted to UniProt headers). MS Amanda performed well and created a library as expected. In my second approach, I used the same DDA files but a larger, unrefined database generated in-house. MS Amanda completed all the searches and created the mzID files, but failed to create the library because it could not parse the database entries. The database was generated in our institute and has a header format that MS Amanda, as integrated into Skyline by default, evidently does not recognize. Unfortunately, the standalone version of MS Amanda is not very convenient for running multiple searches, and I have no idea how to change the parse rules in MS Amanda.

Is there any documentation on the parse rules MS Amanda applies and on which types of database headers are supported? That would help me adapt the databases if necessary. Alternatively, it would be useful to be able to adapt or change the parse rules when using MS Amanda in Skyline to generate libraries.

Best Karl-Heinz

 
 
Matt Chambers responded:  2021-01-13 10:58

Hi Karl-Heinz,

Thanks for the report. Can you give an example of the FASTA header that is causing a problem? I wasn't aware there were any parsing restrictions in MSAmanda. I don't think it should be restricted to any particular header format.
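As a generic illustration of the point that the header format should not matter: many FASTA readers take the protein accession to be the first whitespace-delimited token after `>` and treat the rest of the line as a free-text description. The Python sketch below applies that convention to one of the in-house headers posted in this thread. The function name `parse_header` and the key=value splitting are hypothetical, and this is not MS Amanda's actual parser.

```python
import re

# Matches key=value pairs where a value may contain spaces; a value ends
# where the next " key=" begins or at the end of the header.
FIELD_RE = re.compile(r"(\w+)=(.*?)(?=\s+\w+=|$)")


def parse_header(header: str) -> tuple[str, dict[str, str]]:
    """Return (accession, fields) for one FASTA header line.

    The accession is the first whitespace-delimited token after '>',
    a common convention; this is a hypothetical sketch, not MS Amanda's parser.
    """
    text = header.lstrip(">").strip()
    accession, _, description = text.partition(" ")
    fields = {key: value.strip() for key, value in FIELD_RE.findall(description)}
    return accession, fields


if __name__ == "__main__":
    header = (">Nfu_p_1_009267 geneid=Nfu_g_1_003771 transcriptid=Nfu_t_1_009267 "
              "proteinid=Nfu_p_1_009267 gene=CLSTN2 (2 OF 2) annotation=calsyntenin 2")
    accession, fields = parse_header(header)
    print(accession)       # Nfu_p_1_009267
    print(fields["gene"])  # CLSTN2 (2 OF 2)
```

Under this convention the `geneid=... annotation=...` part is just descriptive text, so an unusual header layout alone would not be expected to stop a search engine from reading the entries.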

 
kguehrs responded:  2021-01-13 23:46

Hello Matt,

Please find below some sample headers from the FASTA file I used. I do not know whether all of the headers cause problems, as MS Amanda's error message was not entirely clear to me on this point. However, I have used this FASTA file with other search engines (MaxQuant, Proteome Discoverer) without any trouble. Below are the first 5 and the last 5 entries from the FASTA file.

Best Karl-Heinz

Nfu_p_1_009268 geneid=Nfu_g_1_003772 transcriptid=Nfu_t_1_009268 proteinid=Nfu_p_1_009268 gene=CABZ01087670.1 annotation=DEAQ box RNA-dependent ATPase 1
MSRVLRGGETEELQVGRPSKGLMKTGLKTRSEMLEFVAPILYQTLSAVLSVSDHHSLTTGDAQCTQFLKN
FLGKIASIRTQISSPYVSSFPEPILHRSLDIFDLLSLSSLENLXVCVCVCVYIYIYVCVCVCVCVCVCGC
FLVRCCSMEDVVLYQTAWSTGARTEDTHETSSVSQTKGASVRQPEAA
Nfu_p_1_009269 geneid=Nfu_g_1_003773 transcriptid=Nfu_t_1_009269 proteinid=Nfu_p_1_009269 gene=CU459095.1 annotation=Uncharacterized protein
MSWVKKLPSNMTESSGQTPDPTVESLPSAISTAFTQHEHSIHTLLEHQASTNQRLLQLDAMLRELNDKLL
QLPQFLHHHLNHFRFSLVRQHSGTSGSQTRLLLIPSPLPFSTTALLDSGCEKNLLDQSVVTRLRIPTTPL
LTPVQVLSLDGNALTTITHQTVPVSGNHQEVISFFVFPSPQFPVILGHEWLITHNPHIDWRTGQVAVWSP
YCLSHCLLSANLPVQVPATSPLPISDLSSVPLKYHDLQTVFSKDQAASLPPHRPYDCCINLLPGAVS*
Nfu_p_1_009267 geneid=Nfu_g_1_003771 transcriptid=Nfu_t_1_009267 proteinid=Nfu_p_1_009267 gene=CLSTN2 (2 OF 2) annotation=calsyntenin 2
MESPGVGTSSPGITELPKYLEHNLKTFNTRFIHSDAIRTGRVNKHKPWIETSYHGVITENTNIVLLDPPL
VALDKDAPIPYAGEGEICAFNIHGLEAPFEAVVLNGTSGEGQLRARGLVDCELQKEYTFIIQAHDCGSGP
GGAEGKKSHKAVVHIQVNDVNEFAPVFRESQYRAAVTEGKIYDSILQVEATDQDCSPQYSQICNYQITTT
NTPFAIDRNGNIRNTEKLSYDRQQEYEIQVTTWDCGQKRALHSVPVHINVKPVCKPGWQGWSKRVDYEPG
TGSKQLFPTMQLETCGEPLSLVRTTVELQTSHIGKGCDRETYSEKSLQKLCGASSGSTDLLPAPSAATNW
TASLVTDSGRDTDLIFRFDGHQAAKIPDWVVPQNLTDHFTIATWMKHGPSPGLRAEKETLLCNSDKTGTI
ITYVAGVSEKLRRIFSKHNIPENLNPTTPSDRTWSIQKTKHPNRSSVASFMQSSIVKTLQSVMFEGNFLT
HDMKNHHQELEGREVRVWDNAWLSRETKEVQVGR*
Nfu_p_1_009266 geneid=Nfu_g_1_003770 transcriptid=Nfu_t_1_009266 proteinid=Nfu_p_1_009266 gene=GPR20 annotation=G protein-coupled receptor 20
MMNLVYALIYGSIIILGLPLNVVSLWILLRNYGCTSPIAVFMVNLVISDLLLIISLSMRVYYYVKGAWLL
GSMACICFTMLFRNNIRTSSIFITFISVDRLLAVVYPLRSRHLRTTSKAVRGVVIVWLVVLMVNVPESVG
YFRDINETNCDEFETPKTLLNSKNSTLKYDKLKMATGYFQLVLLLTLLVVNIVSTVMVSWTLNRRLNESA
KVNNKMNVMLIFAMNLMMFIVFFLPVSLIVIFDHLRPLLSCLASVNCCVDPLLYYFSFDGFWKKKEDGEV
SLARHESGNTKHREVLNNRWLHG*
Nfu_p_1_058006 geneid=Nfu_g_1_025009 transcriptid=Nfu_t_1_058006 proteinid=Nfu_p_1_058006 gene=OLA.19164 annotation=Uncharacterized protein
MRSHEPDPAVEEPSTSRGSEHSSGHNLWRHLDMEVEESRMTSNTTANSIIEVQRYLAERNAPRTQDPLQY
WKNNQNLYPHLYQLALQFLCTPSSSVPCERVFSKAGELVSKRRNRLGANTLHKLLFLNKNA*

Nfu_p_1_013389 geneid=Nfu_g_1_005535 transcriptid=Nfu_t_1_013389 proteinid=Nfu_p_1_013389 gene=TAB2 annotation=TGF-beta activated kinase 1/MAP3K7 binding protein 2
MAQGNHQIDVQVLHYLCQKFPEVPEGVVSQCVLQNNNNLDACCEHLSQVSPGYHHSEEGNLSFSEDLGSP
RLRNHMTQLNLGFQSQNVHVAPVQDNLRMNGSRTLAHSMSDGPLQTGQAPNSDFFQHEPQSAPVQVPSTH
NVFGVMEPTQKPQPPQHLGLYPLAVKGSAMGPQHTPRFNPITVTLAPNPQTGRNTPTSLHIHGGPQSGLS
SPQGNSIYIRPYVSQSSTTRQSQQQGGRAQYSPTSQPQQQFYQISQLSHVYMPISSPTNPQVPCIPSNAG
PAFSSGASSCCPSSSSSSVMPTSLSTISQYNIQNISTGPRKNQIEIKLESPQRNNSTTAVLRTNSGPRSS
SAASACPSSSSSSTSVATVPTTSLSIGGPRCSQPTVFISAGSPTAASAPCEEAAVVSAGSRSQPKFYISA
NSSNDDGGARNPPTVYISANPPLQGPSGARNMSMGPAYIHHHPPKSRALAGGANAASSPRVVVTQPNTKY
TFKITVSPNKPPAVSPGVVSPTFEPNNLLSLPSDHHFVEPEPLHLSDPLSPHRDRPSEPRRLSMGSDDAA
YTQALLVHQKARMERLWHELEMKKKKLEKLKEEVNEMENDLTRRRLERLNSASHIPSVTSLGFVLLLMLA
SLHTR*
Nfu_p_1_013385 geneid=Nfu_g_1_005535 transcriptid=Nfu_t_1_013385 proteinid=Nfu_p_1_013385 gene=TAB2 annotation=TGF-beta activated kinase 1/MAP3K7 binding protein 2
MAQGNHQIDVQVLHYLCQKFPEVPEGVVSQCVLQNNNNLDACCEHLSQVSPGYHHSEEGNLSFSEDLGSP
RLRNHMTQLNLGFQSQNVHVAPVQDNLRMNGSRTLAHSMSDGPLQTGQAPNSDFFQHEPQSAPVQVPSTH
NVFGVMEPTQKPQPPQHLGLYPLAVKGSAMGPQHTPRFNPITVTLAPNPQTGRNTPTSLHIHGGPQSGLS
SPQGNSIYIRPYVSQSSTTRQSQQQGGRAQYSPTSQPQQQFYQISQLSHVYMPISSPTNPQVPCIPSNAG
PAFSSGASSCCPSSSSSSVMPTSLSTISQYNIQNISTGPRKNQIEIKLESPQRNNSTTAVLRTNSGPRSS
SAASACPSSSSSSTSVATVPTTSLSIGGPRCSQPTVFISAGSPTAASAPCEEAAVVSAGSRSQPKFYISA
NSSNDDGGARNPPTVYISANPPLQGPSGARNMSMGPAYIHHHPPKSRALAGGANAASSPRVVVTQPNTKY
TFKITVSPNKPPAVSPGVVSPTFEPNNLLSLPSDHHFVEPEPLHLSDPLSPHRDRPSEPRRLSMGSDDAA
YTQALLVHQKARMERLWHELEMKKKKLEKLKEEVNEMENDLTRRRLERLNSASHIPSIDEMKHLRSKNRA
LQIDIDCLSKEIDLLQTNGPHLNPSVIHNFYDNLGFLGPVPPKPKDSSSKAVKPSADQEEDEGVQWSCTA
CTFLNHPALNRCEQCEFPRHI*
Nfu_p_1_013388 geneid=Nfu_g_1_005535 transcriptid=Nfu_t_1_013388 proteinid=Nfu_p_1_013388 gene=TAB2 annotation=TGF-beta activated kinase 1/MAP3K7 binding protein 2
MAQGNHQIDVQVLHYLCQKFPEVPEGVVSQCVLQNNNNLDACCEHLSQVSPGYHHSEEGNLSFSEDLGSP
RLRNHMTQLNLGFQSQNVHVAPVQDNLRMNGSRTLAHSMSDGPLQTGQAPNSDFFQHEPQSAPVQVPSTH
NVFGVMEPTQKPQPPQHLGLYPLAVKGSAMGPQHTPRFNPITVTLAPNPQTGRNTPTSLHIHGGPQSGLS
SPQGNSIYIRPYVSQSSTTRQSQQQGGRAQYSPTSQPQQQFYQISQLSHVYMPISSPTNPQVPCIPSNAG
PAFSSGASSCCPSSSSSSVMPTSLSTISQYNIQNISTGPRKNQIEIKLESPQRNNSTTAVLRTNSGPRSS
SAASACPSSSSSSTSVATVPTTSLSIGGPRCSQPTVFISAGSPTAASAPCEEAAVVSAGSRSQPKFYISA
NSSNDDGGARNPPTVYISANPPLQGPSGARNMSMGPAYIHHHPPKSRALAGGANAASSPRVVVTQPNTKY
TFKITVSPNKPPAVSPGVVSPTFEPNNLLSLPSDHHFVEPEPLHLSDPLSPHRDRPSEPRRLSMGSDDAA
YTQGEPF*
Nfu_p_1_013387 geneid=Nfu_g_1_005535 transcriptid=Nfu_t_1_013387 proteinid=Nfu_p_1_013387 gene=TAB2 annotation=TGF-beta activated kinase 1/MAP3K7 binding protein 2
MAQGNHQIDVQVLHYLCQKFPEVPEGVVSQCVLQNNNNLDACCEHLSQVSPGYHHSEEGNLSFSEDLGSP
RLRNHMTQLNLGFQSQNVHVAPVQDNLRMNGSRTLAHSMSDGPLQTGQAPNSDFFQHEPQSAPVQVPSTH
NVFGVMEPTQKPQPPQHLGLYPLAVKGSAMGPQHTPRFNPITVTLAPNPQTGRNTPTSLHIHGGPQSGLS
SPQGNSIYIRPYVSQSSTTRQSQQQGGRAQYSPTSQPQQQFYQISQLSHVYMPISSPTNPQVPCIPSNAG
PAFSSGASSCCPSSSSSSVMPTSLSTISQYNIQNISTGPRKNQIEIKLESPQRNNSTTAVLRTNSGPRSS
SAASACPSSSSSSTSVATVPTTSLSIGGPRCSQPTVFISAGSPTAASAPCEEAAVVSAGSRSQPKFYISA
NSSNDDGGARNPPTVYISANPPLQGPSGARNMSMGPAYIHHHPPKSRALAGGANAASSPRVVVTQPNTKY
TFKITVSPNKPPAVSPGVVSPTFEPNNLLSLPSDHHFVEPEPLHLSDPLSPHRDRPSEPRRLSMGSDDAA
YTQGEPF*
Nfu_p_1_013328 geneid=Nfu_g_1_005515 transcriptid=Nfu_t_1_013328 proteinid=Nfu_p_1_013328 gene=PTCHD4 annotation=patched domain containing 4
MCFIGGDGASASRILWRMLRQVIHRGLKASFYWLGLFVSRHPVFFLTVPAVLTIIFGSTVLSRFKPERDL
EVLVAPTHSLAKIERSLANSLFPIDQSKHKLYSDLHTPGRYGRLILLAKSGGNILELVDQVLEVHKQVLD
LRVNYKGFNFTFAHLCVLSHRDKRCLLDDIISIFEDIRQAVLSNSSFHKVPLSYPNTTLKNGRVSFIGHQ
LGGVSFSPNSRDQQVKFARAVQITYYLRHHGPVVQDTIAERWENEFCALVNRLSTAEAPHATDKLHIQSL
TSFSLWRDFHQTGILGKGEVLVSLVLVLLAATISSSMRDCLRGKPFLGLLGVLTICIANVTAAGIFFISD
GKFNSTLLGIPFFAMGHGTKGVFELLAGWRRTRENLPFKERVADAFADVMVCYTMTSSLYIITFGMGASP
FTNIESVKIFCQSMCVAILVNYFYVFSFYGSCLVFAGQLEQNRYHSVFCCKIPSVEYLDRQPTWFKTMMS
DGHDLSTHHDSVPYQNHFIQHFLREHYTEWITNTYVKPFVVILYLIYASFSFMGCLQISDGSNIVNLLAS
NSPSVSYAFTQQKYFSNYSPVIGFYIYEPIEYWNATVQEHLKTLSHGFNKISWMDNFFHYLRVVNVSAST
KSDFIGILKSSFLRSPEYQHFTEDIIFSKNRETDEYDIIASRMYLVARTTEKKREEVVELLEKLRPLMLI
NSIKFIAFNPTFVFMDRYSSSVISPILTSGFSVLTILILTFFLVINPLGNFWLILTVTSVELGVLGLMTL
WNVGMDSISILCLIYTLNFAMDHCAPHLYTFVLATEHTRTQCIKLALEEHGAAILQNASCFVIGIMPLVF
VPSNLTYTLFKCSLLTAGCTVLHCFVILPVFLTFFPPSKKRHKKKKRAKRKEREREREREREREEIECIE
VRENPDHVTNI*

 
Matt Chambers responded:  2021-01-14 08:25

Test
ISLABKEYREMOVINGGREATERTHANSIGNS*

 
Matt Chambers responded:  2021-01-14 09:12

OK, the issue is the asterisks at the ends of the sequences. Many engines accept them (trimming them off implicitly), but some do not, and MSAmanda apparently does not. It's easy to fix, though, both in the MSAmanda code and, until a new Skyline-daily includes the fix, as a workaround in your FASTA: do a replace-all that removes asterisks at the end of a line (simply removing all asterisks, regardless of where they are, will probably work too).
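A minimal Python sketch of the replace-all workaround, assuming a plain FASTA file whose header lines start with `>`: it removes trailing asterisks (translation stop markers) from sequence lines only, leaves header lines untouched, and drops any line that held nothing but the stop marker. The script name `strip_asterisks.py` and its two arguments are hypothetical.

```python
import sys
from typing import Iterable, Iterator


def strip_stop_asterisks(lines: Iterable[str]) -> Iterator[str]:
    """Yield FASTA lines with trailing '*' removed from sequence lines.

    Header lines (starting with '>') pass through unchanged, so any '*'
    inside a description is preserved.
    """
    for line in lines:
        if line.startswith(">"):
            yield line
            continue
        body = line.rstrip("\r\n")   # sequence text without the line ending
        ending = line[len(body):]    # original line ending ("\n", "\r\n" or "")
        stripped = body.rstrip("*")
        if body and not stripped:
            continue                 # line contained only the stop marker; drop it
        yield stripped + ending


def main(in_path: str, out_path: str) -> None:
    # newline="" on both files keeps the original line endings unchanged.
    with open(in_path, encoding="utf-8", newline="") as src, \
         open(out_path, "w", encoding="utf-8", newline="") as dst:
        dst.writelines(strip_stop_asterisks(src))


if __name__ == "__main__":
    if len(sys.argv) == 3:
        main(sys.argv[1], sys.argv[2])
    else:
        print("usage: strip_asterisks.py INPUT.fasta OUTPUT.fasta", file=sys.stderr)
```

For example: `python strip_asterisks.py in-house.fasta in-house.fixed.fasta`, then point the MS Amanda search in Skyline at the fixed file. Restricting the change to line ends mirrors the targeted replace-all; removing every asterisk would also work here, since none of the posted headers contain one.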

 
Matt Chambers responded:  2021-02-12 08:17

This should be fixed in the current Skyline-daily.