MSP Spectral Library import for small molecules failed

stolltho  2020-05-17 22:43
 
Hi there.

Wanted to import .msp spectral libraries from:
https://mona.fiehnlab.ucdavis.edu/downloads
https://minedatabase.mcs.anl.gov/#/download
http://prime.psc.riken.jp/compms/msdial/main.html#MSP

Importing .msp files from https://mona.fiehnlab.ucdavis.edu/downloads works for HMDB, MoNA and ReSpect, but NOT for GNPS, Fiehn HILIC and LipidBlast (these throw an error message; see the attached example).

Spectral library import from
https://minedatabase.mcs.anl.gov/#/download
http://prime.psc.riken.jp/compms/msdial/main.html#MSP
does not work at all, i.e. shows zero molecules after import

win10
skyline-daily 20.1.1.134

cheers,
Thomas
 
 
Brian Pratt responded:  2020-05-18 08:37
Hi Thomas,

Thanks for the report. I had already found the MoNA/LipidBlast problem, where the formula is strangely formatted when the precursor_type is "[M+]". That should be fixed in the next Skyline-Daily.

MSP is a very loose "standard" in the way that FASTA is a "standard": everybody seems to find a different way to express things within the confines of the format. There is still work to do on the MoNA/LipidBlast import to pick out all the useful information (SMILES, InChIKey etc.) that's crammed into the "Comments" field, but what I have ready to go does get the name and the strangely formatted formula in, so that's a good step forward.

I will investigate the others as well. Please do not hesitate to alert us to any other problems you find.

Thanks for using the Skyline support board!

Brian Pratt
 
stolltho responded:  2020-05-18 16:21
Thanks Brian for your swift reply.
I actually don't need the libraries at the moment.
Just wanted to let you guys know, since others might encounter the same issue.
Cheers,
Thomas
 
Emmanuel responded:  2021-05-04 13:53
Dear Brian,

I'm still facing the same issue as Thomas, which is why I'm reopening this discussion.

I've tried to import .msp databases from the MS-DIAL/RIKEN website. For some of the libraries it works (e.g. FiehnHILIC), but not for others (e.g. MassBank, LipidBlast).

When comparing the field names of a library that imports with those of one that doesn't, I don't see any difference. I'm using the latest Skyline-daily release (64-bit, 21.0.9.118) on a Win10 Pro (v. 1709) computer.

Also, when importing a library that works, I sometimes get a warning message about duplicates that should be excluded before the library import. Apparently Skyline is checking the uniqueness of the InChIKey, but some entries share the same InChIKey while differing in adduct, compound name, instrument type or collision energy (see attached screenshot).

Do you mind sharing which fields are checked for duplicates during the import of .msp libraries?
Knowing these fields would help me curate the libraries before importing them into Skyline.

Thanks for your support.

Emmanuel
 
Brian Pratt responded:  2021-05-05 09:11
Hi Emmanuel,

The challenge with .msp files is that there's no real standard for how things are represented. You provided a screenshot of a nice tidy table, but that's not what an MSP file looks like. In a .msp file those values could be wedged into a "Comments" field, or appear with any variety of keywords. It's all down to whoever wrote the .msp export code for whatever system you're working with. If you come across a variant that we haven't yet dealt with, we'd appreciate seeing an example of it.

As to duplicates, it's really a question of what we do *not* track rather than what we do. If we come up with two or more entries for the same molecule and adduct, that's a conflict. The spectra might come from different machine types, or with different collision energies, or a host of other parameters, but as we don't track those it's up to the user to filter on the appropriate values before passing the information to Skyline. You wouldn't want to feed Skyline a spectral library that wasn't appropriate to the kind of raw data being processed, and it's up to you to select the contents carefully. The BiblioSpec schema is a good example of the things we do track in a spectral library: https://raw.githubusercontent.com/ProteoWizard/pwiz/master/pwiz_tools/BiblioSpec/tests/reference/tables.check
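For anyone curating a library before import, the "same molecule and adduct" rule described above can be approximated with a short script. This is only a minimal sketch, not Skyline's actual logic. It assumes records are separated by blank lines, that header lines look like `Key: value`, and that the molecule and adduct appear under `InChIKey` and `Precursor_type`; those two key names are assumptions and will differ between .msp sources, so they are exposed as parameters.

```python
import re
from collections import defaultdict


def parse_headers(msp_text):
    """Yield one dict of header fields (lower-cased keys) per record.

    Assumes blank-line-separated records and 'Key: value' header lines.
    """
    for block in re.split(r"\n\s*\n", msp_text.strip()):
        fields = {}
        for line in block.splitlines():
            key, sep, value = line.partition(":")
            if not sep:
                continue
            key = key.strip().lower()
            fields[key] = value.strip()
            if key == "num peaks":
                break  # everything after this line is peak data
        if fields:
            yield fields


def find_conflicts(msp_text, id_key="inchikey", adduct_key="precursor_type"):
    """Group record names by (molecule, adduct); return groups with >1 entry.

    id_key and adduct_key are hypothetical defaults, not a fixed standard.
    """
    groups = defaultdict(list)
    for fields in parse_headers(msp_text):
        name = fields.get("name", "")
        molecule = fields.get(id_key) or name  # fall back to the name
        groups[(molecule, fields.get(adduct_key, ""))].append(name)
    return {key: names for key, names in groups.items() if len(names) > 1}
```

Entries reported by `find_conflicts` differ only in things such as instrument type or collision energy, so one of each group would need to be chosen before import.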

I hope this explains things.

Thanks for using the Skyline support board!

Brian Pratt
 
Emmanuel responded:  2021-05-07 05:32
Hi Brian,

Thanks for your reply.

The nice tidy table I attached is not an .msp file, but the result of the parsing steps I run to check different things before importing the real .msp file into Skyline. I haven't come across any new variant yet; I'm using the usual MS-DIAL/RIKEN and MoNA websites to download these .msp libraries.

I agree that curation remains the best solution, but considering the number of entries in each library it's quite tedious and time-consuming. Therefore, I'm looking for a global screening approach with large libraries to process DIA datasets in order to find putative hits that can be further (manually) confirmed.

Thanks for pointing me towards the BiblioSpec format, I'll have a closer look at it.

Have a nice weekend.

Emmanuel
 
Brian Pratt responded:  2021-05-07 09:22
If you've been parsing .msp files yourself then you'll appreciate that the contents can vary wildly depending on who produced them (retention time, for example, has been variously observed as two different keywords as well as being buried in a "comment"). Even within the MoNA website there are a great many msp variants to be found. It's not a trivial problem to read these files; they're only marginally better than free text.
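To illustrate the kind of variation described here, the following minimal sketch looks for a retention time first under several header keywords and then inside a comment field. The keyword list and the comment pattern are illustrative assumptions only, not the set that Skyline recognizes, and units (minutes vs. seconds) are not handled.

```python
import re

# Hypothetical aliases for illustration; real files may use others.
RT_KEYS = ("retentiontime", "retention_time", "rt")

# Matches e.g. 'retention time=3.75' or 'RT: 3.75' inside a comment string.
RT_IN_COMMENT = re.compile(
    r"\b(?:retention[ _]?time|rt)\s*[=:]\s*(\d+(?:\.\d+)?)", re.IGNORECASE
)


def find_retention_time(fields, rt_keys=RT_KEYS):
    """Return the retention time as a float, or None if none is found.

    fields: dict mapping lower-cased header names to values for one record.
    """
    # 1. Try each dedicated keyword in turn.
    for key in rt_keys:
        value = fields.get(key)
        if value:
            try:
                return float(value.split()[0])
            except (ValueError, IndexError):
                continue
    # 2. Fall back to searching the comment text.
    comment = fields.get("comments") or fields.get("comment") or ""
    match = RT_IN_COMMENT.search(comment)
    return float(match.group(1)) if match else None
```

Every new source tends to need another alias or pattern, which is why example files of unhandled variants are so useful.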

So again, if you encounter a .msp file that Skyline doesn't handle well, we would love to see that file.

Many sources of .msp files provide their own means of filtering an internal database before emitting a .msp file, which is ideal for curation. But they're still likely to emit some, shall we say, innovative representations of the data, and we'll always want to hear about it when Skyline doesn't handle them well.

Best,

Brian
 
Emmanuel responded:  2021-05-20 03:09
Hi Brian,

Indeed, it's not easy to parse the .msp files.

Today I bumped into another issue related to the .msp library (MassBank) that you might be able to solve.

When importing the original MassBank library (not parsed or modified by me; see attached file) using the "Spectral library explorer", I got the usual message about removed duplicates with the list of their InChIKeys, but I could proceed and open the library in the explorer.

However, when adding all the compounds to the target list I got the attached error message. When I tried adding only a few compounds (one at a time), it went well.

I tried to understand the error message, and even searched some of the text described in the message directly in the original .msp library, but I could not find any match that could help me understand what's going on (or which record is the troublemaker).

Thanks in advance for your help if you have time to look into this.

Attached files have been uploaded separately to the file share folder.

Best,

Emmanuel
 
Brian Pratt responded:  2021-05-20 08:23
Yes, that's not a very helpful message.

If you can also provide the Skyline file, using File > Share > Complete, I should be able to identify the problem (and a workaround) quickly.

Thanks,

Brian
 
Emmanuel responded:  2021-05-20 15:11
Brian,

I have uploaded the zipped Skyline file to the separate share folder.

Make sure to use the "Massbank_original" library if you want to reproduce the error when importing all the compounds.

Thanks in advance for your help.

Best,

Emmanuel
 
Brian Pratt responded:  2021-05-20 16:24
Obviously Skyline needs to do a better job of helping the user with this, but the issue is that Methyl-13-hydroperoxy-delta9Z,11E-octadecadienoate claims to have a peak at m/z 0.3304. This seems deeply unlikely. If you edit that out of the .msp file you can probably proceed.
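Editing such peaks out by hand gets tedious for a large library, so a small script can do it instead. This is a minimal sketch under stated assumptions: records are separated by blank lines, each record has a `Num Peaks:` line followed by one peak per line with m/z as the first whitespace-separated value, and anything below `min_mz` is implausible. The default cutoff of 1.0 is an arbitrary choice for illustration, and the function name is hypothetical.

```python
import re


def drop_low_mz_peaks(msp_text, min_mz=1.0):
    """Return msp_text with peaks below min_mz removed and Num Peaks updated.

    Raises ValueError if a peak line does not start with a number, so
    unexpected peak layouts fail loudly instead of being silently mangled.
    """
    cleaned = []
    for block in re.split(r"\n\s*\n", msp_text.strip()):
        lines = block.splitlines()
        # Locate the 'Num Peaks:' line; peak data follows it.
        idx = next(
            (i for i, line in enumerate(lines)
             if line.lower().startswith("num peaks:")),
            None,
        )
        if idx is None:
            cleaned.append(block)  # no peak list found; leave record as-is
            continue
        header, peaks = lines[:idx], lines[idx + 1:]
        kept = [p for p in peaks if float(p.split()[0]) >= min_mz]
        cleaned.append("\n".join(header + [f"Num Peaks: {len(kept)}"] + kept))
    return "\n\n".join(cleaned) + "\n"
```

Reading the downloaded file, passing its text through `drop_low_mz_peaks`, and writing the result to a new file leaves the original download untouched, so the same step can be repeated on each new library release.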

I will work on improving Skyline's handling of this kind of dodgy input.

Thanks,

Brian
 
Emmanuel responded:  2021-05-21 10:02
Brian,

Thanks for checking this error in the MassBank library. Apparently there are a few more records with m/z values of "0.xyz" in their spectra.

Does that mean one needs to correct all these entries manually (again and again with each new library release), or could Skyline skip importing fragments with such implausibly low m/z values?

Enjoy your long week-end.

Best,

Emmanuel
 
Brian Pratt responded:  2021-05-21 10:15
That's just a workaround for your immediate use; the next Skyline-Daily will quietly ignore those peaks. I've already got a pull request in that fixes the problem.