HMDB-msp file import problem

Joerg  2020-10-14

We tried to import the HMDB for small molecule quantification into Skyline. We downloaded the msp-file from Oliver Fiehn's website. Skyline imports part of the database, but just "forgets" the majority of the data. Is there any way to change the msp-file so that Skyline recognizes all entries? Or is it possible to import the "original" xml-file from the HMDB website?
Many thanks in advance!

Brian Pratt responded:  2020-10-14

Hi Jörg,

Can you be more specific about what you mean by "forgets"?


Brian Pratt

Brendan MacLean responded:  2020-10-14

Please provide screenshots of what you are seeing and explain what you hope to see instead. Thanks.

Joerg responded:  2020-10-15

Here are some screenshots from the import. The msp-file should contain 7,415 spectra. Skyline imports 915 molecules, of which 620 can be added (most likely due to redundancies in the HMDB?), and 511 of these are non-empty (maybe only Full-MS data or GC-MS data without precursor information in the HMDB?). My problem is not the few molecules without data but the main drop from 7,415 spectra to 915 individual molecules. In particular, "phosphorylcholine" is in the HMDB with several spectra but does not turn up among the 915 imported molecules.
Many thanks

Brian Pratt responded:  2020-10-15

Hi Joerg,

I'm not seeing this on the Fiehn lab website. Perhaps you meant MoNA? The spectrum counts agree, so I'll assume that's the case.

There are a couple of issues with this file, which I would be very pleased to work through with you.

The first issue is that many spectra (6500, actually) provide no hint as to the precursor adduct, e.g. 1-Methylhistidine at line 520.

When Skyline encounters incomplete entries like this, it just skips over them. I'll add warning messages about that, but it would be better if we could actually use these entries instead of discarding them. Is there an assumption we can safely make instead of dropping these? If we can assume z=1 from "Ion_mode: P", should we then assume protonation (m/z = mass + H), or inherently charged (m/z = mass)?
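To make the two options concrete, here is a minimal sketch of the arithmetic behind each assumption. The function names are hypothetical and this is not Skyline code; the proton mass is the standard value, and the neutral mass used in the usage note is computed from the formula C7H11N3O2 for 1-Methylhistidine.

```python
# Mass of a proton in daltons (hydrogen atom minus one electron).
PROTON_MASS = 1.007276


def mz_protonated(neutral_mass, z=1):
    """m/z if the entry is assumed to be [M+zH]z+ (mass + H for z=1)."""
    return (neutral_mass + z * PROTON_MASS) / z


def mz_inherently_charged(mass, z=1):
    """m/z if the molecule is assumed to already carry the charge (m/z = mass for z=1)."""
    return mass / z
```

For a monoisotopic mass of 169.085127, `mz_protonated` gives about 170.0924 while `mz_inherently_charged` gives 169.0851, so the choice moves the precursor by roughly one dalton.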

The other issue in this file is that there are ambiguous entries, as Skyline warns:
Imported library contains multiple entries for one or more molecule+adduct pairs.

This is probably due to the library containing entries for multiple parameters such as instrument type or collision energy, but Skyline keys only on molecule+adduct so these entries are ambiguous.

You should filter the library as needed before importing to Skyline

The important point there is "You should filter the library as needed". For example, Glycerol has four entries, three of which are "Instrument_type: Quattro_QQQ" with various CE values, and the fourth is "Instrument_type: GC-MS". For better or worse, Skyline's spectral libraries don't track instrument type or CE so you wind up with four library entries that are identical except for their fragments. That's probably not what you want.
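As a rough illustration of filtering before import, the sketch below splits an .msp text into entries and keeps only those for one instrument type. It assumes entries are separated by blank lines and that header lines look like "Key: value"; the "Collision_energy" field name and all peak values in the sample are invented for the example, so check them against the actual file.

```python
import re

# Invented sample mimicking the Glycerol case: three QQQ entries, one GC-MS entry.
SAMPLE_MSP = """\
Name: Glycerol
Instrument_type: Quattro_QQQ
Collision_energy: 10
Num Peaks: 2
57.0 100
75.0 40

Name: Glycerol
Instrument_type: Quattro_QQQ
Collision_energy: 25
Num Peaks: 1
57.0 100

Name: Glycerol
Instrument_type: Quattro_QQQ
Collision_energy: 40
Num Peaks: 1
45.0 100

Name: Glycerol
Instrument_type: GC-MS
Num Peaks: 1
61.0 100
"""


def parse_msp(text):
    """Split .msp text into entries, each with header fields and raw peak lines."""
    entries = []
    # Normalise Windows line endings, then split on blank lines between entries.
    for block in re.split(r"\n\s*\n", text.replace("\r\n", "\n").strip()):
        fields, peaks = {}, []
        for line in block.splitlines():
            line = line.strip()
            if not line:
                continue
            key, sep, value = line.partition(":")
            # Header lines contain a colon and do not start with a digit;
            # anything else is treated as a peak line ("m/z intensity").
            if sep and not line[0].isdigit():
                fields[key.strip()] = value.strip()
            else:
                peaks.append(line)
        if fields:
            entries.append({"fields": fields, "peaks": peaks})
    return entries


def filter_by_instrument(entries, instrument_type):
    """Keep only entries whose Instrument_type matches exactly."""
    return [e for e in entries
            if e["fields"].get("Instrument_type") == instrument_type]
```

With the sample above, `filter_by_instrument(parse_msp(SAMPLE_MSP), "GC-MS")` keeps one entry; the three Quattro_QQQ entries would still need a rule for choosing among CE values.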

We originally dealt only with NIST .msp, and NIST does come with convenient filtering tools. But if we're going to accept this kind of random database dump then we should think about how Skyline could help with the filtering at import time. I'd be happy to hear any ideas you might have on that.



Joerg responded:  2020-10-16

Hi Brian,
Many thanks! Our main problem seems to be the missing "precursor adduct" then. I would typically report the "[M+H]+", but I don't think we can assume this for all database contributors. That said, I don't see another option, as no adducts are stated in the HMDB-msp-file (not even in the original at MoNA). However, the GC-MS spectra are imported as "M+" although no adduct is stated there either. I don't know where Skyline gets this information from. Sorry that I cannot be of help here...

Brian Pratt responded:  2020-10-19
>> GC-MS-spectra are imported as "M+"

In the case of GC we're pretty safe assuming electron ionization, but with ESI all bets are off.

We should probably pop up a dialog asking various questions as required by the import file:

This file contains xxx positive-mode spectra with no declared precursor m/z value. Please select an adduct to use in positive mode m/z calculations: [list of positive adducts]
This file contains yyy negative-mode spectra with no declared precursor m/z value. Please select an adduct to use in negative mode m/z calculations: [list of negative adducts]
This file contains spectra for more than one instrument type. Please select an instrument type for import: [list of instruments found in file]
This file contains spectra for multiple tuning parameters on the same instrument type. Please select a handling method: [Discard/Combine/Use highest spectral count]
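The counts such a dialog would need (xxx, yyy, the instrument list, the repeated entries) could be gathered in one pass over the parsed headers. The following is only a sketch of that bookkeeping, not how Skyline does it: the function name is hypothetical, "Precursor_type" is an assumed field name for the adduct, and the example headers are invented.

```python
from collections import Counter

# Invented headers for illustration only.
EXAMPLE_HEADERS = [
    {"Name": "1-Methylhistidine", "Ion_mode": "P", "Instrument_type": "Quattro_QQQ"},
    {"Name": "Glycerol", "Ion_mode": "P", "Instrument_type": "Quattro_QQQ",
     "Precursor_type": "[M+H]+"},
    {"Name": "Glycerol", "Ion_mode": "P", "Instrument_type": "Quattro_QQQ",
     "Precursor_type": "[M+H]+"},
    {"Name": "Glycerol", "Ion_mode": "P", "Instrument_type": "Quattro_QQQ",
     "Precursor_type": "[M+H]+"},
    {"Name": "Glycerol", "Ion_mode": "P", "Instrument_type": "GC-MS"},
]


def summarize_for_import(headers, adduct_field="Precursor_type"):
    """Collect the numbers an import dialog would have to show.

    Returns (missing_adduct, instruments, repeated):
      missing_adduct: Counter of spectra without an adduct, keyed by ion mode
      instruments:    sorted list of instrument types found
      repeated:       {(name, instrument): count} for groups with more than one entry
    """
    missing_adduct = Counter()
    instruments = set()
    per_group = Counter()
    for h in headers:
        instrument = h.get("Instrument_type", "unknown")
        if not h.get(adduct_field):
            missing_adduct[h.get("Ion_mode", "unknown")] += 1
        instruments.add(instrument)
        per_group[(h.get("Name"), instrument)] += 1
    repeated = {key: n for key, n in per_group.items() if n > 1}
    return missing_adduct, sorted(instruments), repeated
```

Each non-empty result maps onto one of the questions above: a nonzero count for "P" or "N" triggers the adduct prompt, more than one instrument triggers the instrument prompt, and any repeated group triggers the handling-method prompt.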

What do you think?


Joerg responded:  2020-10-20
Hi Brian,
Sounds like a good idea. Then the user can decide which assumptions to make (and judge which possible flaws these settings could introduce). We will have to live with the possibility of false positives anyway; this is not much better with the database search results in our hands, where 50% are also not reliable.
Many thanks