Bruker tdf 2.0 format and memory consumption

tobias.kockmann  2018-04-17 03:04
 
Dear skyline support,

I am experimenting with the new Bruker tdf 2.0 file format that is now supported by the latest skyline-daily release and realised that the memory consumption is pretty high. I am running skyline-daily on a Windows Server 2012 R2 Standard machine with 32 GB RAM, and Skyline seems to occupy >36 GB when importing a few targets (123 transitions, see screenshot). Is that possible, and if so, why? I am only importing MS1-level data acquired in PASEF mode (one TIMS-MS heat map every 1.1 s), so only around 15% of the available scan data should be coming from the MS1 level. The tdf_bin is around 3.8 GB on disk, which would correspond to something like 0.5 GB of MS1 scan data on disk.

Greetings,
Tobi
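
The back-of-envelope estimate in the post above can be written out as a short Python sketch. It only restates the figures quoted there (3.8 GB tdf_bin, roughly 15% MS1, more than 36 GB observed). The names are illustrative, and on-disk size is at best a rough proxy for in-memory size, since the stored data may be compressed.

```python
# Figures quoted in the post; nothing here is measured independently.
TDF_BIN_GB = 3.8          # size of the tdf_bin file on disk
MS1_FRACTION = 0.15       # rough share of scan data at the MS1 level
OBSERVED_MEMORY_GB = 36   # lower bound on the memory Skyline appeared to use


def estimate_ms1_gb(total_gb, ms1_fraction):
    """Estimate how much of the on-disk data belongs to MS1 scans."""
    return total_gb * ms1_fraction


ms1_gb = estimate_ms1_gb(TDF_BIN_GB, MS1_FRACTION)

# How many times larger the observed memory use is than the MS1 estimate.
blowup = OBSERVED_MEMORY_GB / ms1_gb

print(f"Estimated MS1 data on disk: {ms1_gb:.2f} GB")
print(f"Observed memory is at least {blowup:.0f}x that estimate")
```

The size of that ratio is what makes the observation worth reporting: even allowing for decompression, the import holds far more in memory than the MS1 share of the file would suggest.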
 
 
Brian Pratt responded:  2018-04-17 08:00
Hi Tobi,

I would be curious to know what the memory consumption is like when running that file through msconvert.exe - that would help us understand if the problem is with Skyline, or more on the reader library side.

Ideally, if you can provide the file (and the Skyline document .sky.zip via Skyline File>Share>Complete) we can experiment with it here.

Thanks,

Brian Pratt
 
tobias.kockmann responded:  2018-04-17 08:07
Hi Brian,

OK, I could test that. Which conversion should I use: tdf 2.0 to mgf? Where should I place the files?

Greetings,
Tobi
 
tobias.kockmann responded:  2018-04-17 08:12
My current MSconvert version does not recognise the .tdf files. Is there a new version?
 
Brian Pratt responded:  2018-04-17 08:49
I would just do a conversion to .mzml - format doesn't really matter, we're just curious about memory usage during the conversion relative to what happens in Skyline.

It's likely that your msconvert needs updating, yes. There's actually a new version of msconvert pretty much every day: msconvert is released on a "continuous integration" basis. So, for example, as soon as Matt gets the tdf 3.0 compatible reader integrated, and all the automated tests pass, it will automagically appear in the msconvert installer download. (Skyline is a lot more conservative in its release schedule, but the mix of the two approaches is quite useful.)

You can get the files to me by dropping them at http://skyline.ms/files.url

Thanks,

Brian
 
Brendan MacLean responded:  2018-04-17 09:16
Careful, though. Conversion to full profile mzML could consume a massive amount of disk space and writing to disk could really slow the whole thing down.

Are you importing using centroided extraction? Highly recommended for Bruker.
 
tobias.kockmann responded:  2018-04-19 07:05
Hi Brendan, Hi Brian,

a)

I just tried to convert tdf 3.0 data using MSconvert and got the same as with skyline:

0\SAHA 2 10min_Slot1-8_1_1859.d\SAHA 2 10min_Slot1-8_1_1859.d\analysis.tdf
------------------------------
Starting...
Opening file "D:\skyline-daily\p1000\SAHA 2 10min_Slot1-8_1_1859.d\SAHA 2 10min_Slot1-8_1_1859.d\analysis.tdf" for read...
Failed - System.Exception: TDF schema version 3.0 not supported.
   at pwiz.CLI.msdata.ReaderList.read(String filename, MSDataList results)
   at MSConvertGUI.MainLogic.processFile(String filename, Config config, ReaderList readers)
   at MSConvertGUI.MainLogic.Go(Config config)

MSconvert Version: 3.0.18109-390173050

b)

I switched to centroided data import in skyline-daily for the tdf 2.0 data. The memory footprint went down to 20 GB :-)

Greetings,
Tobi
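
The "TDF schema version 3.0 not supported" error above suggests checking which schema version a .d folder carries before attempting a conversion. The following is a minimal Python sketch, assuming that analysis.tdf is an SQLite database containing a key/value table named GlobalMetadata with the keys SchemaType, SchemaVersionMajor and SchemaVersionMinor. Those table and key names are assumptions not confirmed anywhere in this thread, and the function name is hypothetical.

```python
import sqlite3
from pathlib import Path

# Assumed key names in the (assumed) GlobalMetadata key/value table.
SCHEMA_KEYS = ("SchemaType", "SchemaVersionMajor", "SchemaVersionMinor")


def read_tdf_schema_version(tdf_path):
    """Return (schema_type, major, minor) read from an analysis.tdf file."""
    # Open read-only so the acquisition data cannot be modified by accident.
    uri = Path(tdf_path).resolve().as_uri() + "?mode=ro"
    con = sqlite3.connect(uri, uri=True)
    try:
        rows = dict(
            con.execute(
                "SELECT Key, Value FROM GlobalMetadata WHERE Key IN (?, ?, ?)",
                SCHEMA_KEYS,
            )
        )
    finally:
        con.close()
    return (
        rows.get("SchemaType"),
        int(rows["SchemaVersionMajor"]),
        int(rows["SchemaVersionMinor"]),
    )
```

Calling `read_tdf_schema_version(r"path\to\run.d\analysis.tdf")` (placeholder path) would then show whether a file is tdf 2.0 or 3.0 without starting a long-running conversion.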
 
tobias.kockmann responded:  2018-04-19 07:44
appendix:

I tried to convert tdf 2.0 data to mgf. The process was pretty time consuming and in the end I got an mgf file of 0 kb.

:-(
 
Brian Pratt responded:  2018-04-19 09:07
Hi Tobi,


I'll try to get this sorted out. Can I get the files (tdf 3 and tdf 2) mentioned in this thread? You can upload to http://skyline.ms/files.url or whatever is convenient.

Thanks,

Brian
 
tobias.kockmann responded:  2018-04-20 04:49
Hi Brian,

I uploaded the tdf 2.0 and 3.0 example files. The tdf 3.0 upload also contains an example of how Bruker now generates mgfs.

Hope that helps. btw: The conversion of the tdf 2.0 to mgf finished, but the file size is 122 GB (as Brendan predicted). That really makes you wonder what the hell people did when designing this PSI format.

Greetings,
Tobi
 
Brendan MacLean responded:  2018-04-20 12:42
I assume you mean tdf 2.0 to mzML, since PSI was not involved in the MGF definition. It should be considerably smaller if you use centroid mode (vendor peak picking). When converting to MGF for something like a Mascot search, I would expect you to filter for MS/MS spectra only and centroided, which should be even smaller, given that only the MS1 spectra include the IMS dimension.

Thanks for providing us with example files.

--Brendan
 
tobias.kockmann responded:  2018-04-23 07:22
An update:

@Brendan: Your assumption is correct: tdf 2.0 to mzML. Sorry for the confusion. I have now started a new conversion attempt using the filters: peak picking Vendor, MS levels 1-. The process has been running for hours now and the progress bar is at about 50%.

I also tried to convert to mgf using the vendor peak picking and only MS level 2-2. The result is again a file of 0 KB.

Greetings,
Tobi
 
tobias.kockmann responded:  2018-04-24 04:00
An update:

The process finished and again created an mzML file of 122 GB (this time vendor peak picking was used). Am I doing something wrong here?
 
Brian Pratt responded:  2018-04-24 08:27
>> Am I doing something wrong here?

I don't think so. As you and others have noted, the mzML format is quite verbose. (That's why it's great that Skyline can read raw data files directly.)

We're looking into the memory usage problem, which we (meaning MattC) have been able to reproduce. No word yet on whether the problem is on the proteowizard side, or within the updated vendor DLL itself.

The empty mgf output is interesting - what settings did you use? And do we have the input data? (BTW it's easier to communicate and test around these issues using the command-line msconvert.exe rather than MSConvertGUI.)

Thanks

Brian
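
To make the settings discussed in this thread easy to share, the two conversions can be expressed as msconvert.exe command lines. Below is a minimal Python sketch that builds the argument lists. It assumes msconvert.exe is on the PATH and uses the `--mzML`, `--mgf`, `-o` and `--filter` options with the filter strings `peakPicking vendor msLevel=1-` and `msLevel 2-2`. The exact filter syntax can differ between ProteoWizard releases, so check `msconvert --help` for the installed version. The helper names and the input and output paths are hypothetical.

```python
import subprocess

SUPPORTED_FORMATS = ("mzML", "mgf")


def build_msconvert_args(input_path, out_dir, out_format="mzML", filters=()):
    """Build an msconvert.exe argument list (assumes msconvert.exe is on PATH)."""
    if out_format not in SUPPORTED_FORMATS:
        raise ValueError(f"unsupported output format: {out_format}")
    args = ["msconvert.exe", str(input_path), "--" + out_format, "-o", str(out_dir)]
    # Filters are applied in the order given; vendor peak picking goes first.
    for flt in filters:
        args += ["--filter", flt]
    return args


def run_conversion(args):
    """Run the conversion and raise if msconvert exits with an error."""
    subprocess.run(args, check=True)


# Hypothetical input folder and output directory, for illustration only.
mzml_args = build_msconvert_args(
    "example_run.d", "converted", "mzML",
    filters=["peakPicking vendor msLevel=1-"],
)
mgf_args = build_msconvert_args(
    "example_run.d", "converted", "mgf",
    filters=["peakPicking vendor msLevel=1-", "msLevel 2-2"],
)

print(mzml_args)
print(mgf_args)
```

Posting the printed argument list alongside a report makes it unambiguous which filters were active, which is harder to convey from GUI settings.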
 
Brian Pratt responded:  2018-04-24 15:01
Hi Tobi,

I've asked around and it sounds like we have not yet adapted our code to properly associate MS1 precursor data with MS2 scans in PASEF, which means no MGF output. We're working on it, as well as on understanding the memory problem.

Thanks

Brian
 
tobias.kockmann responded:  2018-04-25 01:07
Hi Brian,
Hi Brendan,

OK, thanks for the update. We got some tdf 3.0 files from the Bruker demo lab in Bremen and need to have a look at the results for a grant application. The deadline is 15 May. Do you expect that Skyline will be ready to import tdf 3.0 files before that date? Will Skyline be able to make use of the mgf files that we got from Bruker (you received a copy with the tdf 3.0 data)?

Greetings,
Tobi
 
Brian Pratt responded:  2018-04-25 06:56
It's not clear that we can completely solve the memory issue by then (it may be out of our control) but we should be able to handle it otherwise, yes.

Looking at that latest MGF, I'm not seeing any ion mobility information at all. Possibly I'm missing something?

On the other hand, the "SAHA 2 10min_Slot1-8_1_1859_5.1.201.xml" file seems to contain everything you'd want to generate a spectral library with 1/K0 information. But I have no idea if that's an artifact that would be generally available. What are your thoughts?

I'm away until Monday, hopefully by then Matt has the precursors working and we can stitch this all together one way or another.

Thanks,

Brian
 
tobias.kockmann responded:  2018-04-25 23:21
Hi Brian,

this is the first time I got complete tdf data folders incl. these new mgf types. The folder is created by the Bruker acquisition software (Histar?) if I am not mistaken. I am not really sure how mature this mgf format is, or which piece of code exactly created it. But I could ask the demo lab people.

Greetings,
Tobi
 
Brendan MacLean responded:  2018-04-28 10:24
Hi Tobi,
Thanks for explaining. On a call we had with Bruker earlier they described a Python script they used to decorate the MGF files with IMS information before running Mascot. It seems likely that the MGF files you have provided us have not been through the Python script. Our goal is to get rid of the need for the Python script and make it possible to get usable MGF files from ProteoWizard MSConvert, but we probably need more examples of working MGF files pre-Mascot search to do that.

I have pointed our technical contacts at Bruker to this thread. I hope they will soon be able to help us get good MGF examples of what a Bruker PASEF MGF should look like (with IMS values) before running it through a Mascot search.

Thanks for your continued help in getting these new features to meet your and other Bruker PASEF users' needs.

--Brendan