Problem importing Bruker timsTOF Pro data in docker container

m j noga  2021-04-23 02:42
 

Dear Skyline Support team,

Together with a larger team of bioinformaticians and developers, we are prototyping a data processing and analysis workflow based on Skyline. In one of the use cases we are performing targeted MS1 feature extraction from timsTOF Pro data. We have already successfully tested the workflow on Windows, both via the GUI and the command line, and we are now transferring it to a containerized environment on our Linux cluster.

As a first stage we are simply using the official docker image (https://hub.docker.com/r/chambm/pwiz-skyline-i-agree-to-the-vendor-licenses), but we are experiencing a high number of import failures with the following exception:

Unhandled Exception: System.AccessViolationException: Attempted to read or write protected memory. This is often an indication that other memory is corrupt.
   at System.Collections.Generic.GenericArraySortHelper`2.Swap(TKey[] keys, TValue[] values, Int32 i, Int32 j)
   at System.Collections.Generic.GenericArraySortHelper`2.PickPivotAndPartition(TKey[] keys, TValue[] values, Int32 lo, Int32 hi)
   at System.Collections.Generic.GenericArraySortHelper`2.IntroSort(TKey[] keys, TValue[] values, Int32 lo, Int32 hi, Int32 depthLimit)
   at System.Collections.Generic.GenericArraySortHelper`2.IntroSort(TKey[] keys, TValue[] values, Int32 lo, Int32 hi, Int32 depthLimit)
   at System.Collections.Generic.GenericArraySortHelper`2.IntroSort(TKey[] keys, TValue[] values, Int32 lo, Int32 hi, Int32 depthLimit)
   at System.Collections.Generic.GenericArraySortHelper`2.IntroSort(TKey[] keys, TValue[] values, Int32 lo, Int32 hi, Int32 depthLimit)
   at System.Collections.Generic.GenericArraySortHelper`2.IntroSort(TKey[] keys, TValue[] values, Int32 lo, Int32 hi, Int32 depthLimit)
   at System.Collections.Generic.GenericArraySortHelper`2.Sort(TKey[] keys, TValue[] values, Int32 index, Int32 length, IComparer`1 comparer)
   at System.Array.Sort[TKey,TValue](TKey[] keys, TValue[] items, Int32 index, Int32 length, IComparer`1 comparer)
   at System.Array.Sort[TKey,TValue](TKey[] keys, TValue[] items)
   at pwiz.Skyline.Util.ArrayUtil.Sort[TItem](TItem[] array, TItem[][] secondaryArrays) in Z:\pwiz\pwiz_tools\Skyline\Util\Util.cs:line 945
   at pwiz.Skyline.Model.Results.SpectraChromDataProvider.Spectra.SortSpectrum(SpectrumInfo spectrumInfo, Int32 i) in Z:\pwiz\pwiz_tools\Skyline\Model\Results\SpectraChromDataProvider.cs:line 997
   at pwiz.Common.SystemUtil.ProducerConsumerWorker`2.Consume(Object threadIndex) in Z:\pwiz\pwiz_tools\Shared\Common\SystemUtil\ProducerConsumerWorker.cs:line 185
   at System.Threading.ExecutionContext.RunInternal(ExecutionContext executionContext, ContextCallback callback, Object state, Boolean preserveSyncCtx)
   at System.Threading.ExecutionContext.Run(ExecutionContext executionContext, ContextCallback callback, Object state, Boolean preserveSyncCtx)
   at System.Threading.ExecutionContext.Run(ExecutionContext executionContext, ContextCallback callback, Object state)
   at System.Threading.ThreadHelper.ThreadStart(Object obj) 

When attempting to import a single file multiple times, it fails in 30-40% of cases, at a random stage during import (see the example import log attached).
We invoke Skyline by:

docker run -it --rm -v /Users/m/surfdrive2/Glycopeptides/data:/data \
proteowizard/pwiz-skyline-i-agree-to-the-vendor-licenses wine SkylineCmd --dir=/data \
--in=output/clean.sky --out=output/with_data.sky --timestamp --memstamp \
--import-file=Controle_16_Slot1-36_1_3174.d --full-scan-precursor-res=40 \
--log-file=output/import_log.txt > data/output/import_console_log.txt 

Setting options such as '--import-process-count' or '--import-threads' doesn't seem to have any effect.

We already tested this using docker on Linux and OS X hosts with similar error rates.
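One way to put a number on such an error rate is to keep the console log of every run and count how many of them contain the exception shown above. The following is a minimal sketch only: the function names `import_crashed` and `failure_rate` are hypothetical helpers, and it assumes each run's console output was redirected to its own file, as in the command above.

```shell
#!/usr/bin/env bash

# import_crashed LOGFILE
# Succeeds (exit status 0) if the console log contains the exception
# reported in this thread. The match is a fixed string, not a regex.
import_crashed() {
    grep -qF 'System.AccessViolationException' "$1"
}

# failure_rate LOGFILE...
# Prints "<failed>/<total>" for a set of per-run console logs.
failure_rate() {
    local failed=0 total=0 log
    for log in "$@"; do
        total=$((total + 1))
        if import_crashed "$log"; then
            failed=$((failed + 1))
        fi
    done
    echo "${failed}/${total}"
}
```

Called as `failure_rate run_*.txt` on a directory of console logs, this prints the number of crashed runs over the total number of runs.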

We do not see this issue when running the same workflow on Agilent Q-ToF data, which suggests that it is somehow linked to reading Bruker data.

Despite the high failure rate, we are able to import our 40-sample test dataset in the end (with up to 9 retries), and the final results are identical to those on Windows.
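A retry loop of this kind can be sketched as below. This is an illustration under assumptions, not the exact script used here: `retry` is a hypothetical helper, and it assumes that a crashed import makes the wrapped command exit with a non-zero status. If that does not hold under Wine, the console log would have to be checked for the exception text instead.

```shell
#!/usr/bin/env bash

# retry MAX_ATTEMPTS COMMAND [ARGS...]
# Runs COMMAND until it exits with status 0 or MAX_ATTEMPTS is reached.
# Assumption: a failed import is visible as a non-zero exit status.
retry() {
    local max_attempts=$1
    shift
    local attempt=1
    while true; do
        if "$@"; then
            echo "succeeded on attempt ${attempt}" >&2
            return 0
        fi
        if [ "$attempt" -ge "$max_attempts" ]; then
            echo "giving up after ${attempt} attempts" >&2
            return 1
        fi
        attempt=$((attempt + 1))
    done
}

# Example (10 attempts = 1 initial run + up to 9 retries):
#   retry 10 docker run --rm -v /Users/m/surfdrive2/Glycopeptides/data:/data \
#       proteowizard/pwiz-skyline-i-agree-to-the-vendor-licenses wine SkylineCmd --dir=/data \
#       --in=output/clean.sky --out=output/with_data.sky \
#       --import-file=Controle_16_Slot1-36_1_3174.d
```

Because each attempt reads clean.sky and writes with_data.sky, a failed attempt does not affect the input of the next one.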

I am wondering if you could help us fix this problem. Is this a ProteoWizard issue or a Wine issue? Should we contact ProteoWizard support directly? Our software engineer is eager to get involved, but would need some support and feedback to get started.

We will appreciate any help you might provide.

Marek

 
 
Matt Chambers responded:  2021-04-23 09:29

Hi Marek,

My guess is this is some interaction between Bruker's timsTOF API and Wine. Perhaps an OpenMP issue in Wine. Can you attach clean.sky? I'll try to repro with a local timsTOF file and if I can't repro I'll ask for your file.

Thanks,
-Matt

 
m j noga responded:  2021-04-23 11:08

Dear Matt,

Many thanks for your quick response. Please see the skyline document attached.

Marek

 
m j noga responded:  2021-04-30 06:06

Dear Matt,

Many thanks again for looking into our case!

In the meantime we did some testing internally, and we see a similar exception on Skyline import after converting the file to mzML with ProteoWizard:

wine64 SkylineCmd --timestamp --memstamp --in="/data/Glycopeptide_Peak_Extraction.sky" --import-file="/data/B4Galt1_44_Slot1-07_1_3135_combined_mobility.mzML" 
[2021/04/26 17:54:14]	4	0	Opening file...
[2021/04/26 17:54:15]	11	0	File Glycopeptide_Peak_Extraction.sky opened.
[2021/04/26 17:54:15]	12	0	
[2021/04/26 17:54:15]	12	0	Adding results...
[2021/04/26 17:54:16]	13	0	1. Z:\data\B4Galt1_44_Slot1-07_1_3135_combined_mobility.mzML
[2021/04/26 17:54:16]	13	0	
[2021/04/26 17:54:22]	18	0	[1] 14%  
[2021/04/26 17:54:24]	14	0	[1] 16%  
[2021/04/26 17:54:26]	23	0	[1] 19%  
[2021/04/26 17:54:29]	34	0	[1] 22%  
[2021/04/26 17:54:31]	42	0	[1] 25%  
[2021/04/26 17:54:34]	25	0	[1] 27%  
[2021/04/26 17:54:36]	17	0	[1] 29%  
[2021/04/26 17:54:39]	15	0	[1] 32%  
[2021/04/26 17:54:42]	17	0	[1] 35%  
[2021/04/26 17:54:44]	19	0	[1] 37%  
[2021/04/26 17:54:47]	25	0	[1] 39%  
[2021/04/26 17:54:49]	33	0	[1] 41%  
[2021/04/26 17:54:52]	22	0	[1] 43%  

Unhandled Exception: System.AccessViolationException: Attempted to read or write protected memory. This is often an indication that other memory is corrupt.
   at System.Collections.Generic.GenericArraySortHelper`2.InsertionSort(TKey[] keys, TValue[] values, Int32 lo, Int32 hi)
   at System.Collections.Generic.GenericArraySortHelper`2.IntroSort(TKey[] keys, TValue[] values, Int32 lo, Int32 hi, Int32 depthLimit)
   at System.Collections.Generic.GenericArraySortHelper`2.IntroSort(TKey[] keys, TValue[] values, Int32 lo, Int32 hi, Int32 depthLimit)
   at System.Collections.Generic.GenericArraySortHelper`2.IntroSort(TKey[] keys, TValue[] values, Int32 lo, Int32 hi, Int32 depthLimit)
   at System.Collections.Generic.GenericArraySortHelper`2.IntroSort(TKey[] keys, TValue[] values, Int32 lo, Int32 hi, Int32 depthLimit)
   at System.Collections.Generic.GenericArraySortHelper`2.IntroSort(TKey[] keys, TValue[] values, Int32 lo, Int32 hi, Int32 depthLimit)
   at System.Collections.Generic.GenericArraySortHelper`2.IntroSort(TKey[] keys, TValue[] values, Int32 lo, Int32 hi, Int32 depthLimit)
   at System.Collections.Generic.GenericArraySortHelper`2.IntroSort(TKey[] keys, TValue[] values, Int32 lo, Int32 hi, Int32 depthLimit)
   at System.Collections.Generic.GenericArraySortHelper`2.IntroSort(TKey[] keys, TValue[] values, Int32 lo, Int32 hi, Int32 depthLimit)
   at System.Collections.Generic.GenericArraySortHelper`2.IntroSort(TKey[] keys, TValue[] values, Int32 lo, Int32 hi, Int32 depthLimit)
   at System.Collections.Generic.GenericArraySortHelper`2.IntroSort(TKey[] keys, TValue[] values, Int32 lo, Int32 hi, Int32 depthLimit)
   at System.Collections.Generic.GenericArraySortHelper`2.IntroSort(TKey[] keys, TValue[] values, Int32 lo, Int32 hi, Int32 depthLimit)
   at System.Collections.Generic.GenericArraySortHelper`2.IntroSort(TKey[] keys, TValue[] values, Int32 lo, Int32 hi, Int32 depthLimit)
   at System.Collections.Generic.GenericArraySortHelper`2.Sort(TKey[] keys, TValue[] values, Int32 index, Int32 length, IComparer`1 comparer)
   at System.Array.Sort[TKey,TValue](TKey[] keys, TValue[] items, Int32 index, Int32 length, IComparer`1 comparer)
   at System.Array.Sort[TKey,TValue](TKey[] keys, TValue[] items)
   at pwiz.Skyline.Util.ArrayUtil.Sort[TItem](TItem[] array, TItem[][] secondaryArrays) in Z:\pwiz\pwiz_tools\Skyline\Util\Util.cs:line 945
   at pwiz.Skyline.Model.Results.SpectraChromDataProvider.Spectra.SortSpectrum(SpectrumInfo spectrumInfo, Int32 i) in Z:\pwiz\pwiz_tools\Skyline\Model\Results\SpectraChromDataProvider.cs:line 997
   at pwiz.Common.SystemUtil.ProducerConsumerWorker`2.Consume(Object threadIndex) in Z:\pwiz\pwiz_tools\Shared\Common\SystemUtil\ProducerConsumerWorker.cs:line 185
   at System.Threading.ExecutionContext.RunInternal(ExecutionContext executionContext, ContextCallback callback, Object state, Boolean preserveSyncCtx)
   at System.Threading.ExecutionContext.Run(ExecutionContext executionContext, ContextCallback callback, Object state, Boolean preserveSyncCtx)
   at System.Threading.ExecutionContext.Run(ExecutionContext executionContext, ContextCallback callback, Object state)
   at System.Threading.ThreadHelper.ThreadStart(Object obj)

The file was converted using MSConvertGUI, with the 'Write index', 'TPP compatibility', 'zlib compression' and 'combine ion mobility' options enabled.

I also looked a bit into both the raw file and the mzML-converted file and see that MS1 spectra are stored in a special way: each shows as a single spectrum but still retains ion mobility information. It seems to me that msconvert retains this structure through mzML conversion; is this correct?

I do not seem to be able to perform any scan summing on MS1 data, and disabling 'combine ion mobility' appears to result in msconvert unsuccessfully attempting to write enormous output files. For us, summing in the ion mobility dimension would serve as a temporary workaround. Is it correct that these options are not really supported (yet) for this kind of data file?

The only solution that seems to help with these errors is to vastly reduce the file size, for example by selecting only a narrow scan time window during mzML conversion.
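On the command line, restricting the conversion to a scan time window can be sketched roughly as follows. This is a hedged illustration, not the exact settings used above (those were set in MSConvertGUI): `convert_window` and the `MSCONVERT` variable are hypothetical, the window boundaries are placeholders, and the flags (`--mzML`, `--zlib`, `--combineIonMobilitySpectra`, `--filter "scanTime [...]"` in seconds, `-o`) should be checked against the help output of the msconvert version in use.

```shell
#!/usr/bin/env bash

# convert_window INPUT OUTPUT_DIR START_SEC END_SEC
# Converts INPUT to zlib-compressed mzML, keeping only spectra whose scan
# time lies within [START_SEC, END_SEC] and combining ion mobility scans.
# MSCONVERT is a hypothetical override for the executable name; inside the
# Wine-based docker image the tool would have to be launched through wine.
convert_window() {
    local input=$1 outdir=$2 start=$3 end=$4
    "${MSCONVERT:-msconvert}" "$input" \
        --mzML --zlib \
        --combineIonMobilitySpectra \
        --filter "scanTime [${start},${end}]" \
        -o "$outdir"
}

# Example with placeholder window boundaries:
#   convert_window Controle_16_Slot1-36_1_3174.d /data/output 600 900
```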

I hope this information is useful. We will very much appreciate your help with addressing this issue!

Marek

 
Matt Chambers responded:  2021-07-09 09:48

Hi Marek,

Sorry for the delay responding to this. I didn't notice you had this error with mzML input, that's interesting and takes the finger off the vendor API and possibly off of Wine as well. How big are the mzML files you're working with? Can you share one (or the source .d and I can convert it myself)?

I'll be taking paternity leave soon, so I don't promise a fast response, but it'll be good to have the data to reproduce this issue. :)

Thanks,
-Matt

 
m j noga responded:  2021-07-09 15:02

Dear Matt,
Sincere congratulations! I hope you enjoy your parental leave!
Thank you so much for looking into our problem, especially given the circumstances. We would very much appreciate it if you still find an opportunity to take a look. It would be very valuable to know what the root cause might be, what could be done, and who is the best contact to follow this up with. We do have a stopgap workaround, but it is not fully functional.
The source .d files are about 2.5-3 GB, and I've just uploaded "Controle_16_Slot1-36_1_3174.d.zip" (via https://skyline.ms/project/home/support/file sharing/begin.view?) so you can take a look. I hope this is sufficient; I don't have the corresponding mzML file at the moment.
Marek

 
Brian Pratt responded:  2021-07-12 09:41

Hi Marek,

At the risk of pointing out the obvious, you don't actually need to convert to mzML - Skyline can read native mass spec formats directly. This is typically much faster than reading mzML, and of course also saves the time spent doing the conversion.

But, if for some reason mzML conversion is just part of your process, then we'll need to know exactly how you're performing that conversion in order to try to reproduce your problem. I note that you list the MSConvertGUI settings earlier in the thread, but a screenshot would be the best documentation.

Thanks

Brian Pratt

 
Matt Chambers responded:  2021-07-12 09:50

Brian, he gets the same Skyline import exception in Wine/Docker with both the native format and mzML, which tells me it's not a Bruker issue. It's definitely a Wine/pwiz/Skyline one.

 
m j noga responded:  2021-07-12 10:58

Dear Brian,

Thank you for following up!

I'm well aware that Skyline can read native formats directly. In fact, in our basic, 'manual' workflow we just work on Windows with vendor data. This doesn't show any problems. However, trying to plug this into our data processing pipeline operating on a Linux cluster creates problems. We used mzML as an intermediate step for troubleshooting purposes, based on reasoning similar to Matt's post just above (Thanks!).

I did not retain a screenshot from MSConvertGUI; see the attachment for my attempt to re-create it now.

However, since we performed our tests in April, I can imagine there might be differences between the current version of pwiz/Skyline and the one we used for testing.

Marek

 
Brian Pratt responded:  2021-07-14 11:07

Just a note to confirm that this does indeed seem to be a Wine problem - I can't provoke the error under Windows (with either debug or release builds). I'm going to have to leave it to Matt to figure this one out.

-Brian