Problem importing Bruker timsTOF Pro data in Docker container

m j noga  2021-04-23
 

Dear Skyline Support team,

Together with a larger team of bioinformaticians and developers, we are prototyping a data processing and analysis workflow based on Skyline. In one of the use cases we are performing targeted MS1 feature extraction from timsTOF Pro data. We have already successfully tested the workflow on Windows, both via the GUI and the command line, and we are now at the stage of transferring it to a containerized environment on our Linux cluster.

As a first stage we are simply using the official Docker image (https://hub.docker.com/r/chambm/pwiz-skyline-i-agree-to-the-vendor-licenses), but we are experiencing a high number of import failures with the following exception:

Unhandled Exception: System.AccessViolationException: Attempted to read or write protected memory. This is often an indication that other memory is corrupt.
   at System.Collections.Generic.GenericArraySortHelper`2.Swap(TKey[] keys, TValue[] values, Int32 i, Int32 j)
   at System.Collections.Generic.GenericArraySortHelper`2.PickPivotAndPartition(TKey[] keys, TValue[] values, Int32 lo, Int32 hi)
   at System.Collections.Generic.GenericArraySortHelper`2.IntroSort(TKey[] keys, TValue[] values, Int32 lo, Int32 hi, Int32 depthLimit)
   at System.Collections.Generic.GenericArraySortHelper`2.IntroSort(TKey[] keys, TValue[] values, Int32 lo, Int32 hi, Int32 depthLimit)
   at System.Collections.Generic.GenericArraySortHelper`2.IntroSort(TKey[] keys, TValue[] values, Int32 lo, Int32 hi, Int32 depthLimit)
   at System.Collections.Generic.GenericArraySortHelper`2.IntroSort(TKey[] keys, TValue[] values, Int32 lo, Int32 hi, Int32 depthLimit)
   at System.Collections.Generic.GenericArraySortHelper`2.IntroSort(TKey[] keys, TValue[] values, Int32 lo, Int32 hi, Int32 depthLimit)
   at System.Collections.Generic.GenericArraySortHelper`2.Sort(TKey[] keys, TValue[] values, Int32 index, Int32 length, IComparer`1 comparer)
   at System.Array.Sort[TKey,TValue](TKey[] keys, TValue[] items, Int32 index, Int32 length, IComparer`1 comparer)
   at System.Array.Sort[TKey,TValue](TKey[] keys, TValue[] items)
   at pwiz.Skyline.Util.ArrayUtil.Sort[TItem](TItem[] array, TItem[][] secondaryArrays) in Z:\pwiz\pwiz_tools\Skyline\Util\Util.cs:line 945
   at pwiz.Skyline.Model.Results.SpectraChromDataProvider.Spectra.SortSpectrum(SpectrumInfo spectrumInfo, Int32 i) in Z:\pwiz\pwiz_tools\Skyline\Model\Results\SpectraChromDataProvider.cs:line 997
   at pwiz.Common.SystemUtil.ProducerConsumerWorker`2.Consume(Object threadIndex) in Z:\pwiz\pwiz_tools\Shared\Common\SystemUtil\ProducerConsumerWorker.cs:line 185
   at System.Threading.ExecutionContext.RunInternal(ExecutionContext executionContext, ContextCallback callback, Object state, Boolean preserveSyncCtx)
   at System.Threading.ExecutionContext.Run(ExecutionContext executionContext, ContextCallback callback, Object state, Boolean preserveSyncCtx)
   at System.Threading.ExecutionContext.Run(ExecutionContext executionContext, ContextCallback callback, Object state)
   at System.Threading.ThreadHelper.ThreadStart(Object obj) 

When we attempt to import a single file multiple times, it fails in 30-40% of cases, at a random stage during the import (see the example import log attached).
We invoke Skyline as follows:

docker run -it --rm -v /Users/m/surfdrive2/Glycopeptides/data:/data \
proteowizard/pwiz-skyline-i-agree-to-the-vendor-licenses wine SkylineCmd --dir=/data \
--in=output/clean.sky --out=output/with_data.sky --timestamp --memstamp \
--import-file=Controle_16_Slot1-36_1_3174.d --full-scan-precursor-res=40 \
--log-file=output/import_log.txt > data/output/import_console_log.txt 

Setting options such as '--import-process-count' or '--import-threads' doesn't seem to have any effect.

We already tested this using docker on Linux and OS X hosts with similar error rates.

We do not see this issue when running the same workflow on Agilent Q-ToF data, which suggests that it is somehow linked to reading Bruker data.

Despite the high failure rates, we are able to import our 40-sample test dataset in the end (with up to 9 retries per file), and the final results are identical to those on Windows.
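
For anyone who needs the same stopgap, retrying can be automated with a small wrapper. The following is only a minimal sketch: the function name retry_cmd is hypothetical, and it assumes that the crashed SkylineCmd process (and therefore docker run) exits with a non-zero status, which we have not verified in every case.

```shell
# retry_cmd MAX_ATTEMPTS COMMAND [ARGS...]
# Runs COMMAND repeatedly until it exits with status 0 or MAX_ATTEMPTS
# attempts have been made. Returns 0 on success, 1 if every attempt failed.
# Assumption: a failed import is visible as a non-zero exit status.
retry_cmd() {
    local max_attempts=$1
    shift
    local attempt=1
    while [ "$attempt" -le "$max_attempts" ]; do
        if "$@"; then
            echo "succeeded on attempt ${attempt}" >&2
            return 0
        fi
        echo "attempt ${attempt}/${max_attempts} failed" >&2
        attempt=$((attempt + 1))
    done
    return 1
}

# Example (commented out; 10 attempts = 1 try + up to 9 retries):
# retry_cmd 10 docker run --rm -v /Users/m/surfdrive2/Glycopeptides/data:/data \
#     proteowizard/pwiz-skyline-i-agree-to-the-vendor-licenses wine SkylineCmd --dir=/data \
#     --in=output/clean.sky --out=output/with_data.sky \
#     --import-file=Controle_16_Slot1-36_1_3174.d --full-scan-precursor-res=40
```

The example drops '-it' because an unattended job usually has no terminal attached. A wrapper like this only hides the crash; it does nothing about the underlying memory corruption.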

I am wondering if you could help us fix this problem. Is this a ProteoWizard issue or a Wine issue? Should we contact ProteoWizard support directly? Our software engineer is eager to get involved, but would need some support and feedback to get started.

We will appreciate any help you might provide.

Marek

 
 
matt.chambers42 responded:  2021-04-23

Hi Marek,

My guess is this is some interaction between Bruker's timsTOF API and Wine. Perhaps an OpenMP issue in Wine. Can you attach clean.sky? I'll try to repro with a local timsTOF file and if I can't repro I'll ask for your file.

Thanks,
-Matt

 
m j noga responded:  2021-04-23

Dear Matt,

Many thanks for your quick response. Please see the skyline document attached.

Marek

 
m j noga responded:  2021-04-30

Dear Matt,

Many thanks again for looking into our case!

In the meantime we did some testing internally, and we see a similar exception during Skyline import after converting the file to mzML with ProteoWizard:

wine64 SkylineCmd --timestamp --memstamp --in="/data/Glycopeptide_Peak_Extraction.sky" --import-file="/data/B4Galt1_44_Slot1-07_1_3135_combined_mobility.mzML" 
[2021/04/26 17:54:14]   4       0       Opening file...
[2021/04/26 17:54:15]   11      0       File Glycopeptide_Peak_Extraction.sky opened.
[2021/04/26 17:54:15]   12      0       
[2021/04/26 17:54:15]   12      0       Adding results...
[2021/04/26 17:54:16]   13      0       1. Z:\data\B4Galt1_44_Slot1-07_1_3135_combined_mobility.mzML
[2021/04/26 17:54:16]   13      0       
[2021/04/26 17:54:22]   18      0       [1] 14%  
[2021/04/26 17:54:24]   14      0       [1] 16%  
[2021/04/26 17:54:26]   23      0       [1] 19%  
[2021/04/26 17:54:29]   34      0       [1] 22%  
[2021/04/26 17:54:31]   42      0       [1] 25%  
[2021/04/26 17:54:34]   25      0       [1] 27%  
[2021/04/26 17:54:36]   17      0       [1] 29%  
[2021/04/26 17:54:39]   15      0       [1] 32%  
[2021/04/26 17:54:42]   17      0       [1] 35%  
[2021/04/26 17:54:44]   19      0       [1] 37%  
[2021/04/26 17:54:47]   25      0       [1] 39%  
[2021/04/26 17:54:49]   33      0       [1] 41%  
[2021/04/26 17:54:52]   22      0       [1] 43%  

Unhandled Exception: System.AccessViolationException: Attempted to read or write protected memory. This is often an indication that other memory is corrupt.
   at System.Collections.Generic.GenericArraySortHelper`2.InsertionSort(TKey[] keys, TValue[] values, Int32 lo, Int32 hi)
   at System.Collections.Generic.GenericArraySortHelper`2.IntroSort(TKey[] keys, TValue[] values, Int32 lo, Int32 hi, Int32 depthLimit)
   at System.Collections.Generic.GenericArraySortHelper`2.IntroSort(TKey[] keys, TValue[] values, Int32 lo, Int32 hi, Int32 depthLimit)
   at System.Collections.Generic.GenericArraySortHelper`2.IntroSort(TKey[] keys, TValue[] values, Int32 lo, Int32 hi, Int32 depthLimit)
   at System.Collections.Generic.GenericArraySortHelper`2.IntroSort(TKey[] keys, TValue[] values, Int32 lo, Int32 hi, Int32 depthLimit)
   at System.Collections.Generic.GenericArraySortHelper`2.IntroSort(TKey[] keys, TValue[] values, Int32 lo, Int32 hi, Int32 depthLimit)
   at System.Collections.Generic.GenericArraySortHelper`2.IntroSort(TKey[] keys, TValue[] values, Int32 lo, Int32 hi, Int32 depthLimit)
   at System.Collections.Generic.GenericArraySortHelper`2.IntroSort(TKey[] keys, TValue[] values, Int32 lo, Int32 hi, Int32 depthLimit)
   at System.Collections.Generic.GenericArraySortHelper`2.IntroSort(TKey[] keys, TValue[] values, Int32 lo, Int32 hi, Int32 depthLimit)
   at System.Collections.Generic.GenericArraySortHelper`2.IntroSort(TKey[] keys, TValue[] values, Int32 lo, Int32 hi, Int32 depthLimit)
   at System.Collections.Generic.GenericArraySortHelper`2.IntroSort(TKey[] keys, TValue[] values, Int32 lo, Int32 hi, Int32 depthLimit)
   at System.Collections.Generic.GenericArraySortHelper`2.IntroSort(TKey[] keys, TValue[] values, Int32 lo, Int32 hi, Int32 depthLimit)
   at System.Collections.Generic.GenericArraySortHelper`2.IntroSort(TKey[] keys, TValue[] values, Int32 lo, Int32 hi, Int32 depthLimit)
   at System.Collections.Generic.GenericArraySortHelper`2.Sort(TKey[] keys, TValue[] values, Int32 index, Int32 length, IComparer`1 comparer)
   at System.Array.Sort[TKey,TValue](TKey[] keys, TValue[] items, Int32 index, Int32 length, IComparer`1 comparer)
   at System.Array.Sort[TKey,TValue](TKey[] keys, TValue[] items)
   at pwiz.Skyline.Util.ArrayUtil.Sort[TItem](TItem[] array, TItem[][] secondaryArrays) in Z:\pwiz\pwiz_tools\Skyline\Util\Util.cs:line 945
   at pwiz.Skyline.Model.Results.SpectraChromDataProvider.Spectra.SortSpectrum(SpectrumInfo spectrumInfo, Int32 i) in Z:\pwiz\pwiz_tools\Skyline\Model\Results\SpectraChromDataProvider.cs:line 997
   at pwiz.Common.SystemUtil.ProducerConsumerWorker`2.Consume(Object threadIndex) in Z:\pwiz\pwiz_tools\Shared\Common\SystemUtil\ProducerConsumerWorker.cs:line 185
   at System.Threading.ExecutionContext.RunInternal(ExecutionContext executionContext, ContextCallback callback, Object state, Boolean preserveSyncCtx)
   at System.Threading.ExecutionContext.Run(ExecutionContext executionContext, ContextCallback callback, Object state, Boolean preserveSyncCtx)
   at System.Threading.ExecutionContext.Run(ExecutionContext executionContext, ContextCallback callback, Object state)
   at System.Threading.ThreadHelper.ThreadStart(Object obj)

The file was converted using MSConvertGUI, with Write index, TPP compatibility, zlib compression and combine ion mobility.

I also looked into both the raw file and the converted mzML file, and I see that MS1 spectra are stored in a special way: each appears as a single spectrum but still retains ion mobility information. It seems to me that msconvert retains this structure through mzML conversion. Is this correct?

I do not seem to be able to perform any scan summing on MS1 data, and disabling 'combine ion mobility' appears to result in msconvert unsuccessfully attempting to write enormous output files. For us, summing in the ion mobility dimension would serve as a temporary workaround. Is it correct that these options are not really supported (yet) for this kind of data file?

The only thing that seems to help with these errors is to vastly reduce the file size, for example by selecting only a narrow scan time window during mzML conversion.
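
For illustration, a command-line equivalent of that restricted conversion might look like the sketch below. We did the conversion in MSConvertGUI, so this is an assumed translation of those settings rather than the exact command we ran: the function name convert_window and the MSCONVERT override are hypothetical, and the window values in the example are placeholders. The scanTime filter takes its range in seconds.

```shell
# convert_window INPUT OUTDIR START_S END_S
# Converts INPUT to zlib-compressed mzML with ion mobility scans combined,
# keeping only spectra whose scan time lies within [START_S, END_S] seconds.
# MSCONVERT may be overridden (assumption: "wine msconvert" inside the
# container); it is left unquoted on purpose so that it can hold two words.
convert_window() {
    local input=$1 outdir=$2 start_s=$3 end_s=$4
    ${MSCONVERT:-msconvert} "$input" -o "$outdir" --mzML --zlib \
        --combineIonMobilitySpectra \
        --filter "scanTime [${start_s},${end_s}]"
}

# Example (commented out; the 600-900 s window is a placeholder):
# MSCONVERT="wine msconvert" convert_window /data/B4Galt1_44_Slot1-07_1_3135.d /data/mzml 600 900
```

Setting MSCONVERT=echo prints the assembled arguments instead of running the conversion, which is handy for checking the filter string first.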

I hope this information is useful. We will very much appreciate your help with addressing this issue!

Marek