Skyline runner - DIA

bart van puyvelde  2018-05-23 05:28
 

Dear Brendan,

We are currently testing SkylineRunner, as we are more and more confronted with large DIA/SWATH datasets. Last weekend, one of our data PCs crashed while importing multiple .wiff files (using a very large library with a few million transitions).
Specs: Intel Xeon CPU E5-2640 v3, 2.6 GHz (160 GB RAM / Windows 10 Enterprise)

I have been testing SkylineRunner on a rather simple dataset (files attached). The user interface needs approximately 10 seconds to import the .wiff file. However, when using SkylineRunner, the command prompt never seems to import the data (screenshot of the command prompt attached).

We are probably doing something wrong, as I am not an expert in writing .bat files.
Is there a command that would make it possible for us to follow progress during import?

If extra information/data is required, please let me know.

Kind Regards,
Bart

 
 
Brendan MacLean responded:  2018-05-23 08:04

Hi Bart,
I will have a closer look soon, but let me first suggest that you try simply executing this all as a single command line to SkylineRunner. I have to admit I have never used batch mode or --batch-commands (as you appear to be using), and it is not at all necessary for the operations in your .bat file. All of that can be executed in a single command line, though I am not entirely sure why it would get hung up in batch mode.
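For example, opening a document, importing a raw file and saving can all be expressed as one command line, something like this (with hypothetical paths in place of your own):

    SkylineRunner.exe --in="C:\data\MyDoc.sky" --import-file="C:\data\Run1.wiff" --save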

Also, when you provide a Skyline document to us for support, please use File > Share - Complete to create a .sky.zip file, instead of posting the component parts, which it looks like you did not quite get all of.

Finally, it might also be informative to give the actual command line you used when calling SkylineRunner with your .bat file. I was actually expecting the .bat file to be a Windows batch file with command-line calls in it, and was surprised to find only the 3 lines of Skyline arguments. I would suggest giving your Skyline batch-mode files a different extension (like .sbat or .skyb) to avoid that confusion.
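For clarity, a file containing Skyline arguments like yours gets passed to SkylineRunner with --batch-commands, along these lines (file name hypothetical):

    SkylineRunner.exe --batch-commands="C:\data\import-commands.skyb"

whereas a true Windows .bat file would itself contain full SkylineRunner.exe command lines like the one above and be run directly from the command prompt.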

Let me know how the single command-line goes, and I will have a look at trying an import in batch-mode myself.

--Brendan

 
bart van puyvelde responded:  2018-05-25 01:52

Hi Brendan,

First, I would like to thank you for the quick response.

I have followed your instructions and was able to run SkylineRunner.exe.
Still, when I repeated last weekend's data import, Skyline crashed again at 35% of the import (after approximately 24 hours of running, with memory consumption of >148 GB).
Is there any way to resolve this issue? Extending the RAM? I presume this will not change much, as we already have 160 GB.

We have one "super" computer running Linux (with more than 500 GB of RAM). I have searched the support page and saw that a few people have already asked about the possibility of running Skyline on Linux. Any update on this?

Kind regards,
Bart

 
Brendan MacLean responded:  2018-05-25 06:49

Hi Bart,
How many transitions are you trying to extract? I have recently performed this operation on 2 million transitions over 3 files on a 16 GB computer, and in the past I have gone as high as 6.5 million transitions over 20 files on a 176 GB computer.

So, either you have gone even higher than those trials, or there is just something problematic with your settings. Are you using iRT for retention time prediction or might you be doing full-gradient extraction over a long gradient?

When you say Skyline "crashed", do you get any kind of error message you can share?

Usually, I just stop trying when I hit my computer's memory limit, because the system will start swapping memory to disk and back, causing "thrashing", and processing slows to a crawl. I generally don't wait the 24 hours you describe, because once the computer is thrashing it is just going to take too long. I'll admit that on the 2 million transition runs on my 16 GB computer I did see some non-linear scaling and near total use of memory, but my processing never took longer than 2 hours, and it was more like 20 minutes in cases where I wasn't pushing the memory limit.

I will need a bit more information to help you remedy your situation. You can also use --memstamp on the command line to track your memory use, along with --timestamp to track the time. If you post a log with these values, that can be very helpful in understanding the performance you are seeing.
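For example, something like this (again with hypothetical paths), redirecting the output to a file you can post:

    SkylineRunner.exe --in="C:\data\BigDoc.sky" --import-file="C:\data\Run1.wiff" --save --timestamp --memstamp > import-log.txt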

--Brendan

 
Brendan MacLean responded:  2018-05-25 06:51

I assumed that you were not talking about the simple data set that takes 10 seconds to import in the UI. If you are seeing that take 24 hours and then crash at only 35%, then something is really wrong.

 
bart van puyvelde responded:  2018-05-28 08:08

Hi Brendan,

The number of transitions we are trying to extract is around 20 million, which is a lot, I know.

I am using the SSRCalc retention time predictor for retention time filtering. When I select "Include all matching scans", Skyline can import the run. I presume adding the retention time predictor is causing the issue.

The runs I am talking about are indeed long gradient runs.

I have attached a screenshot of the error. Memory use was around 148 GB (the maximum capacity of the RAM) when Skyline crashed.

Kind regards,
Bart

 
Brendan MacLean responded:  2018-05-28 08:50

I admit I have never seen a document with 20 million transitions, a long gradient, and an attempt to use SSRCalc to predict retention time. That sounds like a poor combination. So, while I find the error interesting and suspect it is fixable, I am doubtful about how much useful data you are going to get out of your current approach.

Here are some suggestions for how to lower your memory needs and work within the current limitations:

  1. Process your data in batches, maybe 4 batches of 5 million transitions using 4 Skyline documents (see the sketch after this list). There is no rule saying all of your targets or all of your runs must be in a single Skyline file.
  2. As quickly as possible move to using more accurate retention time prediction (iRT) and narrower extraction windows.
  3. Do only fragment ion extraction (i.e. no MS1 extraction). This will halve the number of chromatograms extracted at not much penalty, and you can look at MS1 later when you have narrowed your list.
  4. Use fewer decoys. Even with the Aebersold lab's Pan Human library, I found I got very similar results using 1/4 as many decoys as targets. In theory, the number of decoys shouldn't matter, as long as there are enough to give you an accurate representation of the null distribution. I am at least sure Skyline is doing the math correctly, and when you are using very large target lists, it is not necessary to use the same number of decoys as targets.
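As a sketch of the batching idea in point 1, a Windows batch file along these lines (document and file names hypothetical) would import the same run into each of 4 smaller documents in turn:

    REM Import one run into each of 4 smaller Skyline documents, logging time and memory
    for %%D in (Batch1.sky Batch2.sky Batch3.sky Batch4.sky) do (
        SkylineRunner.exe --in="%%D" --import-file="Run1.wiff" --save --timestamp --memstamp
    )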

Your processing time and memory (and disk space) needs will scale with the number of chromatogram points you are extracting, which is proportional to (transition count * run count * extraction time range). If you can't control any of these factors, then you will need to process your data in manageable batches if you want to use Skyline.
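To make that concrete with round, purely hypothetical numbers: a 2-hour full-gradient extraction at a 3-second cycle time gives about 2,400 points per chromatogram, so 20 million transitions in even a single run means on the order of 20,000,000 * 2,400 = 48 billion points. Narrowing to 5-minute retention time windows cuts that to about 100 points per chromatogram, or 2 billion points, a 24-fold reduction before any batching.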

Thanks for your help in diagnosing the issue.

--Brendan

 
Brendan MacLean responded:  2018-05-28 09:15

I should also ask how many peptides you are targeting with your 20 million transitions, and whether you have a spectral library for them. Since the largest libraries I have seen so far cover on the order of 150,000 peptides, yours would have to be around 10 times that size, if you are following the method suggested in the SWATH paper, and used in many papers since, of targeting 6 transitions found in the library spectrum.

But since you are attempting to use SSRCalc, I am wondering a little whether you might be trying to do all of this without any prior knowledge, as in a normal DDA peptide search, just targeting a much larger set of transitions per peptide.

I am at least curious what your settings are to end up with 20 million transitions. I only arrived at 6.5 million myself by using many, many more decoys than targets for the Pan Human library.