Hardware recommendation for Skyline workstation

msylvest  2016-12-12 07:27
 
Dear Skyline team,
Sorry if the answer is already out there, but I couldn't find it anywhere on the site, in the support files, or via Google.
We want to order a dedicated workstation for Skyline processing. What are your hardware recommendations? Most importantly: How well does performance scale with cores/sockets? How important is single-thread performance? What are the RAM requirements? And how is usage split between the system drive and the data drive (SSD for the system only, or for both)?

Thanks in advance
Marc Sylvester

Core Facility Mass Spectrometry,
University of Bonn, Germany
 
 
Brian Pratt responded:  2016-12-13 07:52
Hi Marc,

I was hoping Brendan would weigh in on this, as he has studied it closely, but he is travelling at the moment.

Skyline is highly threaded, so multiple cores are beneficial. RAM requirements depend, of course, on the size of your Skyline documents. An SSD is highly desirable for raw data access, and yes, for the OS as well.

For what it's worth, just today I took delivery of this machine for my work on Skyline, on the recommendation of another Skyline developer:

https://amzn.com/B01DUVEZ6K (no, I don't get a sales commission!)
Dell XPS 8900 Desktop - Intel Core i7-6700 6th Generation Quad-Core Skylake up to 4.0 GHz, 64GB DDR4 Memory, 1TB SSD + 4TB SATA Hard Drive, 2GB Nvidia GeForce GT 730, DVD Burner, Windows 10

I will be keeping the OS and the data I am currently working with on the SSD.

I hope this helps, and perhaps Brendan can weigh in later with any further thoughts.

Cheers,

Brian Pratt
 
Brendan MacLean responded:  2016-12-13 13:33
Hi Marc,
Yeah, the computer Brian mentions is currently our best configuration. I wish I had one. But for now, I will stay with my prior purchases, which all have i7 processors around 3.6 GHz, 16-32 GB RAM, a 500 GB SSD and a 1-2 TB SATA HD.

That should give you a range. How much you need to push these specs depends somewhat on how much you want to push the size of your transition-by-replicate-by-chromatogram-time-range matrix. All of these systems would be more than enough for SRM and PRM applications. It is when you really push the number of targets and replicates for DIA and/or MS1 extraction from DDA data that you need to think about extra memory and processing power.

Hope this helps. Thanks to Brian for the detailed and timely answer.

--Brendan
 
msylvest responded:  2016-12-13 13:50
Thank you Brian and Brendan for your detailed and quick answers!
Cheers
Marc
 
moe responded:  2016-12-15 19:38
Hi Marc,

For your interest, I'm a beginner user of Skyline and have attached some screenshots of my current laptop specs. These specs are not quite enough for analysis of a DIA/SWATH data set; it works, but it's very slow.

For example, in my current Skyline document I imported 150 .mz5 files (converted from .wiff files) with 1,500 proteins, 20,000 peptides and 170,000 transitions as targets. Opening/saving my file (63 GB) takes about 10 minutes, and training of peak scoring models takes about 25-30 minutes using just decoys, and days when you use second-best peaks and decoys combined.

cheers
Moe
 
Brendan MacLean responded:  2016-12-16 13:47
Hi Moe,
Wow, great use case. Which version of Skyline are you using? I just did a bit more performance work on Skyline-daily which improved document open time for a similarly sized document here from 11 minutes to 3 minutes. Not instant, but these are quite large documents. The training time definitely seems like good justification for us to work more on that. We currently do 30 training iterations just to be hyper-sure the training converges, but I am pretty sure others stop closer to 10. We have talked about finding a metric for determining when the model converges, so that we can cut down on the number of iterations, which could provide a 2-3x improvement over the current time required.

Is the 63 GB file the .skyd file or the .sky file? How many minutes of chromatograms are you storing? Is that 500 GB disk drive an SSD or an HDD?

The systems we have been recommending are also closer to 4.0 GHz, while yours is 2.5 GHz, which would make some difference; 64 GB RAM vs. 16 GB RAM would also likely make a difference for a file of this size.

Thanks for your post. It would be great to see if we can find any ways of improving Skyline performance for your use case. Let us know if the file open performance you are seeing is with Skyline 3.6 or Skyline-daily 3.6.1. Thanks.

--Brendan
 
moe responded:  2016-12-18 20:39
Hi Brendan and Brian,

I'm using Skyline-daily 3.6.1
The .skyd file is ~67 GB and the .sky file is ~9 GB.
120 min of chromatograms (DIA).
500 GB HDD (unfortunately)

We are just looking into buying a better computer (a workstation) for DIA analysis (and other analyses), but we need to be conscious of cost/benefit. How big is the benefit for Skyline speed with regard to the following:

1) 12-16 core Intel Xeon (2.5-3 GHz) vs. quad-core Intel i7 (4 GHz)
2) 64 GB vs. 128 GB RAM
3) 0.5 TB vs. 1 TB SSD?
 
Brian Pratt responded:  2016-12-19 14:54
If I had to choose one of the three, I'd go for the larger SSD for faster file reads during chromatogram extraction.
 
Brendan MacLean responded:  2016-12-19 15:38
I would go for the i7 at 4 GHz. In my own extensive testing, this jump has proved to be around twice as fast for imports. It is definitely not simply proportional to the clock speed, i.e. 3 GHz to 4 GHz is not just a 33% gain. I did quite a bit of testing on a 24-core Intel Xeon at 2.5 GHz versus my i7 3.5 GHz desktop, and found my desktop around 2x faster using the same number of threads. Involving more threads (for parallel file import) could give the 24-core machine a bit of an edge, but this was tricky, mostly (I think) due to added garbage collection overhead. I was able to get over 2x as fast with 24 cores, but I had to use the SkylineRunner command-line and multiple processes (a rough sketch of that kind of setup follows the CrystalDiskMark numbers below). A single multi-threaded process peaked at only a small percentage faster. The 24-core (48-thread) Xeon machine also has 192 GB RAM and the fastest 750 GB SSD of all the machines I tested with CrystalDiskMark.

-----------------------------------------------------------------------
CrystalDiskMark 5.1.0 x64 (C) 2007-2015 hiyohiyo
                           Crystal Dew World : http://crystalmark.info/
-----------------------------------------------------------------------
* MB/s = 1,000,000 bytes/s [SATA/600 = 600,000,000 bytes/s]
* KB = 1000 bytes, KiB = 1024 bytes

   Sequential Read (Q= 32,T= 1) : 2891.801 MB/s
  Sequential Write (Q= 32,T= 1) : 1415.197 MB/s
  Random Read 4KiB (Q= 32,T= 1) : 687.117 MB/s [167753.2 IOPS]
 Random Write 4KiB (Q= 32,T= 1) : 636.178 MB/s [155316.9 IOPS]
         Sequential Read (T= 1) : 1232.987 MB/s
        Sequential Write (T= 1) : 1339.985 MB/s
   Random Read 4KiB (Q= 1,T= 1) : 40.323 MB/s [ 9844.5 IOPS]
  Random Write 4KiB (Q= 1,T= 1) : 84.988 MB/s [ 20749.0 IOPS]

  Test : 1024 MiB [D: 37.4% (279.1/745.2 GiB)] (x5) [Interval=5 sec]
  Date : 2016/12/19 15:27:09
    OS : Windows Server 2012 R2 Server Standard (full installation) [6.3 Build 9600] (x64)
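
A minimal sketch of what a multi-process SkylineRunner setup might look like is below. The SkylineRunner path, folder layout and file names are placeholders, and the --in, --import-file and --out arguments should be checked against the command-line documentation for your Skyline version:

# parallel_import_sketch.py
# Split the raw files into batches and run one SkylineRunner process per batch,
# each importing into its own copy of the template document.
# All paths and file names are placeholders for illustration only.
import subprocess
from pathlib import Path

SKYLINE_RUNNER = r"C:\Tools\SkylineRunner.exe"   # placeholder install location
TEMPLATE_SKY   = r"D:\DIA\template.sky"          # document with targets, no results
RAW_DIR        = Path(r"D:\DIA\raw")             # folder of raw data files
N_PROCESSES    = 4                               # tune to core count and RAM

raw_files = sorted(RAW_DIR.glob("*.mz5"))
batches = [raw_files[i::N_PROCESSES] for i in range(N_PROCESSES)]

procs = []
for i, batch in enumerate(batches):
    if not batch:
        continue
    # --in opens the template, --import-file adds each raw file, --out writes
    # the per-batch result document (verify the save semantics in the docs).
    args = [SKYLINE_RUNNER, f"--in={TEMPLATE_SKY}"]
    args += [f"--import-file={f}" for f in batch]
    args.append(rf"--out=D:\DIA\batch_{i}.sky")
    procs.append(subprocess.Popen(args))

# Wait for all imports to finish and report exit codes.
for i, proc in enumerate(procs):
    proc.wait()
    print(f"batch {i} finished with exit code {proc.returncode}")

How you recombine (or simply keep) the per-batch documents afterwards is a separate question; the point of the sketch is just that separate processes sidestep the single-process garbage collection overhead mentioned above.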
  

The 128 GB RAM will give you a higher ceiling for memory use, but you'll need some really monster-sized files to need it. Nick was saying that your current file with 120 replicates uses around 20 GB on his 64 GB machine. So it seems understandable that processing would go very, very slowly on your 16 GB machine, because the OS will need to do a lot of swapping to disk to manage this. As long as you have enough memory to avoid forcing the OS to thrash the disk, you should be fine. In that case, 64 GB versus 128 GB shouldn't make any difference. If you are thinking you would like to work on things 4x this size, then getting 128 GB may be a good idea.
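
If you want a quick reading of how much memory Skyline actually uses with one of your documents open, a minimal sketch like the one below reports the resident memory of the running Skyline process. It assumes the third-party psutil package is installed (pip install psutil) and that the process name contains "Skyline":

# memory_check_sketch.py
# Print the resident memory of any running process whose name contains "Skyline",
# to help judge whether 64 GB or 128 GB matters for your documents.
import psutil

for proc in psutil.process_iter(["name", "memory_info"]):
    name = proc.info["name"] or ""
    if "skyline" in name.lower():
        rss_gb = proc.info["memory_info"].rss / 1024**3
        print(f"{name} (pid {proc.pid}): {rss_gb:.1f} GB resident")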

On the SSD question: most of the time SSD versus HDD makes only a 10-20% difference in performance, and I process a lot of our data on HDD. We have worked very hard to make this the case for Skyline. Occasionally, though, we do run into cases where SSD versus HDD makes more difference. For example, I recently found an issue where the Thermo reader did much worse for DIA files on HDD, but that turned out to occur only on Windows 7 with an external HDD (connected through USB 3). The difference returned to the normal ~10% with either an internal HDD or Windows 10.

So you shouldn't necessarily feel that you need to store all the data files you are going to process on an SSD. Run some of your own comparisons to see what kind of difference you get.
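
Such a comparison can be as crude as timing a sequential read of the same large raw data file from a copy on each drive. A minimal sketch follows; the file paths are placeholders, and note that Windows may serve a repeated read from its file cache, so use a file larger than RAM or a fresh boot for a fair test:

# drive_read_timing_sketch.py
# Time a full sequential read of the same file from an SSD copy and an HDD copy.
import time

CANDIDATES = {
    "SSD": r"C:\data\example.mz5",  # copy of the file on the SSD
    "HDD": r"D:\data\example.mz5",  # copy of the same file on the HDD
}
CHUNK = 8 * 1024 * 1024  # read in 8 MiB chunks

for label, path in CANDIDATES.items():
    start = time.perf_counter()
    total = 0
    with open(path, "rb") as fh:
        while True:
            block = fh.read(CHUNK)
            if not block:
                break
            total += len(block)
    elapsed = time.perf_counter() - start
    print(f"{label}: {total / 1024**2:.0f} MiB in {elapsed:.1f} s "
          f"({total / 1024**2 / elapsed:.0f} MiB/s)")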

If it were me, I would say:
1. 4 GHz i7 (too expensive to go NUMA, and too tricky to get better performance from it; easy to end up worse if you aren't careful)
2. Your choice of either 128 GB RAM or a 1 TB SSD. Neither may give you an immediate benefit; they act more as insurance for future growth. Myself, I would probably get a 4 TB HDD, expect to do most of my processing from it, and take the RAM, since upgrading a system from 64 GB to 128 GB is usually not as easy as adding more hard drive space, unless you know the machine has spare slots for adding memory later.

Those are my thoughts anyway.

--Brendan