How to improve BlibBuild CPU usage

nibarrola  2022-06-22 05:25
 

Dear Skyline Team,

We are trying to create a Skyline library using a .pep.xml file (3.31 MB) generated with PEAKS; the related spectrum file (.mzXML) is 928 MB. It takes more than 1 h to create the .blib file, and with bigger files it takes all night. We have noticed that BlibBuild uses only one CPU, so we would like to know whether it is possible to improve processing speed by increasing the number of CPUs used by BlibBuild.exe, which is the bottleneck.

We have installed Skyline-daily on two computers, and we have the same problem on both:
-Computer 1: processor: Intel(R) Core(TM) i9-9900K CPU @ 3.60GHz, RAM: 64.0 GB (63.9 GB usable), Windows 10 Pro 64-bit operating system, sockets: 1, cores: 8, logical processors: 16
-Computer 2: processor: Intel(R) Xeon(R) Silver 4214 CPU @ 2.20GHz (2 processors), RAM: 512 GB (511 GB usable), Windows 10 Pro 64-bit operating system, sockets: 2, cores: 24, logical processors: 48

Thanks in advance for your help

Nieves Ibarrola

 
 
Nick Shulman responded:  2022-06-22 07:47

Can you send us your files? Building a spectral library is not supposed to take that long, even though, yes, BiblioSpec uses only one thread to do all of its work.
We have had support requests in the past where BiblioSpec was incredibly slow because, for each spectrum, it was doing a linear search through the entire list of spectra in the mzML file to find the spectrum whose ID matched.
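To illustrate why a per-spectrum linear search is so costly, here is a minimal Python sketch contrasting a linear scan over spectrum IDs with a dictionary index built once up front. It is not BiblioSpec's actual code; the function names and the "scan=N" IDs are hypothetical. With m identified spectra and n spectra in the file, the linear approach costs on the order of m * n comparisons, while the index costs roughly n + m.

```python
from typing import Dict, List, Optional


def find_linear(spectrum_ids: List[str], target: str) -> Optional[int]:
    """Compare against every ID until one matches: O(n) per lookup."""
    for position, spectrum_id in enumerate(spectrum_ids):
        if spectrum_id == target:
            return position
    return None


def build_index(spectrum_ids: List[str]) -> Dict[str, int]:
    """One O(n) pass mapping each ID to its first position in the file."""
    index: Dict[str, int] = {}
    for position, spectrum_id in enumerate(spectrum_ids):
        index.setdefault(spectrum_id, position)
    return index


def find_indexed(index: Dict[str, int], target: str) -> Optional[int]:
    """Average O(1) hash lookup once the index exists."""
    return index.get(target)


ids = [f"scan={n}" for n in range(1, 100_001)]  # hypothetical native IDs
lookup = build_index(ids)
print(find_linear(ids, "scan=99999"))      # 99998
print(find_indexed(lookup, "scan=99999"))  # 99998
```

Both calls return the same position; the difference is that the indexed version pays the O(n) cost once instead of once per identified spectrum.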

You can upload your pepxml and mzxml files here:
https://skyline.ms/files.url

-- Nick

 
nibarrola responded:  2022-06-23 02:24

I have already uploaded the files. The file name is 22062022_NievesIbarrola.zip

Thanks!

 
Nick Shulman responded:  2022-06-23 08:36

Thank you for sending your .mzXML file and your .pep.xml file.
I see that your .mzXML file has 2.4 million spectra in it.

It looks like BiblioSpec is spending a lot of time in a function called "getPrecursorID". When BiblioSpec sees an MS2 spectrum (or any spectrum whose MS level is greater than 1) in an mzXML file, it wants to know which precursor spectrum it came from.
https://github.com/ProteoWizard/pwiz/blob/master/pwiz/data/msdata/SpectrumList_mzXML.cpp#L665
BiblioSpec looks at the spectra in the file in reverse order, starting from the MS2 spectrum, trying to find the first spectrum whose MS level is one less than that spectrum's.
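Here is a simplified Python model of that backward search, assuming the file is represented as a plain list of MS levels in file order. The function name and its return value are hypothetical, not the ProteoWizard API. It also counts how many spectra are examined, which shows why a spectrum with no valid parent level before it forces a scan all the way back to the start of the file.

```python
from typing import List, Optional, Tuple


def find_precursor_index(ms_levels: List[int], index: int) -> Tuple[Optional[int], int]:
    """Walk backwards from `index` to the nearest earlier spectrum whose MS level
    is one less than ms_levels[index]. Returns (parent index or None, spectra examined)."""
    wanted = ms_levels[index] - 1
    examined = 0
    for i in range(index - 1, -1, -1):
        examined += 1
        if ms_levels[i] == wanted:
            return i, examined
    return None, examined


levels = [1, 2, 2, 1, 2, 8]  # file order; the last spectrum claims to be MS8
print(find_precursor_index(levels, 4))  # (3, 1): MS1 found immediately
print(find_precursor_index(levels, 5))  # (None, 5): no MS7, so every earlier spectrum is examined
```

In a file with 2.4 million spectra, each "MS8" spectrum examines every spectrum before it under this model, and that cost is paid again for every such spectrum.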

I think the reason that this is taking so long is that BiblioSpec finds a spectrum which says that its MS Level is "8":

    <scan num="520376" msLevel="8" peaksCount="264" retentionTime="PT639.732S" activationMethod="CID" lowMz="95.51925" highMz="1698.4502" totIonCurrent="6367.0">
      <precursorMz precursorCharge="2" activationMethod="CID" invK0="0.78">455.24127</precursorMz>

Because of this, BiblioSpec starts counting backwards trying to find a spectrum whose MS Level is 7. There are no MS7 spectra in the file, so BiblioSpec ends up looking at every spectrum in the file, which takes a long time.

Do you know why there are MS8 spectra in this mzXML file?

I could ask around and see if there is any way that we could speed this up. It might be that BiblioSpec should not bother trying to look for MS7 spectra. It is also possible that we could make it so that BiblioSpec does not bother trying to figure out the parent scan number since I don't think that information is necessary when building a spectral library.
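A related approach, sketched in Python under the same simplified list-of-MS-levels model: instead of searching backwards for each spectrum, a single forward pass remembers the most recent index seen at each MS level. Every parent lookup then becomes a dictionary access, and a missing level (no MS7 for an MS8 spectrum) simply yields no parent at no extra cost. This is a swapped-in technique for illustration, not something BiblioSpec or ProteoWizard is known to do; `assign_parents` is a hypothetical name.

```python
from typing import Dict, List, Optional


def assign_parents(ms_levels: List[int]) -> Dict[int, Optional[int]]:
    """Hypothetical single-pass alternative to a per-spectrum backward search.
    For each spectrum with MS level > 1, record the most recent earlier spectrum
    at level - 1, or None if no such spectrum has been seen."""
    last_seen: Dict[int, int] = {}  # MS level -> most recent index at that level
    parents: Dict[int, Optional[int]] = {}
    for i, level in enumerate(ms_levels):
        if level > 1:
            parents[i] = last_seen.get(level - 1)  # O(1); None when e.g. no MS7 exists
        last_seen[level] = i
    return parents


print(assign_parents([1, 2, 2, 1, 2, 8]))  # {1: 0, 2: 0, 4: 3, 5: None}
```

For this input it finds the same parents a backward search would, but in one pass over the file regardless of how many spectra report an unexpected MS level.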

Things would definitely be faster if you were able to use the mzML format instead of mzXML.
-- Nick

 
nibarrola responded:  2022-06-24 04:48

Hi Nick,
We have built the library using the MSFragger files, and Skyline was really fast at the task. So the problem must be with the PEAKS files. We don't know why there are MS8 spectra in the files, so we will try to figure it out.

Thank you very much for your help; it was really useful.

Nieves

 
Juan C. Rojas E. responded:  2022-06-29 06:36

Hi Nieves,

I am facing the same issue. Based on the reported invK0 value, it is safe to assume your data is from a timsTOF instrument, and I would go as far as to guess it was a DDA-PASEF experiment. Although it is quite strange to call it MS8, could it be referring to the MS/MS scan acquired from a TIMS ramp cycle?

I have worked with exports from Waters IMS-DDA files before, and it was fine. It ran even faster if the .mzXML files were centroided before building the library.

A workaround I used before was to work with the Scaffold export (.mzid and .mgf files); BiblioSpec ran faster through those. However, as of today, with Skyline-daily 21.2.1.539, I am getting an error I don't understand (attached) that no longer allows me to build the library. Any ideas about this?

If you figure out with BSI why this problem is happening, please do share!
Sincerely,
Juan C. Rojas E.

 
Matt Chambers responded:  2022-06-29 12:40

Hi Juan, thanks for the information about your Blib workaround. Please start a separate thread for the other issue; that one appears to be UNIMOD-related.

 
Juan C. Rojas E. responded:  2022-07-15 07:17

I just wanted to add a warning about the Blib workaround, based on something I noticed today.

The MS/MS spectra in the .mgf exports have been charge-deconvoluted, so they do not represent the raw data. They are still useful for peptide ID and would work for DDA workflows anyway.

However, if the aim is to build DDA spectral libraries for DIA data analysis, this approach might not be the best, since intense multiply charged ions (which have been charge-deconvoluted and summed into singly charged ions) would be missing for selection as representative fragment ions, and the idotp from raw to spectral library might be distorted.
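To make the effect concrete, here is a small Python sketch under simplifying assumptions: positive-mode, protonated fragments, and spectra reduced to {fragment label: intensity} maps. The first function shows where charge deconvolution moves a fragment (a 2+ ion observed at m/z 500.0 is reported near m/z 999.0 as 1+). The second is a plain cosine similarity, used only as a stand-in for a library-match score; it is not Skyline's exact dotp or idotp calculation. The fragment labels and intensities are made up for illustration.

```python
import math
from typing import Dict

PROTON_MASS = 1.00727646688  # Da


def to_singly_charged(mz: float, charge: int) -> float:
    """Convert the m/z of [M+zH]z+ to the m/z of the same fragment as [M+H]+."""
    if charge < 1:
        raise ValueError("charge must be a positive integer")
    return mz * charge - (charge - 1) * PROTON_MASS


def cosine_similarity(a: Dict[str, float], b: Dict[str, float]) -> float:
    """Cosine similarity between two {fragment label: intensity} spectra.
    A fragment missing from one spectrum contributes zero to the dot product."""
    dot = sum(a[k] * b[k] for k in set(a) & set(b))
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


# Hypothetical raw DIA fragments: the most intense one is doubly charged.
raw = {"y4": 40.0, "y5": 30.0, "y7++": 100.0}
# After deconvolution the library stores y7 as 1+ instead of y7++.
library = {"y4": 40.0, "y5": 30.0, "y7": 100.0}

print(round(to_singly_charged(500.0, 2), 4))      # 998.9927
print(round(cosine_similarity(raw, raw), 3))      # 1.0
print(round(cosine_similarity(raw, library), 3))  # 0.2
```

In this toy case the most intense raw fragment has no counterpart in the deconvoluted library, so the match score collapses from 1.0 to 0.2 even though the peptide is the same.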

Sincerely,
Juan C.