Thank you for sending your .mzXML file and your .pep.xml file.
I see that your .mzXML file has 2.4 million spectra in it.
It looks like BiblioSpec is spending a lot of time in a function called "getPrecursorID". When BiblioSpec sees a spectrum in an mzXML file whose MS Level is greater than 1 (an MS2 spectrum, for example), it wants to know which parent spectrum it came from.
https://github.com/ProteoWizard/pwiz/blob/master/pwiz/data/msdata/SpectrumList_mzXML.cpp#L665
BiblioSpec walks backwards through the file, starting from that spectrum, looking for the first spectrum whose MS Level is one less than its own.
I think the reason that this is taking so long is that BiblioSpec finds a spectrum which says that its MS Level is "8":
<scan num="520376" msLevel="8" peaksCount="264" retentionTime="PT639.732S" activationMethod="CID" lowMz="95.51925" highMz="1698.4502" totIonCurrent="6367.0">
<precursorMz precursorCharge="2" activationMethod="CID" invK0="0.78">455.24127</precursorMz>
Because of this, BiblioSpec starts counting backwards trying to find a spectrum whose MS Level is 7. There are no MS7 spectra in the file, so BiblioSpec ends up looking at every spectrum in the file, which takes a long time.
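To make the cost concrete, here is a rough Python sketch of the backward search described above. It is not the actual ProteoWizard code: the function name "find_precursor" and the plain list of MS levels are made up for illustration, and the real reader works on scan index entries rather than a list.

```python
from typing import List, Optional, Tuple


def find_precursor(ms_levels: List[int], index: int) -> Tuple[Optional[int], int]:
    """Hypothetical illustration of the backward search.

    ms_levels[i] is the MS level of the i-th spectrum in file order.
    Returns (index of the nearest earlier spectrum whose MS level is one
    less than ms_levels[index], number of spectra examined).
    The first element is None when no such spectrum exists.
    """
    target = ms_levels[index] - 1
    if target < 1:
        return None, 0  # an MS1 spectrum has no parent scan

    examined = 0
    # Walk backwards from the spectrum just before `index`.
    for i in range(index - 1, -1, -1):
        examined += 1
        if ms_levels[i] == target:
            return i, examined
    # Reached the start of the file without finding the target level.
    return None, examined


levels = [1, 2, 2, 1, 2, 8]
print(find_precursor(levels, 4))  # MS2: parent MS1 is the previous spectrum
print(find_precursor(levels, 5))  # "MS8": no MS7 exists, so everything is scanned
```

For a normal MS2 spectrum the search stops after a handful of steps. For a spectrum labeled MS8 with no MS7 anywhere, each lookup examines every earlier spectrum, which adds up quickly in a file with 2.4 million spectra.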
Do you know why there are MS8 spectra in this mzXML file?
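If it helps to see how many spectra are affected, a small script along these lines could tally the msLevel values in the file. This is only a sketch: it assumes each <scan ...> start tag, including its msLevel attribute, sits on a single line as in the excerpt above, and the function name "count_ms_levels" and the file name in the comment are placeholders.

```python
import re
from collections import Counter
from typing import Iterable

# Matches the msLevel attribute inside a <scan ...> start tag.
# Assumption: the tag and its msLevel attribute are on the same line.
MS_LEVEL = re.compile(r'<scan\b[^>]*\bmsLevel="(\d+)"')


def count_ms_levels(lines: Iterable[str]) -> Counter:
    """Count how many <scan> elements declare each MS level."""
    counts: Counter = Counter()
    for line in lines:
        for match in MS_LEVEL.finditer(line):
            counts[int(match.group(1))] += 1
    return counts


# Example usage (placeholder file name). Reading line by line avoids
# loading the whole file into memory:
#
#     with open("yourfile.mzXML", encoding="utf-8", errors="replace") as f:
#         for level, n in sorted(count_ms_levels(f).items()):
#             print(f"msLevel {level}: {n} spectra")
```

If the output shows levels such as 8 but nothing at 7, that would match what I am seeing here.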
I could ask around and see if there is any way that we could speed this up. It might be that BiblioSpec should not bother trying to look for MS7 spectra. It is also possible that we could make it so that BiblioSpec does not bother trying to figure out the parent scan number since I don't think that information is necessary when building a spectral library.
Things would definitely be faster if you were able to use the mzML format instead of mzXML.
-- Nick