Description

Creates a library of spectra with known peptide and/or small molecule identifications. Typically, these identifications come from a database search such as SEQUEST or Mascot, sometimes followed by an evaluation step such as Percolator or PeptideProphet. BlibBuild accepts files from a variety of database search programs, as well as some other spectral library formats. File formats are identified by file extension; the recognized extensions are listed in the table below. In many cases, the peptide identification (peptide sequence, charge state, and optional score) is in a separate file from the spectrum information. Unless noted otherwise, both files are assumed to be in the same directory.

| Database search | Peptide ID file extension | Spectrum file extension | Score used | Notes |
|---|---|---|---|---|
| Generic SSL | .ssl | | score column | A generic format for encoding spectrum library entries. |
| ByOnic | .mzid | .MGF, .mzXML, .mzML | AbsLogProb | |
| Comet/SEQUEST/Percolator | .perc.xml, .sqt | .cms2, .ms2, .mzXML | q-value | Percolator v1.17 does not include sequence modification information, so the .sqt file from the SEQUEST search must be present in the same directory, the directory containing the .cms2/.ms2 spectrum files, or the current working directory. |
| DIA-NN | .speclib | | none | No separate spectrum file. In the current implementation, no score is imported from the library, so all spectra are imported. |
| IDPicker | .idpXML | .mzXML, .mzML | FDR | The name(s) of the spectrum file(s) are given in the .idpXML file. |
| MS Amanda | .pep.xml, .pepXML | .mzML, .mzXML, .MGF, RAW* | q-value | |
| MSFragger | .pep.xml, .pepXML | .mzML, .mzXML, .MGF, RAW* | q-value | |
| MSGF+ | .mzid, .pepXML | .mzML, .mzXML, .MGF, RAW* | expectation value | |
| Mascot | .dat | | expectation value | No separate spectrum file. |
| MaxQuant Andromeda | msms.txt + evidence.txt + mqpar.xml + modifications.xml | .mzML, .mzXML, .MGF, RAW* | PEP | It is possible to use the peaks embedded in msms.txt, but external spectrum files are preferred because the embedded peaks are charge deconvoluted. mqpar.xml must be located in the same, parent, or grandparent directory. A custom modifications.xml, modifications.local.xml, or modification.xml can be placed in the same directory as the search results (or specified using the -x option). |
| Morpheus | .pep.xml, .pepXML | .mzXML, .mzML | q-value | The names of the .mzXML files are given in the .pep.xml file; the files may be in the parent or grandparent directory. Spectra are looked up by index, which is calculated as (scan number - 1). |
| OMSSA | .pep.xml, .pepXML | .mzXML, .mzML | expectation value | The names of the .mzXML files are given in the .pep.xml file; the files may be in the parent or grandparent directory. |
| OpenSWATH | .tsv | | m_score column | No separate spectrum file. |
| PEAKS DB | .pep.xml, .pepXML | .mzXML, .mzML | confidence score | The names of the .mzXML files are given in the .pep.xml file; the files may be in the parent or grandparent directory. |
| PLGS MSe | final_fragment.csv | | score column | The file name need not contain a '.' before 'final_fragment'. |
| PRIDE | .pride.xml | | various | No separate spectrum file. |
| PeptideProphet/iProphet | .pep.xml, .pepXML | .mzML, .mzXML, .MGF, RAW* | probability score | The names of the .mzXML files are given in the .pep.xml file; the files may be in the parent or grandparent directory. |
| PeptideShaker | .mzid | .MGF | confidence score | |
| Protein Pilot | .group.xml | | confidence score | No separate spectrum file. |
| Protein Prospector | .pep.xml, .pepXML | .mzML, .mzXML, .MGF, RAW* | expectation value | |
| Proteome Discoverer | .msf, .pdResult | | q-value | No separate spectrum file. Libraries cannot be built from databases that do not contain q-values unless a cutoff score of 0 is explicitly specified. |
| Proxl XML | .proxl.xml | .mzML, .mzXML, .MGF, RAW* | q-value | |
| Scaffold | .mzid | .MGF, .mzXML, .mzML | peptide probability | |
| Spectronaut | .csv | | none | Spectronaut Assay Library export. No separate spectrum file. |
| Spectrum Mill | .pep.xml, .pepXML | .mzXML, .mzML | expectation value | The names of the .mzXML files are given in the .pep.xml file; the files may be in the parent or grandparent directory. |
| X! Tandem | .xtan.xml | | expectation value | No separate spectrum file. |

*RAW includes vendor formats such as RAW, WIFF, .D, etc.
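Because formats are recognized by file extension, and several extensions in the table overlap (for example final_fragment.csv also ends in .csv, and .pep.xml is shared by many search programs), a longest-suffix match is one reasonable way to picture the lookup. The sketch below is a hypothetical illustration only: the function name, the dictionary, and the longest-match rule are assumptions, not BlibBuild's actual implementation, and the mapping covers only extensions listed in the table.

```python
from typing import Optional

# Hypothetical suffix table built from the extensions listed above (lowercased).
# A shared suffix such as .pep.xml or .mzid identifies a file format, not a
# unique search engine.
ID_FILE_SUFFIXES = {
    ".perc.xml": "Percolator",
    ".sqt": "SEQUEST",
    ".pep.xml": "pepXML (PeptideProphet, MSFragger, OMSSA, ...)",
    ".pepxml": "pepXML (PeptideProphet, MSFragger, OMSSA, ...)",
    ".mzid": "mzIdentML (MSGF+, ByOnic, PeptideShaker, Scaffold)",
    ".dat": "Mascot",
    ".idpxml": "IDPicker",
    ".xtan.xml": "X! Tandem",
    ".pride.xml": "PRIDE",
    ".group.xml": "Protein Pilot",
    ".proxl.xml": "Proxl XML",
    ".msf": "Proteome Discoverer",
    ".pdresult": "Proteome Discoverer",
    ".speclib": "DIA-NN",
    ".ssl": "Generic SSL",
    ".tsv": "OpenSWATH",
    ".csv": "Spectronaut",
    "final_fragment.csv": "PLGS MSe",
    "msms.txt": "MaxQuant",
}


def detect_format(filename: str) -> Optional[str]:
    """Return the format whose suffix is the longest match, or None."""
    name = filename.lower()
    # Longest suffixes first, so "final_fragment.csv" wins over the generic ".csv"
    for suffix in sorted(ID_FILE_SUFFIXES, key=len, reverse=True):
        if name.endswith(suffix):
            return ID_FILE_SUFFIXES[suffix]
    return None
```

For example, detect_format("run1_final_fragment.csv") resolves to PLGS MSe rather than Spectronaut, because the longer suffix is tried first.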

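Several rows in the table state that a companion file (mqpar.xml for MaxQuant, or the .mzXML files named inside a .pep.xml) may sit in the same, parent, or grandparent directory of the result file. The sketch below shows that kind of upward search. The function name and the exact search order (same directory first, then one and two levels up) are assumptions for illustration, not a description of BlibBuild's internal code.

```python
from pathlib import Path
from typing import Optional


def find_nearby(result_file: str, target_name: str) -> Optional[Path]:
    """Look for target_name beside result_file, then in its parent and grandparent.

    Hypothetical helper; the search order is an assumption.
    """
    start = Path(result_file).resolve().parent
    # Same directory, then one level up, then two levels up
    for directory in (start, start.parent, start.parent.parent):
        candidate = directory / target_name
        if candidate.is_file():
            return candidate
    return None
```

For a MaxQuant run laid out as project/txt/combined/msms.txt, find_nearby would locate project/mqpar.xml two levels up.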


Usage

BlibBuild [options] <peptide id file>[+] <library name>

Input

  • <peptide id file> – A file containing peptide spectrum matches (PSMs) to be included in the library. The associated spectrum files should be in the same directory as the peptide ID file (unless the table above notes otherwise) and should not be given on the command line. See the table above for recognized formats. Multiple peptide ID files may be listed, as indicated by [+] in the usage line.
  • <library name> – The name of the library to create. An existing library may be overwritten (-o) or appended to (the default).

Output

A spectrum library in SQLite 3 format.
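Since the output is an ordinary SQLite 3 database, any SQLite client can open it. As a minimal sketch, the hypothetical helper below uses Python's standard sqlite3 module to list the tables in a library without modifying it; the -d option described below documents the actual .blib schema, so no particular table names are assumed here.

```python
import sqlite3
from contextlib import closing
from pathlib import Path
from typing import List


def list_tables(library_path: str) -> List[str]:
    """Return the sorted names of all tables in a .blib (SQLite) library."""
    # Read-only URI: a mistyped path raises an error instead of creating an empty database
    uri = Path(library_path).resolve().as_uri() + "?mode=ro"
    with closing(sqlite3.connect(uri, uri=True)) as conn:
        rows = conn.execute(
            "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
        ).fetchall()
    return [name for (name,) in rows]
```

Opening the file read-only is a deliberate choice: it guarantees that inspecting a library cannot alter it.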

Options

  • -o Overwrite an existing library. By default, new entries are appended.
  • -S <filename> Read from file as though it were stdin.
  • -s Read result file names from stdin, e.g. ls *.sqt | BlibBuild -s new.blib.
  • -u Ignore all peptides except those whose unmodified sequences are read from stdin.
  • -U Ignore all peptides except those whose modified sequences are read from stdin.
  • -H Use more than one decimal place when describing mass modifications.
  • -C <file size> Minimum file size required to use caching for .dat files. Specify units as B, K, M, or G. Default 800M.
  • -c <cutoff> Score threshold (0-1) for PSMs to be included in the library. A higher threshold is more exclusive.
  • -v <level> Level of output to stderr (silent, error, status, warn). Default status.
  • -L Write status and warning messages to log file.
  • -m <size> SQLite memory cache size in megabytes. Default 250M.
  • -l <level> ZLib compression level (0-?). Default 3.
  • -i <library_id> LSID library ID. Default uses file name.
  • -a <authority> LSID authority. Default proteome.gs.washington.edu.
  • -x <filename> Specify the path of XML modifications file for parsing MaxQuant files.
  • -p <filename> Specify the path of XML parameters file for parsing MaxQuant files.
  • -P <float> Specify pusher interval for Waters final_fragment.csv files.
  • -d [<filename>] Document the .blib format by writing SQLite commands to a file, or stdout if no filename is given.
  • -E Prefer reading peaks from embedded spectra (currently only affects MaxQuant msms.txt).
  • -A Output messages noting ambiguously matched spectra (spectra matched to multiple peptides).
  • -K Keep ambiguously matched spectra.
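When BlibBuild is driven from a script, it can help to assemble the argument list in one place. The sketch below uses only option letters from the list above and follows the documented argument order (options, then one or more peptide ID files, then the library name). The function names are hypothetical, and it assumes the BlibBuild executable is on PATH.

```python
import subprocess
from typing import List, Optional, Sequence


def build_blibbuild_args(
    id_files: Sequence[str],
    library: str,
    cutoff: Optional[float] = None,
    overwrite: bool = False,
    verbosity: Optional[str] = None,
) -> List[str]:
    """Assemble a BlibBuild command line (hypothetical helper)."""
    args = ["BlibBuild"]
    if overwrite:
        args.append("-o")            # overwrite instead of appending
    if cutoff is not None:
        if not 0.0 <= cutoff <= 1.0:
            raise ValueError("cutoff must be between 0 and 1")
        args += ["-c", str(cutoff)]  # PSM score threshold
    if verbosity is not None:
        args += ["-v", verbosity]    # silent, error, status, or warn
    args += list(id_files)           # one or more peptide ID files
    args.append(library)             # library name comes last
    return args


def run_blibbuild(args: List[str], stdin_text: Optional[str] = None) -> subprocess.CompletedProcess:
    """Run BlibBuild; stdin_text feeds options such as -s, -u, or -U."""
    # check=True raises CalledProcessError on a non-zero exit status
    return subprocess.run(args, input=stdin_text, check=True,
                          capture_output=True, text=True)
```

For instance, run_blibbuild(["BlibBuild", "-s", "new.blib"], stdin_text="a.sqt\nb.sqt\n") mirrors the shell pipeline ls *.sqt | BlibBuild -s new.blib. The range check on cutoff simply echoes the documented 0-1 range, so mistakes surface before the process starts.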