Description

Creates a library of spectra with known peptide and/or small molecule identifications. Typically, these identifications come from a database search such as SEQUEST or Mascot, sometimes followed by an evaluation step such as Percolator or PeptideProphet. BlibBuild accepts files from a variety of database search programs, as well as some other spectral library formats. File formats are identified by file extension, as listed in the table below. In many cases, the peptide identification (peptide sequence, charge state, and optional score) is in a separate file from the spectrum information. Unless noted otherwise, both files are assumed to be in the same directory.

Database search | Peptide ID file extension | Spectrum file extension | Score used | Notes
--- | --- | --- | --- | ---
Generic SSL | .ssl |  | score column | A generic format for encoding spectrum library entries.
ByOnic | .mzid | .MGF, .mzXML, .mzML | AbsLogProb |
Comet/SEQUEST/Percolator | .perc.xml, .sqt | .cms2, .ms2, .mzXML | q-value | Percolator v1.17 does not include sequence modification information; therefore, the .sqt file from the SEQUEST search must be present in the same directory, the directory containing the cms2/ms2 spectrum files, or the current working directory.
DIA-NN | .speclib |  | none | No separate spectrum file. In the current implementation, no score is imported from the library, so all spectra are imported.
IDPicker | .idpXML | .mzXML, .mzML | FDR | The name(s) of the spectrum file(s) are given in the .idpXML file.
MS Amanda | .pep.xml, .pepXML | .mzML, .mzXML, .MGF, RAW* | q-value |
MSFragger | .pep.xml, .pepXML | .mzML, .mzXML, .MGF, RAW* | q-value |
MSGF+ | .mzid, .pepXML | .mzML, .mzXML, .MGF, RAW* | expectation value |
Mascot | .dat |  | expectation value | No separate spectrum file.
MaxQuant Andromeda | msms.txt + evidence.txt + mqpar.xml + modifications.xml | .mzML, .mzXML, .MGF, RAW* | PEP | It is possible to use peaks embedded in the msms.txt, but external spectra files are preferred because the embedded peaks are charge deconvoluted. mqpar.xml must be located in the grandparent, parent, or same directory. A custom modifications.xml, modifications.local.xml, or modification.xml can be placed in the same directory as the search results (or specified using the -x option).
Morpheus | .pep.xml, .pepXML | .mzXML, .mzML | q-value | The names of the .mzXML files are given in the .pep.xml file and may be in the parent or grandparent directory. Spectra are looked up by index, which is calculated using (scan number - 1).
OMSSA | .pep.xml, .pepXML | .mzXML, .mzML | expectation value | The names of the .mzXML files are given in the .pep.xml file and may be in the parent or grandparent directory.
OpenSWATH | .tsv |  | m_score column | No separate spectrum file.
PEAKS DB | .pep.xml, .pepXML | .mzXML, .mzML | confidence score | The names of the .mzXML files are given in the .pep.xml file and may be in the parent or grandparent directory.
PLGS MSe | final_fragment.csv |  | score column | There need not be a . before 'final_fragment'.
PRIDE | .pride.xml |  | various | No separate spectrum file.
PeptideProphet/iProphet | .pep.xml, .pepXML | .mzML, .mzXML, .MGF, RAW* | probability score | The names of the .mzXML files are given in the .pep.xml file and may be in the parent or grandparent directory.
PeptideShaker | .mzid | .MGF | confidence score |
Protein Pilot | .group.xml |  | confidence score | No separate spectrum file.
Protein Prospector | .pep.xml, .pepXML | .mzML, .mzXML, .MGF, RAW* | expectation value |
Proteome Discoverer | .msf, .pdResult |  | q-value | No separate spectrum file. Libraries cannot be built from databases that do not contain q-values, unless a cutoff score of 0 is explicitly specified.
Proxl XML | .proxl.xml | .mzML, .mzXML, .MGF, RAW* | q-value |
Scaffold | .mzid | .MGF, .mzXML, .mzML | peptide probability |
Spectronaut | .csv |  | none | Spectronaut Assay Library export. No separate spectrum file.
Spectrum Mill | .pep.xml, .pepXML | .mzXML, .mzML | expectation value | The names of the .mzXML files are given in the .pep.xml file and may be in the parent or grandparent directory.
X! Tandem | .xtan.xml |  | expectation value | No separate spectrum file.

*RAW includes vendor formats like RAW, WIFF, .D, etc.
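
Because formats are recognized by file extension, the order in which extensions are tested matters when one extension is a suffix of another (for example, final_fragment.csv and .csv). The following is a minimal sketch of that idea for a subset of the table rows. The function name id_format is hypothetical, and this is not BlibBuild's actual detection code.

```shell
# Illustrative only: map a peptide ID file name to the table row(s) that
# share its extension. Covers a subset of the table.
id_format() {
    case "$1" in
        # More specific patterns come first so they win over general ones.
        *.perc.xml|*.sqt)     echo "Comet/SEQUEST/Percolator" ;;
        *.pep.xml|*.pepXML)   echo "pepXML (several search engines)" ;;
        *.pride.xml)          echo "PRIDE" ;;
        *.proxl.xml)          echo "Proxl XML" ;;
        *.xtan.xml)           echo "X! Tandem" ;;
        *.group.xml)          echo "Protein Pilot" ;;
        *final_fragment.csv)  echo "PLGS MSe" ;;
        *.csv)                echo "Spectronaut" ;;
        *.mzid)               echo "mzIdentML (ByOnic, MSGF+, PeptideShaker, Scaffold)" ;;
        *.dat)                echo "Mascot" ;;
        *.ssl)                echo "Generic SSL" ;;
        *)                    echo "unknown"; return 1 ;;
    esac
}

id_format "sample_final_fragment.csv"   # prints "PLGS MSe"
id_format "search.pep.xml"              # prints "pepXML (several search engines)"
```

Several search engines share an extension (.pep.xml, .mzid), so the extension alone narrows the format down without always naming the program that wrote the file.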

Usage

BlibBuild [options] <peptide id file>[+] <library name>

Input

  • <peptide id file> – A file containing peptide-spectrum matches (PSMs) to be included in the library. The associated spectrum files should be in the same directory as the peptide id file but should not be given on the command line. See the table above for recognized formats. Multiple files may be listed.
  • <library name> – The name of the library being created. An existing library may be overwritten or added to.
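
As an illustration of the argument order in the usage line, here is a minimal sketch that assembles a command for two result files. The file names, the library name, and the 0.95 cutoff are placeholders chosen for the example, not recommended values.

```shell
# Hypothetical inputs: two pepXML result files from the same search.
# Their spectrum files are expected alongside them and are not listed.
id_files="run1.pep.xml run2.pep.xml"
library="demo.blib"      # hypothetical library name

# Options first, then the ID files, then the library name.
cmd="BlibBuild -o -c 0.95 $id_files $library"

# Printed rather than executed, since the input files are placeholders.
echo "$cmd"
```

With real result files in place, the printed line can be run as is.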

Output

A spectrum library in SQLite3 format.

Options

  • -o Overwrite existing library. Default append.
  • -S <filename> Read from file as though it were stdin.
  • -s Read result file names from stdin. Example: ls *sqt | BlibBuild -s new.blib
  • -u Ignore peptides except those with the unmodified sequences from stdin.
  • -U Ignore peptides except those with the modified sequences from stdin.
  • -H Use more than one decimal place when describing mass modifications.
  • -C <file size> Minimum file size required to use caching for .dat files. Specify units as B, K, M, or G. Default 800M.
  • -c <cutoff> Score threshold (0-1) for PSMs to be included in library. Higher threshold is more exclusive.
  • -v <level> Level of output to stderr (silent, error, status, warn). Default status.
  • -L Write status and warning messages to log file.
  • -m <size> SQLite memory cache size in megabytes. Default 250M.
  • -l <level> ZLib compression level (0-?). Default 3.
  • -i <library_id> LSID library ID. Default uses file name.
  • -a <authority> LSID authority. Default proteome.gs.washington.edu.
  • -x <filename> Specify the path of XML modifications file for parsing MaxQuant files.
  • -p <filename> Specify the path of XML parameters file for parsing MaxQuant files.
  • -P <float> Specify pusher interval for Waters final_fragment.csv files.
  • -d [<filename>] Document the .blib format by writing SQLite commands to a file, or stdout if no filename is given.
  • -E Prefer reading peaks from embedded spectra (currently only affects MaxQuant msms.txt).
  • -A Output messages noting ambiguously matched spectra (spectra matched to multiple peptides).
  • -K Keep ambiguously matched spectra.
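
The -s option can also take its list of result files from a prepared text file via shell redirection, which is handy when the list is too long or too selective for a glob. The sketch below builds such a list from stand-in files; all file and directory names are hypothetical, and the final command is printed rather than run because the stand-in files are empty.

```shell
# Stand-in result files so the sketch runs anywhere (hypothetical names).
workdir=$(mktemp -d)
touch "$workdir/a.sqt" "$workdir/b.sqt" "$workdir/notes.txt"

# One result file name per line; non-.sqt files are left out.
list="$workdir/id_files.txt"
find "$workdir" -name '*.sqt' | sort > "$list"
n_files=$(wc -l < "$list" | tr -d ' ')
echo "Found $n_files result files"

# The command that would read those names from stdin.
echo "BlibBuild -s $workdir/new.blib < $list"
```

The -S option is described above as reading a file as though it were stdin, so passing the list with -S may be an equivalent spelling; that combination is an assumption and has not been checked here.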