Description

Creates a library of spectra with known peptide identifications. Typically, these identifications come from a database search such as SEQUEST or Mascot, sometimes followed by an evaluation step such as Percolator or Peptide Prophet. BlibBuild accepts files from a variety of database search programs. File formats are identified by file extension; the recognized extensions are listed in the table below. In many cases, the peptide identifications (peptide sequence, charge state, and optional score) are in a separate file from the spectrum information. Unless noted, both files are assumed to be in the same directory.

| Database search | Peptide ID file extension | Spectrum file extension | Notes |
|---|---|---|---|
| Comet / SEQUEST / Percolator | .perc.xml (.sqt) | .cms2, .ms2 | Percolator v1.17 does not include sequence modification information; therefore the .sqt file from the SEQUEST search must be present in the same directory, the directory containing the cms2/ms2 spectrum files, or the current working directory. The scores used are the q-values. |
| Peptide Prophet | .pep.xml, .pep.XML, .pepXML | .mzXML, .mzML | The names of the .mzXML files are given in the .pep.xml file and may be in the parent or grandparent directory. The scores used are the probability scores. |
| Spectrum Mill | .pep.xml, .pep.XML, .pepXML | .mzXML, .mzML | The names of the .mzXML files are given in the .pep.xml file and may be in the parent or grandparent directory. The scores used are the expectation scores. |
| OMSSA | .pep.xml, .pep.XML, .pepXML | .mzXML, .mzML | The names of the .mzXML files are given in the .pep.xml file and may be in the parent or grandparent directory. The scores used are the expectation scores. |
| PEAKS DB | .pep.xml, .pep.XML, .pepXML | .mzXML, .mzML | The names of the .mzXML files are given in the .pep.xml file and may be in the parent or grandparent directory. The scores used are the confidence scores. |
| Morpheus | .pep.xml, .pep.XML, .pepXML | .mzXML, .mzML | The names of the .mzXML files are given in the .pep.xml file and may be in the parent or grandparent directory. Spectra are looked up by index, which is calculated using (scan number - 1). The scores used are the q-values. |
| X! Tandem | .xtan.xml | | No separate spectrum file. The scores used are the expectation scores. |
| Mascot | .dat | | No separate spectrum file. The scores used are the expectation scores (homology threshold). |
| Protein Pilot | .group.xml | | No separate spectrum file. The scores used are the confidence scores. |
| ID Picker (Myrimatch) | .idpXML | .mzXML, .mzML | The name(s) of the spectrum file(s) are given in the .idpXML file. The scores used are the FDRs. |
| PRIDE | .pride.xml | | No separate spectrum file. |
| MaxQuant | msms.txt | | No separate spectrum file. There need not be a . before 'msms'. mqpar.xml must be located in the grandparent, parent, or same directory. A custom modifications.xml or modification.xml can be placed in the same directory as the search results (or specified using the -x option). The scores used are the PEPs. |
| Proteome Discoverer | .msf | | No separate spectrum file. Libraries cannot be built from databases that do not contain q-values, unless a cutoff score of 0 is explicitly specified. The scores used are the Percolator q-values. |
| Scaffold | .mzid | .MGF, .mzXML, .mzML | The scores used are the Peptide Probability scores. |
| ByOnic | .mzid | .MGF, .mzXML, .mzML | The scores used are the Peptide AbsLogProb scores. |
| MSGF+ | .mzid, .pepXML | .MGF, .mzXML, .mzML | The scores used are the q-values. |
| MSe | final_fragment.csv | | There need not be a . before 'final_fragment'. The scores used are the scores in the 'score' column. |
| generic | .ssl | | This generic format is provided for peptide identifications made by other means. See the file formats page for a description. The scores used are the scores in the 'score' column. |
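
Because formats are recognized by file extension, it can help to check a file name against the table before running BlibBuild. The following Python sketch is only an illustration: the mapping is transcribed from the table above, the function name candidate_searches is hypothetical, and the case-sensitive suffix match is an assumption rather than a description of how BlibBuild itself decides. Several programs share the pepXML and mzid extensions, so one suffix can map to more than one candidate.

```python
# Programs in the table that list .pep.xml, .pep.XML, and .pepXML.
PEPXML_PROGRAMS = ["Peptide Prophet", "Spectrum Mill", "OMSSA", "PEAKS DB", "Morpheus"]

# Suffix -> search program(s), transcribed from the table above.
# 'msms.txt' and 'final_fragment.csv' have no leading dot because the
# table notes that a '.' before them is not required.
SUFFIXES = {
    ".perc.xml": ["Comet / SEQUEST / Percolator"],
    ".sqt": ["Comet / SEQUEST / Percolator"],
    ".pep.xml": PEPXML_PROGRAMS,
    ".pep.XML": PEPXML_PROGRAMS,
    ".pepXML": PEPXML_PROGRAMS + ["MSGF+"],
    ".xtan.xml": ["X! Tandem"],
    ".dat": ["Mascot"],
    ".group.xml": ["Protein Pilot"],
    ".idpXML": ["ID Picker (Myrimatch)"],
    ".pride.xml": ["PRIDE"],
    "msms.txt": ["MaxQuant"],
    ".msf": ["Proteome Discoverer"],
    ".mzid": ["Scaffold", "ByOnic", "MSGF+"],
    "final_fragment.csv": ["MSe"],
    ".ssl": ["generic"],
}


def candidate_searches(filename):
    """Return the search programs whose listed suffix matches filename.

    Matching is case-sensitive (an assumption of this sketch); an empty
    list means the name matches no suffix in the table.
    """
    for suffix, programs in SUFFIXES.items():
        if filename.endswith(suffix):
            return list(programs)
    return []
```

For example, candidate_searches("run1.mzid") returns all three programs that write .mzid files, which shows that the extension alone does not always identify the search program.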

Usage

BlibBuild [options] <peptide id file>[+] <library name>

Input

  • <peptide id file> – A file containing peptide spectrum matches to be included in the library. The associated spectrum files should be in the same directory as the peptide id file but should not be given on the command line. See the above table for recognized formats. Multiple files may be listed together.
  • <library name> – The name of the library being created. An existing library may be overwritten or added to.

Output

A spectrum library in SQLite3 format.
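
Since the output is an SQLite file, it can be opened with any SQLite client. The sketch below uses Python's standard sqlite3 module to list the tables in a library without assuming anything about the library's schema; the function name list_tables is hypothetical.

```python
import sqlite3
from pathlib import Path


def list_tables(library_path):
    """Return the sorted names of the tables in an SQLite file."""
    # Open read-only through a file: URI so that a mistyped path raises
    # an error instead of silently creating a new, empty database.
    uri = Path(library_path).resolve().as_uri() + "?mode=ro"
    conn = sqlite3.connect(uri, uri=True)
    try:
        rows = conn.execute(
            "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
        ).fetchall()
    finally:
        conn.close()
    return [name for (name,) in rows]
```

Calling list_tables("new.blib") on a library built by BlibBuild would show which tables it contains; the table names themselves are not documented on this page.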

Options

  • -o   Overwrite an existing library. Default is to append.
  • -s   Read result file names from stdin (e.g. ls *sqt | BlibBuild -s new.blib).
  • -C  <file size> Minimum file size required to use caching for .dat files. Specify units as B, K, M, or G. Default 800M.
  • -v   <level> Level of output to stderr (silent, error, status, warn). Default status.
  • -L   Write status and warning messages to log file.
  • -c   <cutoff score> Specify the cutoff score (0-1) below which peptide-spectrum matches will be excluded from the library.
  • -m   <size> SQLite memory cache size in megabytes. Default 250M.
  • -l   <level> ZLib compression level (0-?). Default 3.
  • -i   <library_id> LSID library ID. Default uses file name.
  • -a   <authority> LSID authority. Default proteome.gs.washington.edu.
  • -x   <filename> Specify the path of XML modifications file for parsing MaxQuant files.
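
When BlibBuild is driven from a script, the command line can be assembled following the usage pattern above. The Python sketch below only builds the argument list and does not run anything. The function name blibbuild_args and its keyword parameters are hypothetical; the flags (-o, -c, -v, -L) and the verbosity levels are taken from the options list above, and only that subset is covered here.

```python
VERBOSITY_LEVELS = ("silent", "error", "status", "warn")


def blibbuild_args(id_files, library, overwrite=False, cutoff=None,
                   verbosity=None, log=False):
    """Build an argument list: BlibBuild [options] <peptide id file>[+] <library name>."""
    if not id_files:
        raise ValueError("at least one peptide id file is required")
    args = ["BlibBuild"]
    if overwrite:
        args.append("-o")  # overwrite instead of the default append
    if cutoff is not None:
        if not 0 <= cutoff <= 1:
            raise ValueError("cutoff score must be between 0 and 1")
        args += ["-c", str(cutoff)]
    if verbosity is not None:
        if verbosity not in VERBOSITY_LEVELS:
            raise ValueError("unknown verbosity level: %s" % verbosity)
        args += ["-v", verbosity]
    if log:
        args.append("-L")  # write status and warning messages to a log file
    # Peptide id files come first and the library name comes last.
    args += list(id_files)
    args.append(library)
    return args
```

Assuming BlibBuild is installed and on the PATH, the returned list could be passed to subprocess.run(args, check=True). Spectrum files are deliberately left out, since they should not be given on the command line.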

