Table of Contents

guest
2018-12-19
BiblioSpec Spectral Library Tools
   BlibBuild
   BlibFilter
   BlibSearch
   BlibToMS2
   LibToSqlite3

BiblioSpec Spectral Library Tools


BiblioSpec is a suite of software tools for creating and searching MS/MS peptide spectrum libraries.

New in version 2.0

BiblioSpec 2.0 stores spectrum libraries as sqlite3 files. Sqlite3 is a lightweight, open-source database format that can be read and manipulated with any sqlite3 tool in addition to BiblioSpec. For more information about the library format, see the file formats page. The new format is a departure from version 1.0, which used a unique binary format, so tools and libraries from the two versions are not compatible. There is, however, a conversion tool for turning a version 1.0 library into a sqlite3 library.
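Because a 2.0 library is an ordinary sqlite3 file, generic tooling can inspect it. The following is a minimal sketch using Python's standard sqlite3 module. It only queries SQLite's built-in schema catalog, so it assumes nothing about the BiblioSpec table layout; the function name list_tables is hypothetical.

```python
import sqlite3


def list_tables(library_path):
    """Return the names of all tables in a sqlite3 file, sorted by name."""
    # sqlite_master is SQLite's own schema catalog, so no BiblioSpec-specific
    # table or column names are assumed here.
    conn = sqlite3.connect(library_path)
    try:
        rows = conn.execute(
            "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
        ).fetchall()
    finally:
        conn.close()
    return [name for (name,) in rows]
```

Calling list_tables on a library file returns whatever tables that file actually contains; see the file formats page for their meaning.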

BiblioSpec components

The BiblioSpec package contains the following programs:

  • BlibBuild creates a library of peptide MS/MS spectra from a variety of different database search results.
  • BlibFilter removes redundant spectra from a library.
  • BlibSearch searches a spectrum library for matches to query spectra, printing the results to a report file.
  • BlibToMS2 writes a library in a text MS2 file format.
  • LibToSqlite3 converts a BiblioSpec 1.0 library to a 2.0 library in sqlite3 format.

Download

BiblioSpec is freely available under the BSD license. See the Download and build page for details.

Several reference libraries will be available soon for download.

More information

The file formats page gives an overview of all file formats, including a list of all the database search files that can be used to build libraries.




BlibBuild


Description

Creates a library of spectra with known peptide identifications. Typically, these identifications are made with a database search program such as SEQUEST or Mascot, sometimes followed by an evaluation step such as Percolator or Peptide Prophet. BlibBuild accepts files from a variety of database search programs, as well as some other spectral library formats. File formats are identified by file extension; the recognized extensions are given in the table below. In many cases, the peptide identification (peptide sequence, charge state and optional score) is in a separate file from the spectrum information. Unless noted, it is assumed that both files are in the same directory.

| Database search | Peptide ID file extension | Spectrum file extension | Notes |
|---|---|---|---|
| Comet / SEQUEST / Percolator | .perc.xml (.sqt) | .cms2, .ms2 | Percolator v1.17 does not include sequence modification information, so the .sqt file from the SEQUEST search must be present in the same directory, the directory containing the cms2/ms2 spectrum files, or the current working directory. The scores used are the q-values. |
| Peptide Prophet | .pep.xml, .pep.XML, .pepXML | .mzXML, .mzML | The names of the .mzXML files are given in the .pep.xml file and may be in the parent or grandparent directory. The scores used are the probability scores. |
| Spectrum Mill | .pep.xml, .pep.XML, .pepXML | .mzXML, .mzML | The names of the .mzXML files are given in the .pep.xml file and may be in the parent or grandparent directory. The scores used are the expectation scores. |
| OMSSA | .pep.xml, .pep.XML, .pepXML | .mzXML, .mzML | The names of the .mzXML files are given in the .pep.xml file and may be in the parent or grandparent directory. The scores used are the expectation scores. |
| PEAKS DB | .pep.xml, .pep.XML, .pepXML | .mzXML, .mzML | The names of the .mzXML files are given in the .pep.xml file and may be in the parent or grandparent directory. The scores used are the confidence scores. |
| Morpheus | .pep.xml, .pep.XML, .pepXML | .mzXML, .mzML | The names of the .mzXML files are given in the .pep.xml file and may be in the parent or grandparent directory. Spectra are looked up by index, which is calculated using (scan number - 1). The scores used are the q-values. |
| X! Tandem | .xtan.xml | - | No separate spectrum file. The scores used are the expectation scores. |
| Mascot | .dat | - | No separate spectrum file. The scores used are the expectation scores (homology threshold). |
| Protein Pilot | .group.xml | - | No separate spectrum file. The scores used are the confidence scores. |
| ID Picker (Myrimatch) | .idpXML | .mzXML, .mzML | The name(s) of the spectrum file(s) are given in the .idpXML file. The scores used are the FDRs. |
| PRIDE | .pride.xml | - | No separate spectrum file. |
| MaxQuant | msms.txt | - | No separate spectrum file. There need not be a . before 'msms'. mqpar.xml must be located in the grandparent, parent, or same directory. A custom modifications.xml or modification.xml can be placed in the same directory as the search results (or specified using the -x option). The scores used are the PEPs. |
| Proteome Discoverer | .msf | - | No separate spectrum file. Libraries cannot be built from databases that do not contain q-values, unless a cutoff score of 0 is explicitly specified. The scores used are the Percolator q-values. |
| Scaffold | .mzid | .MGF, .mzXML, .mzML | The scores used are the Peptide Probability scores. |
| ByOnic | .mzid | .MGF, .mzXML, .mzML | The scores used are the Peptide AbsLogProb scores. |
| MSGF+ | .mzid, .pepXML | .MGF, .mzXML, .mzML | The scores used are the q-values. |
| MSe | final_fragment.csv | - | There need not be a . before 'final_fragment'. The scores used are the scores in the 'score' column. |
| generic | .ssl | - | This generic format is provided for peptide identifications made by other means. See the file formats page for a description. The scores used are the scores in the 'score' column. |
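The extension-based identification described above can be sketched as a simple suffix lookup. This is an illustration only, not BlibBuild's internal logic: the suffixes and search names are copied from the table, while the dictionary, the case-sensitive matching, and the function name candidate_searches are assumptions. Several searches share the same suffix, so the lookup returns a list of candidates rather than a single answer.

```python
# Searches that all write pepXML-style peptide ID files (per the table above).
PEPXML_SEARCHES = ["Peptide Prophet", "Spectrum Mill", "OMSSA", "PEAKS DB", "Morpheus"]

# Peptide ID file suffix -> searches listed for that suffix in the table.
SUFFIX_TO_SEARCHES = {
    ".perc.xml": ["Comet / SEQUEST / Percolator"],
    ".pep.xml": PEPXML_SEARCHES,
    ".pep.XML": PEPXML_SEARCHES,
    ".pepXML": PEPXML_SEARCHES + ["MSGF+"],
    ".xtan.xml": ["X! Tandem"],
    ".dat": ["Mascot"],
    ".group.xml": ["Protein Pilot"],
    ".idpXML": ["ID Picker (Myrimatch)"],
    ".pride.xml": ["PRIDE"],
    "msms.txt": ["MaxQuant"],            # no leading "." required
    ".msf": ["Proteome Discoverer"],
    ".mzid": ["Scaffold", "ByOnic", "MSGF+"],
    "final_fragment.csv": ["MSe"],       # no leading "." required
    ".ssl": ["generic"],
}


def candidate_searches(filename):
    """Return the searches whose peptide ID suffix matches the file name."""
    # Assumption: matching is case-sensitive and purely suffix-based.
    for suffix, searches in SUFFIX_TO_SEARCHES.items():
        if filename.endswith(suffix):
            return list(searches)
    return []
```

For example, candidate_searches("run1.mzid") lists Scaffold, ByOnic and MSGF+, which shows why the extension alone does not always pin down the search program.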

Usage

BlibBuild [options] <peptide id file>[+] <library name>

Input

  • <peptide id file> – A file containing peptide spectrum matches to be included in the library. The associated spectrum files should be in the same directory as the peptide id file but should not be given on the command line. See the above table for recognized formats. Multiple files may be listed together.
  • <library name> – The name of the library being created. An existing library may be overwritten or added to.

Output

A spectrum library in sqlite3 format.

Options

  • -o   Overwrite existing library. Default append.
  • -s   Result file names from stdin. (e.g. ls *sqt | BlibBuild -s new.blib)
  • -C  <file size> Minimum file size required to use caching for .dat files. Specify units as B, K, M, or G. Default 800M.
  • -v   <level> Level of output to stderr (silent, error, status, warn). Default status.
  • -L   Write status and warning messages to log file.
  • -c   <cutoff score> Specify the cutoff score (0-1) below which peptide-spectrum matches will be excluded from the library.
  • -m   <size> SQLite memory cache size in Megs. Default 250M.
  • -l   <level> ZLib compression level (0-?). Default 3.
  • -i   <library_id> LSID library ID. Default uses file name.
  • -a   <authority> LSID authority. Default proteome.gs.washington.edu.
  • -x   <filename> Specify the path of XML modifications file for parsing MaxQuant files.
  • -d  [<filename>] Generate a set of SQLite3 commands that create an empty .blib file with fully annotated tables.
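To drive BlibBuild from a script, the usage line and options above can be assembled into an argument list. The sketch below only builds the list and does not run anything; the helper name blibbuild_command and its keyword arguments are hypothetical, and only the -o and -c options documented above are covered.

```python
def blibbuild_command(id_files, library, overwrite=False, cutoff=None):
    """Build an argument list following: BlibBuild [options] <peptide id file>[+] <library name>."""
    argv = ["BlibBuild"]
    if overwrite:
        argv.append("-o")                # overwrite instead of the default append
    if cutoff is not None:
        argv += ["-c", str(cutoff)]      # cutoff score, documented range 0-1
    argv += list(id_files)               # one or more peptide ID files
    argv.append(library)                 # the library name comes last
    return argv
```

Assuming BlibBuild is on the PATH, the returned list could be passed to subprocess.run(argv, check=True). Spectrum files are not added to the list because, as noted above, they are located relative to the peptide ID files.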



BlibFilter


Description

Create a library from an existing one such that the new library has only one spectrum for each peptide ion. The representative spectrum is chosen by taking the dot product of all pairs of spectra for a peptide and selecting the one with the highest average score.
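The selection rule can be sketched as follows. This is not BlibFilter's actual code: it assumes each spectrum has already been reduced to an equal-length intensity vector (the binning is not shown), and it uses a normalized dot product, which is an assumption since the text above only says "dot product". The function names are hypothetical.

```python
import math


def normalized_dot(a, b):
    """Dot product of two intensity vectors, scaled by their lengths."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def pick_representative(spectra):
    """Return the index of the spectrum with the highest average score to all others."""
    if not spectra:
        raise ValueError("need at least one spectrum")
    if len(spectra) == 1:
        return 0  # nothing to compare against
    best_index, best_score = 0, -1.0
    for i, a in enumerate(spectra):
        # Score spectrum i against every other spectrum for the same peptide ion.
        scores = [normalized_dot(a, b) for j, b in enumerate(spectra) if j != i]
        average = sum(scores) / len(scores)
        if average > best_score:
            best_index, best_score = i, average
    return best_index
```

With the three toy vectors [1, 0, 0], [1, 1, 0] and [0, 1, 0], the middle one overlaps with both of the others and is therefore chosen.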

Usage

BlibFilter [options] <redundant-library> <filtered-library>

Input

  • <redundant-library> – A library file with multiple spectra for all or some peptide ions.
  • <filtered-library> – The name to be given to the resulting library.

Output

A library of spectra for the same peptides as the initial library, but with only one spectrum per peptide ion.

Options

  • -m [ --memory-cache ] <size> – SQLite memory cache size in Megs. Default 250M.
  • -n [ --min-peaks ] <num> – Only include spectra with at least this many peaks. Default 20.
  • -s [ --min-score ] <score> – Best spectrum must have at least this average score to be included. Default 0.
  • -p [ --parameter-file ] <file> – File containing search parameters. Command line values override file values.
  • -v [ --verbosity ] <level> – Control the level of output to stderr. (silent, error, status, warn, debug, detail, all) Default status.
  • -h [ --help ] – Print help message.



BlibSearch


Description

Search a spectrum library for matches to query spectra.

Usage

BlibSearch [options] <spectrum filename> <library filename>[+]

Input

  • <spectrum filename> – A file containing spectra to search. File formats accepted are .ms2, .cms2, .mzXML, .mzML, .MGF, and .wiff (Windows only).
  • <library filename> – The library to be searched for matches to the query. Libraries may be filtered (the output of BlibFilter) or redundant (the output of BlibBuild). More than one library can be listed on the command line.

Output

Results are printed to a report file (tab-delimited text). The file may be named with the --report-file option; by default it is named after the spectrum file with the extension replaced with .report. A separate report file is written for any decoy spectra searched. An optional sqlite .psm file may also be produced.
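The default naming rule and the tab-delimited layout can be sketched in a few lines of Python. Both helper names are hypothetical, and no column names are assumed because the report columns are not described here; rows are returned as plain lists of strings.

```python
import csv
from pathlib import Path


def default_report_name(spectrum_file):
    """Replace the spectrum file's extension with .report."""
    return str(Path(spectrum_file).with_suffix(".report"))


def read_report(report_path):
    """Read a tab-delimited report file into a list of rows."""
    # Any header or comment lines are returned as ordinary rows; interpreting
    # them is left to the caller.
    with open(report_path, newline="") as handle:
        return list(csv.reader(handle, delimiter="\t"))
```

For a query file named run1.ms2, default_report_name gives run1.report, matching the rule described above.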

Options

  • -c [ --clear-precursor ] <true|false> – Remove the peaks in an X m/z window around the precursor from the query and library spectra. Default true.
  • --topPeaksForSearch <num> – Use this many of the highest intensity peaks. Default 100.
  • -w [ --mz-window ] <size> – Compare query to library spectra with precursor m/z +/- size. Default 3.
  • -L [ --low-charge ] <charge> – Search only spectra with charge no less than this. Default 1.
  • -H [ --high-charge ] <charge> – Search only spectra with charge no higher than this. Default 5.
  • -m [ --report-matches ] <num> – Return this number of the best matches for each query. Use -1 to report all. Default 5.
  • --psm-result-file <name> – Return results in a .psm file of the given name. Default no .psm file.
  • -R [ --report-file ] <name> – Return results in a report file of the given name. Default is the spectrum file name with the extension replaced with .report.
  • --preserve-order – Search spectra in the order they appear in the file. Default to search as sorted by precursor m/z.
  • -p [ --parameter-file ] <name> – File containing search parameters. Command line values override file values.
  • -v [ --verbosity ] <level> – Control the level of output to stderr. (silent, error, status, warn, debug, detail, all) Default status.
  • -h [ --help ] – Print help message.
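The effect of the --mz-window, --low-charge and --high-charge options can be illustrated with a small candidate filter. This is a sketch under stated assumptions, not BlibSearch's implementation: the tuple layout is hypothetical, the window bounds are treated as inclusive, and the charge limits are applied to the library entries, which the option descriptions above do not spell out.

```python
def candidate_matches(query_mz, library, mz_window=3.0, low_charge=1, high_charge=5):
    """Return library entries eligible for comparison with one query spectrum.

    library is an iterable of (precursor_mz, charge, label) tuples; this
    layout is made up for the example.
    """
    return [
        entry
        for entry in library
        # Keep entries whose precursor m/z lies within +/- mz_window of the query
        # and whose charge falls between the low and high limits.
        if abs(entry[0] - query_mz) <= mz_window
        and low_charge <= entry[1] <= high_charge
    ]
```

The keyword defaults mirror the documented defaults (window 3, charges 1 to 5). Only the entries that pass this filter would then be scored against the query.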



BlibToMS2


Description

Write an MS2 file that contains all spectra in a library.

Usage

BlibToMS2 [options] <library>

Input

  • <library> – a spectrum library file, filtered or redundant.

Output

The spectra are printed to a file named <library>.ms2 in the MS2 format. The scan number is replaced with the library ID number. Two 'D' lines contain the peptide sequence with and without modifications.

Options

  • -f [ --file-name ] <ms2 file> – Use this name for the output MS2 file rather than the default name, <library>.ms2.
  • -m [ --mz-precision ] <num> – Write the peak m/z values with this many digits of precision. Default 2.
  • -i [ --intensity-precision ] <num> – Write the peak intensity values with this many digits of precision. Default 1.
  • -p [ --parameter-file ] <file> – Specify parameters in a separate file. Command line values override the file.
  • -v [ --verbose ] <silent|error|status|warn> – Set the verbosity level of the output to stderr. The default level is status.
  • -h [ --help ] – Print the help message.
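The two precision options can be illustrated with a one-line formatter. This sketch assumes that "digits of precision" means digits after the decimal point and that a peak line holds the m/z and intensity separated by a single space; neither detail is confirmed above, and the function name format_peak is hypothetical.

```python
def format_peak(mz, intensity, mz_precision=2, intensity_precision=1):
    """Format one peak as '<m/z> <intensity>' with fixed decimal places."""
    # The keyword defaults mirror the documented defaults of 2 and 1.
    return f"{mz:.{mz_precision}f} {intensity:.{intensity_precision}f}"
```

With the defaults, a peak at m/z 500.12345 with intensity 1234.56 is written as "500.12 1234.6".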



LibToSqlite3


Description

Converts a BiblioSpec 1.0 library to a 2.0 library in sqlite3 format.

Usage

LibToSqlite3 <old version lib> <new lib name>

Input

  • <old version lib> – A BiblioSpec 1.0 library file.
  • <new lib name> – The name to be given to the converted library.

Output

A spectrum library in sqlite3 format.