Table of Contents

2023-03-24
BiblioSpec Spectral Library Tools
   BlibBuild
   BlibFilter
   BlibSearch
   BlibToMS2
   LibToSqlite3
   BiblioSpec Supported Formats

BiblioSpec Spectral Library Tools


BiblioSpec is a suite of software tools for creating and searching MS/MS peptide spectrum libraries.

New in version 2.0

BiblioSpec 2.0 stores spectrum libraries as sqlite3 files. Sqlite3 is a light-weight, open-source database format which can be read and manipulated with any sqlite3 tools in addition to BiblioSpec. For more information about the library format, see the file formats page. The new format is a departure from version 1.0 which uses a unique binary format. This means that tools and libraries from the two versions are not compatible. There is, however, a conversion tool for turning a version 1.0 library into a sqlite3 library.
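Because a version 2.0 library is an ordinary sqlite3 file, it can be inspected with any sqlite3 client. The following is a minimal sketch using Python's standard sqlite3 module. The function name list_tables is illustrative only, and no particular table names are assumed; the sketch just reports whatever tables the file contains.

```python
import sqlite3


def list_tables(library_path):
    """Return the names of the tables stored in a sqlite3 file, sorted by name."""
    connection = sqlite3.connect(library_path)
    try:
        rows = connection.execute(
            "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
        ).fetchall()
    finally:
        # Always release the file handle, even if the query fails.
        connection.close()
    return [name for (name,) in rows]
```

Calling list_tables on a library file returns a list of table names that can then be explored further with ordinary SQL queries.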

BiblioSpec components

The BiblioSpec package contains the following programs:

  • BlibBuild creates a library of peptide MS/MS spectra from a variety of different database search results.
  • BlibFilter removes redundant spectra from a library.
  • BlibSearch searches a spectrum library for matches to query spectra, printing the results to a report file.
  • BlibToMS2 writes a library in a text MS2 file format.
  • LibToSqlite3 converts a BiblioSpec 1.0 library to a 2.0 library in sqlite3 format.

Download

BiblioSpec is freely available under the BSD license. Click here to go to the Download and build page.

Several reference libraries will be available soon for download.

More information

An overview of all file formats including a list of all the database search files that can be used to build libraries.




BlibBuild


Description

Creates a library of spectra with known peptide and/or small molecule identifications. Typically, these identifications are made by a database search program such as SEQUEST or Mascot, sometimes followed by an evaluation step such as Percolator or PeptideProphet. BlibBuild accepts files from a variety of database search programs, as well as some other spectral library formats. File formats are identified by their file extensions, which are listed in the table below. In many cases, the peptide identification (peptide sequence, charge state and optional score) is in a separate file from the spectrum information. Unless noted, it is assumed that both files are in the same directory.

| Database search | Peptide ID file extension | Spectrum file extension | Score used | Notes |
|---|---|---|---|---|
| Generic SSL | .ssl | | score column | A generic format for encoding spectrum library entries. |
| ByOnic | .mzid | .MGF, .mzXML, .mzML | AbsLogProb | |
| Comet/SEQUEST/Percolator | .perc.xml, .sqt | .cms2, .ms2, .mzXML | q-value | Percolator v1.17 does not include sequence modification information therefore the .sqt file from the SEQUEST search must be present in the same directory, the directory containing the cms2/ms2 spectrum files, or the current working directory. |
| DIA-NN | .speclib | | none | No separate spectrum file. In the current implementation, no score is imported from the library, so all spectra are imported. |
| IDPicker | .idpXML | .mzXML, .mzML | FDR | The name(s) of the spectrum file(s) are given in the .idpXML file. |
| MS Amanda | .pep.xml, .pepXML | .mzML, .mzXML, .MGF, RAW* | q-value | |
| MSFragger | .pep.xml, .pepXML | .mzML, .mzXML, .MGF, RAW* | q-value | |
| MSGF+ | .mzid, .pepXML | .mzML, .mzXML, .MGF, RAW* | expectation value | |
| Mascot | .dat | | expectation value | No separate spectrum file. |
| MaxQuant Andromeda | msms.txt + evidence.txt + mqpar.xml + modifications.xml | .mzML, .mzXML, .MGF, RAW* | PEP | It is possible to use peaks embedded in the msms.txt, but external spectra files are preferred because the embedded peaks are charge deconvoluted. mqpar.xml must be located in the grandparent, parent, or same directory. A custom modifications.xml, modifications.local.xml, or modification.xml can be placed in the same directory as the search results (or specified using the -x option). |
| Morpheus | .pep.xml, .pepXML | .mzXML, .mzML | q-value | The names of the .mzXML files are given in the .pep.xml file and may be in the parent or grandparent directory. Spectra are looked up by index, which is calculated using (scan number - 1). |
| OMSSA | .pep.xml, .pepXML | .mzXML, .mzML | expectation value | The names of the .mzXML files are given in the .pep.xml file and may be in the parent or grandparent directory. |
| OpenSWATH | .tsv | | m_score column | No separate spectrum file. |
| PEAKS DB | .pep.xml, .pepXML | .mzXML, .mzML | confidence score | The names of the .mzXML files are given in the .pep.xml file and may be in the parent or grandparent directory. |
| PLGS MSe | final_fragment.csv | | score column | There need not be a . before 'final_fragment'. |
| PRIDE | .pride.xml | | various | No separate spectrum file. |
| PeptideProphet/iProphet | .pep.xml, .pepXML | .mzML, .mzXML, .MGF, RAW* | probability score | The names of the .mzXML files are given in the .pep.xml file and may be in the parent or grandparent directory. |
| PeptideShaker | .mzid | .MGF | confidence score | |
| Protein Pilot | .group.xml | | confidence score | No separate spectrum file. |
| Protein Prospector | .pep.xml, .pepXML | .mzML, .mzXML, .MGF, RAW* | expectation value | |
| Proteome Discoverer | .msf, .pdResult | | q-value | No separate spectrum file. Libraries cannot be built from databases that do not contain q-values, unless a cutoff score of 0 is explicitly specified. |
| Proxl XML | .proxl.xml | .mzML, .mzXML, .MGF, RAW* | q-value | |
| Scaffold | .mzid | .MGF, .mzXML, .mzML | peptide probability | |
| Spectronaut | .csv | | none | Spectronaut Assay Library export. No separate spectrum file. |
| Spectrum Mill | .pep.xml, .pepXML | .mzXML, .mzML | expectation value | The names of the .mzXML files are given in the .pep.xml file and may be in the parent or grandparent directory. |
| X! Tandem | .xtan.xml | | expectation value | No separate spectrum file. |

*RAW includes vendor formats like RAW, WIFF, .D, etc.
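Since formats are identified by file extension, a script that prepares BlibBuild input can check in advance which search tools a given peptide ID file might belong to. The sketch below copies a subset of the peptide ID extensions from the table into a lookup. The names SUFFIX_TO_TOOLS and tools_for_file are hypothetical, and the case-insensitive, longest-suffix matching is an assumption made for illustration, not a description of how BlibBuild itself dispatches files.

```python
# Peptide ID file suffixes taken from the table above (subset only).
# Several tools share the .mzid suffix, so each suffix maps to a list.
SUFFIX_TO_TOOLS = {
    ".ssl": ["Generic SSL"],
    ".perc.xml": ["Comet/SEQUEST/Percolator"],
    ".sqt": ["Comet/SEQUEST/Percolator"],
    ".speclib": ["DIA-NN"],
    ".dat": ["Mascot"],
    ".mzid": ["ByOnic", "MSGF+", "PeptideShaker", "Scaffold"],
    ".pride.xml": ["PRIDE"],
    ".group.xml": ["Protein Pilot"],
    ".proxl.xml": ["Proxl XML"],
    ".xtan.xml": ["X! Tandem"],
}


def tools_for_file(filename):
    """Return the search tools whose peptide ID suffix matches the file name."""
    name = filename.lower()  # assumption: treat extensions case-insensitively
    matches = [suffix for suffix in SUFFIX_TO_TOOLS if name.endswith(suffix)]
    if not matches:
        return []
    # Prefer the most specific (longest) matching suffix.
    return SUFFIX_TO_TOOLS[max(matches, key=len)]
```

An empty result means the suffix is not in this partial lookup; it does not mean that BlibBuild cannot read the file.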



Usage

BlibBuild [options] <peptide id file>[+] <library name>

Input

  • <peptide id file> – A file containing peptide spectrum matches to be included in the library. The associated spectrum files should be in the same directory as the peptide id file but should not be given on the command line. See the above table for recognized formats. Multiple files may be listed together.
  • <library name> – The name of the library being created. An existing library may be overwritten or added to.

Output

A spectrum library in sqlite3 format.

Options

  • -o Overwrite existing library. Default append.
  • -S <filename> Read from file as though it were stdin.
  • -s Read result file names from stdin, e.g. ls *sqt | BlibBuild -s new.blib.
  • -u Ignore peptides except those with the unmodified sequences from stdin.
  • -U Ignore peptides except those with the modified sequences from stdin.
  • -H Use more than one decimal place when describing mass modifications.
  • -C <file size> Minimum file size required to use caching for .dat files. Specify units as B, K, M or G. Default 800M.
  • -c <cutoff> Score threshold (0-1) for PSMs to be included in library. Higher threshold is more exclusive.
  • -v <level> Level of output to stderr (silent, error, status, warn). Default status.
  • -L Write status and warning messages to log file.
  • -m <size> SQLite memory cache size in Megs. Default 250M.
  • -l <level> ZLib compression level (0-?). Default 3.
  • -i <library_id> LSID library ID. Default uses file name.
  • -a <authority> LSID authority. Default proteome.gs.washington.edu.
  • -x <filename> Specify the path of XML modifications file for parsing MaxQuant files.
  • -p <filename> Specify the path of XML parameters file for parsing MaxQuant files.
  • -P <float> Specify pusher interval for Waters final_fragment.csv files.
  • -d [<filename>] Document the .blib format by writing SQLite commands to a file, or stdout if no filename is given.
  • -E Prefer reading peaks from embedded spectra (currently only affects MaxQuant msms.txt)
  • -A Output messages noting ambiguously matched spectra (spectra matched to multiple peptides)
  • -K Keep ambiguously matched spectra
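When BlibBuild is driven from a script, it can help to assemble the argument list in one place. The sketch below follows the usage line above and uses only the -o and -c options documented here. The function name blibbuild_command is hypothetical, and the range check on the cutoff simply mirrors the stated 0-1 range. The sketch builds the command but does not run it.

```python
def blibbuild_command(id_files, library_name, cutoff=None, overwrite=False):
    """Build a BlibBuild argument list: options, then id files, then library."""
    if not id_files:
        raise ValueError("at least one peptide id file is required")
    command = ["BlibBuild"]
    if overwrite:
        command.append("-o")  # overwrite instead of the default append
    if cutoff is not None:
        if not 0.0 <= cutoff <= 1.0:
            raise ValueError("cutoff must be between 0 and 1")
        command += ["-c", str(cutoff)]
    command += list(id_files)     # one or more peptide id files
    command.append(library_name)  # the library name always comes last
    return command
```

Assuming BlibBuild is on the PATH, the returned list could be passed to subprocess.run(command, check=True) from Python's standard library.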

 




BlibFilter


Description

Create a library from an existing one such that the new library has only one spectrum for each peptide ion. The representative spectrum is chosen by taking the dot product of all pairs of spectra for a peptide and selecting the one with the highest average score.
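The selection rule can be sketched as follows. This is an illustration under stated assumptions, not BlibFilter's actual implementation: each spectrum is assumed to be already binned onto a common m/z axis as a list of intensities, the dot product is assumed to be normalized by vector length, and a peptide ion with a single spectrum is assumed to score 0. The function names are hypothetical.

```python
import math


def dot_product(a, b):
    """Normalized dot product of two equal-length intensity vectors."""
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return sum(x * y for x, y in zip(a, b)) / (norm_a * norm_b)


def pick_representative(spectra):
    """Return (index, average score) of the spectrum most similar to the others."""
    if not spectra:
        raise ValueError("no spectra given")
    if len(spectra) == 1:
        return 0, 0.0  # assumption: nothing to compare against
    best_index, best_average = 0, -1.0
    for i, spectrum in enumerate(spectra):
        # Score this spectrum against every other spectrum of the same ion.
        scores = [dot_product(spectrum, other)
                  for j, other in enumerate(spectra) if j != i]
        average = sum(scores) / len(scores)
        if average > best_average:
            best_index, best_average = i, average
    return best_index, best_average
```

With three spectra where the second shares peaks with both of the others, pick_representative returns index 1, since its average pairwise score is the highest.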

Usage

BlibFilter [options] <redundant-library> <filtered-library>

Input

  • <redundant-library> – The initial library file, with multiple spectra for all or some peptide ions.
  • <filtered-library> – The name to be given to the resulting output library.

Output

A library of spectra for the same peptides as the initial library, but with only one spectrum per peptide ion.

Options

  • -m [ --memory-cache ] <size> – SQLite memory cache size in Megs. Default 250M.
  • -n [ --min-peaks ] <num> – Only include spectra with at least this many peaks. Default 20.
  • -s [ --min-score ] <score> – Best spectrum must have at least this average score to be included. Default 0.
  • -p [ --parameter-file ] <file> – File containing search parameters. Command line values override file values.
  • -v [ --verbosity ] <level> – Control the level of output to stderr. (silent, error, status, warn, debug, detail, all) Default status.
  • -h [ --help ] – Print help message.



BlibSearch


Description

Search a spectrum library for matches to query spectra.

Usage

BlibSearch [options] <spectrum filename> <library filename>[+]

Input

  • <spectrum filename> – A file containing spectra to search. File formats accepted are .ms2, .cms2, .mzXML, .mzML, .MGF, and .wiff (Windows only).
  • <library filename> – The library to be searched for matches to the query. Libraries may be filtered (the output of BlibFilter) or redundant (the output of BlibBuild). More than one library can be listed on the command line.

Output

Results are printed to a report file (tab-delimited text). The file may be named with the --report-file option; by default it is named after the spectrum file, with the extension replaced with .report. A separate report file is written for any decoy spectra searched. An optional sqlite .psm file may also be produced.
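A script that collects results needs to know where the default report will be written. The sketch below derives that name from the spectrum file name. The function name default_report_name is hypothetical, and it assumes that only the final extension is replaced.

```python
from pathlib import Path


def default_report_name(spectrum_filename):
    """Replace the final extension of the spectrum file name with .report."""
    return str(Path(spectrum_filename).with_suffix(".report"))
```

For a spectrum file named demo.ms2, this yields demo.report.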

Options

  • -c [ --clear-precursor ] <true|false> – Remove the peaks in an X m/z window around the precursor from the query and library spectra. Default true.
  • --topPeaksForSearch <num> – Use this many of the highest intensity peaks. Default 100.
  • -w [ --mz-window ] <size> – Compare query to library spectra with precursor m/z +/- size. Default 3.
  • -L [ --low-charge ] <charge> – Search only spectra with charge no less than this. Default 1.
  • -H [ --high-charge ] <charge> – Search only spectra with charge no higher than this. Default 5.
  • -m [ --report-matches ] <num> – Return this number of the best matches for each query. Use -1 to report all. Default 5.
  • --psm-result-file <name> – Return results in a .psm file of the given name. Default no .psm file.
  • -R [ --report-file ] <name> – Return results in a report file of the given name. Default is .report.
  • --preserve-order – Search spectra in the order they appear in the file. The default is to search them sorted by precursor m/z.
  • -p [ --parameter-file ] <name> – File containing search parameters. Command line values override file values.
  • -v [ --verbosity ] <level> – Control the level of output to stderr. (silent, error, status, warn, debug, detail, all) Default status.
  • -h [ --help ] – Print help message.
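The effect of the precursor-clearing, top-peaks and m/z-window options can be illustrated with a short sketch. It is not BlibSearch's actual code: peaks are assumed to be (m/z, intensity) pairs, the width of the cleared window is not stated above and is therefore a required hypothetical parameter (half_width), and treating the m/z window boundary as inclusive is also an assumption.

```python
def clear_precursor(peaks, precursor_mz, half_width):
    """Drop peaks within half_width m/z of the precursor (width is hypothetical)."""
    return [(mz, intensity) for mz, intensity in peaks
            if abs(mz - precursor_mz) > half_width]


def top_peaks(peaks, count=100):
    """Keep the `count` most intense peaks, returned in m/z order."""
    strongest = sorted(peaks, key=lambda peak: peak[1], reverse=True)[:count]
    return sorted(strongest)


def candidates_in_window(query_mz, library_mzs, mz_window=3.0):
    """Indices of library spectra whose precursor m/z is within +/- mz_window."""
    return [index for index, mz in enumerate(library_mzs)
            if abs(mz - query_mz) <= mz_window]
```

A query spectrum would first pass through clear_precursor and top_peaks, and would then be compared only against the library spectra returned by candidates_in_window.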



BlibToMS2


Description

Write an MS2 file that contains all spectra in a library.

Usage

BlibToMS2 [options] <library>

Input

  • <library> – a spectrum library file, filtered or redundant.

Output

The spectra are printed to a file named <library>.ms2 in the MS2 format. The scan number is replaced with the library ID number. Two 'D' lines contain the peptide sequence with and without modifications.

Options

  • -f [ --file-name ] <ms2 file> – Use this name for the output MS2 file rather than the default name, <library>.ms2.
  • -m [ --mz-precision ] <num> – Write the peak m/z values with this many digits of precision. Default 2.
  • -i [ --intensity-precision ] <num> – Write the peak intensity values with this many digits of precision. Default 1.
  • -p [ --parameter-file ] <file> – Specify parameters in a separate file. Command line values override the file.
  • -v [ --verbose ] <silent|error|status|warn> – Set the verbosity level of the output to stderr. The default level is status.
  • -h [ --help ] – Print the help message.



LibToSqlite3


Description

Converts a BiblioSpec 1.0 library to a 2.0 library in sqlite3 format.

Usage

LibToSqlite3 <old version lib> <new lib name>

Input

  • <old version lib> – A BiblioSpec 1.0 library file.
  • <new lib name> – The name to be given to the converted library.

Output

A spectrum library in sqlite3 format.




BiblioSpec Supported Formats


| Database search | Peptide ID file extension | Spectrum file extension | Score used | Notes |
|---|---|---|---|---|
| Generic SSL | .ssl | | score column | A generic format for encoding spectrum library entries. |
| ByOnic | .mzid | .MGF, .mzXML, .mzML | AbsLogProb | |
| Comet/SEQUEST/Percolator | .perc.xml, .sqt | .cms2, .ms2, .mzXML | q-value | Percolator v1.17 does not include sequence modification information therefore the .sqt file from the SEQUEST search must be present in the same directory, the directory containing the cms2/ms2 spectrum files, or the current working directory. |
| DIA-NN | .speclib | | none | No separate spectrum file. In the current implementation, no score is imported from the library, so all spectra are imported. |
| IDPicker | .idpXML | .mzXML, .mzML | FDR | The name(s) of the spectrum file(s) are given in the .idpXML file. |
| MS Amanda | .pep.xml, .pepXML | .mzML, .mzXML, .MGF, RAW* | q-value | |
| MSFragger | .pep.xml, .pepXML | .mzML, .mzXML, .MGF, RAW* | q-value | |
| MSGF+ | .mzid, .pepXML | .mzML, .mzXML, .MGF, RAW* | expectation value | |
| Mascot | .dat | | expectation value | No separate spectrum file. |
| MaxQuant Andromeda | msms.txt + evidence.txt + mqpar.xml + modifications.xml | .mzML, .mzXML, .MGF, RAW* | PEP | It is possible to use peaks embedded in the msms.txt, but external spectra files are preferred because the embedded peaks are charge deconvoluted. mqpar.xml must be located in the grandparent, parent, or same directory. A custom modifications.xml, modifications.local.xml, or modification.xml can be placed in the same directory as the search results (or specified using the -x option). |
| Morpheus | .pep.xml, .pepXML | .mzXML, .mzML | q-value | The names of the .mzXML files are given in the .pep.xml file and may be in the parent or grandparent directory. Spectra are looked up by index, which is calculated using (scan number - 1). |
| OMSSA | .pep.xml, .pepXML | .mzXML, .mzML | expectation value | The names of the .mzXML files are given in the .pep.xml file and may be in the parent or grandparent directory. |
| OpenSWATH | .tsv | | m_score column | No separate spectrum file. |
| PEAKS DB | .pep.xml, .pepXML | .mzXML, .mzML | confidence score | The names of the .mzXML files are given in the .pep.xml file and may be in the parent or grandparent directory. |
| PLGS MSe | final_fragment.csv | | score column | There need not be a . before 'final_fragment'. |
| PRIDE | .pride.xml | | various | No separate spectrum file. |
| PeptideProphet/iProphet | .pep.xml, .pepXML | .mzML, .mzXML, .MGF, RAW* | probability score | The names of the .mzXML files are given in the .pep.xml file and may be in the parent or grandparent directory. |
| PeptideShaker | .mzid | .MGF | confidence score | |
| Protein Pilot | .group.xml | | confidence score | No separate spectrum file. |
| Protein Prospector | .pep.xml, .pepXML | .mzML, .mzXML, .MGF, RAW* | expectation value | |
| Proteome Discoverer | .msf, .pdResult | | q-value | No separate spectrum file. Libraries cannot be built from databases that do not contain q-values, unless a cutoff score of 0 is explicitly specified. |
| Proxl XML | .proxl.xml | .mzML, .mzXML, .MGF, RAW* | q-value | |
| Scaffold | .mzid | .MGF, .mzXML, .mzML | peptide probability | |
| Spectronaut | .csv | | none | Spectronaut Assay Library export. No separate spectrum file. |
| Spectrum Mill | .pep.xml, .pepXML | .mzXML, .mzML | expectation value | The names of the .mzXML files are given in the .pep.xml file and may be in the parent or grandparent directory. |
| X! Tandem | .xtan.xml | | expectation value | No separate spectrum file. |

*RAW includes vendor formats like RAW, WIFF, .D, etc.