Table of Contents

guest
2019-05-26
BiblioSpec Spectral Library Tools
   BlibBuild
   BlibFilter
   BlibSearch
   BlibToMS2
   LibToSqlite3
BiblioSpec input and output file formats
   Example .ssl file
   Example .ms2 file
   Example .ssl file for small molecules
Download and build

BiblioSpec Spectral Library Tools


BiblioSpec is a suite of software tools for creating and searching MS/MS peptide spectrum libraries.

New in version 2.0

BiblioSpec 2.0 stores spectrum libraries as sqlite3 files. Sqlite3 is a lightweight, open-source database format that can be read and manipulated with any sqlite3 tool in addition to BiblioSpec. For more information about the library format, see the file formats page. The new format is a departure from version 1.0, which used its own binary format, so tools and libraries from the two versions are not compatible. There is, however, a conversion tool (LibToSqlite3) for turning a version 1.0 library into a sqlite3 library.

BiblioSpec components

The BiblioSpec package contains the following programs:

  • BlibBuild creates a library of peptide MS/MS spectra from a variety of different database search results.
  • BlibFilter removes redundant spectra from a library.
  • BlibSearch searches a spectrum library for matches to query spectra, printing the results to a report file.
  • BlibToMS2 writes a library in a text MS2 file format.
  • LibToSqlite3 converts a BiblioSpec 1.0 library to a 2.0 library in sqlite3 format.

Download

BiblioSpec is freely available under the BSD license. See the Download and build page for instructions.

Several reference libraries will be available soon for download.

More information

See the file formats page for an overview of all file formats, including a list of the database search files that can be used to build libraries.




BlibBuild


Description

Creates a library of spectra with known peptide identifications. Typically, these identifications come from a database search such as SEQUEST or Mascot, sometimes followed by an evaluation step such as Percolator or Peptide Prophet. BlibBuild accepts files from a variety of database search programs, as well as some other spectral library formats. File formats are identified by file extension; the recognized extensions are given in the table below. In many cases, the peptide identification (peptide sequence, charge state and optional score) is in a separate file from the spectrum information. Unless noted, it is assumed that both files will be in the same directory.

generic (.ssl) – This generic format is provided for peptide identifications made by other means. See the file formats page for a description. The scores used are the scores in the 'score' column.

Database search | Peptide ID file extension | Spectrum file extension | Notes
Comet / SEQUEST / Percolator | .perc.xml (.sqt) | .cms2, .ms2 | Percolator v1.17 does not include sequence modification information, so the .sqt file from the SEQUEST search must be present in the same directory, the directory containing the cms2/ms2 spectrum files, or the current working directory. The scores used are the q-values.
Peptide Prophet | .pep.xml, .pep.XML, .pepXML | .mzXML, .mzML | The names of the .mzXML files are given in the .pep.xml file and may be in the parent or grandparent directory. The scores used are the probability scores.
Spectrum Mill | .pep.xml, .pep.XML, .pepXML | .mzXML, .mzML | The names of the .mzXML files are given in the .pep.xml file and may be in the parent or grandparent directory. The scores used are the expectation scores.
OMSSA | .pep.xml, .pep.XML, .pepXML | .mzXML, .mzML | The names of the .mzXML files are given in the .pep.xml file and may be in the parent or grandparent directory. The scores used are the expectation scores.
PEAKS DB | .pep.xml, .pep.XML, .pepXML | .mzXML, .mzML | The names of the .mzXML files are given in the .pep.xml file and may be in the parent or grandparent directory. The scores used are the confidence scores.
Morpheus | .pep.xml, .pep.XML, .pepXML | .mzXML, .mzML | The names of the .mzXML files are given in the .pep.xml file and may be in the parent or grandparent directory. Spectra are looked up by index, which is calculated using (scan number - 1). The scores used are the q-values.
X! Tandem | .xtan.xml | (none) | No separate spectrum file. The scores used are the expectation scores.
Mascot | .dat | (none) | No separate spectrum file. The scores used are the expectation scores (homology threshold).
Protein Pilot | .group.xml | (none) | No separate spectrum file. The scores used are the confidence scores.
ID Picker (Myrimatch) | .idpXML | .mzXML, .mzML | The name(s) of the spectrum file(s) are given in the .idpXML file. The scores used are the FDRs.
PRIDE | .pride.xml | (none) | No separate spectrum file.
MaxQuant | msms.txt | (none) | No separate spectrum file. There need not be a . before 'msms'. mqpar.xml must be located in the grandparent, parent, or same directory. A custom modifications.xml or modification.xml can be placed in the same directory as the search results (or specified using the -x option). The scores used are the PEPs.
Proteome Discoverer | .msf | (none) | No separate spectrum file. Libraries cannot be built from databases that do not contain q-values, unless a cutoff score of 0 is explicitly specified. The scores used are the Percolator q-values.
Scaffold | .mzid | .MGF, .mzXML, .mzML | The scores used are the Peptide Probability scores.
ByOnic | .mzid | .MGF, .mzXML, .mzML | The scores used are the Peptide AbsLogProb scores.
MSGF+ | .mzid, .pepXML | .MGF, .mzXML, .mzML | The scores used are the q-values.
MSe | final_fragment.csv | | There need not be a . before 'final_fragment'. The scores used are the scores in the 'score' column.
OpenSWATH | .tsv | (none) | No separate spectrum file. The scores used are in the 'm_score' column.
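Because formats are recognized by file extension, the table can be read as a suffix lookup. The following is a minimal sketch of that idea, not BlibBuild's actual matching code: the KNOWN_SUFFIXES dictionary and the guess_search_type function are hypothetical names, the labels are only summaries of the table rows, and the case-insensitive comparison is an assumption made so that .pep.xml, .pep.XML and .pepXML are all accepted.

```python
# Hypothetical lookup built from the extensions in the table above.
# BlibBuild itself may apply different or additional rules.
KNOWN_SUFFIXES = {
    ".ssl": "generic",
    ".perc.xml": "Percolator",
    ".pep.xml": "pepXML",
    ".pepxml": "pepXML",
    ".xtan.xml": "X! Tandem",
    ".dat": "Mascot",
    ".group.xml": "Protein Pilot",
    ".idpxml": "ID Picker",
    ".pride.xml": "PRIDE",
    "msms.txt": "MaxQuant",
    ".msf": "Proteome Discoverer",
    ".mzid": "mzIdentML",
    "final_fragment.csv": "MSe",
    ".tsv": "OpenSWATH",
}


def guess_search_type(filename):
    """Return the label of the longest known suffix of filename, or None."""
    lowered = filename.lower()  # assumption: extensions compare case-insensitively
    matches = [suffix for suffix in KNOWN_SUFFIXES if lowered.endswith(suffix)]
    if not matches:
        return None
    return KNOWN_SUFFIXES[max(matches, key=len)]


for name in ["run1.perc.xml", "F001.dat", "sample.pep.XML", "notes.doc"]:
    print(name, "->", guess_search_type(name))
```

Several search engines share an extension (for example .mzid), so a suffix alone cannot always tell them apart; the sketch only narrows the file down to a format family.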

Usage

BlibBuild [options] <peptide id file>[+] <library name>

Input

  • <peptide id file> – A file containing peptide spectrum matches to be included in the library. The associated spectrum files should be in the same directory as the peptide id file but should not be given on the command line. See the above table for recognized formats. Multiple files may be listed together.
  • <library name> – The name of the library being created. An existing library may be overwritten or added to.

Output

A spectrum library in sqlite3 format.

Options

  • -o Overwrite existing library. Default append.
  • -S <filename> Read from file as though it were stdin.
  • -s Read result file names from stdin, e.g. ls *sqt | BlibBuild -s new.blib.
  • -u Ignore peptides except those with the unmodified sequences from stdin.
  • -U Ignore peptides except those with the modified sequences from stdin.
  • -H Use more than one decimal place when describing mass modifications.
  • -C <file size> Minimum file size required to use caching for .dat files. Specify units as B, K, M or G. Default 800M.
  • -c <cutoff> Score threshold (0-1) for PSMs to be included in library. Higher threshold is more exclusive.
  • -v <level> Level of output to stderr (silent, error, status, warn). Default status.
  • -L Write status and warning messages to log file.
  • -m <size> SQLite memory cache size in Megs. Default 250M.
  • -l <level> ZLib compression level (0-?). Default 3.
  • -i <library_id> LSID library ID. Default uses file name.
  • -a <authority> LSID authority. Default proteome.gs.washington.edu.
  • -x <filename> Specify the path of XML modifications file for parsing MaxQuant files.
  • -p <filename> Specify the path of XML parameters file for parsing MaxQuant files.
  • -P <float> Specify pusher interval for Waters final_fragment.csv files.
  • -d [<filename>] Document the .blib format by writing SQLite commands to a file, or stdout if no filename is given.
  • -E Prefer reading peaks from embedded spectra (currently only affects MaxQuant msms.txt)
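When BlibBuild is driven from a script, it can help to assemble the argument list in code before handing it to the operating system. The sketch below follows the usage line and uses only the -o and -c options documented above; the function name blibbuild_command and the file names are illustrative assumptions, and nothing is executed.

```python
def blibbuild_command(id_files, library, overwrite=False, cutoff=None):
    """Build an argument list of the form
    BlibBuild [options] <peptide id file>[+] <library name>."""
    if not id_files:
        raise ValueError("at least one peptide id file is required")
    args = ["BlibBuild"]
    if overwrite:
        args.append("-o")  # overwrite instead of the default append
    if cutoff is not None:
        if not 0.0 <= cutoff <= 1.0:
            raise ValueError("cutoff must be between 0 and 1")
        args += ["-c", str(cutoff)]  # score threshold for included PSMs
    args += list(id_files)  # spectrum files are not listed on the command line
    args.append(library)
    return args


cmd = blibbuild_command(
    ["run1.perc.xml", "run2.perc.xml"], "demo.blib", overwrite=True, cutoff=0.99
)
print(" ".join(cmd))
```

The resulting list could be passed to subprocess.run(cmd, check=True) on a machine where BlibBuild is installed and on the PATH.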

 




BlibFilter


Description

Create a library from an existing one such that the new library has only one spectrum for each peptide ion. The representative spectrum is chosen by taking the dot product of all pairs of spectra for a peptide and selecting the one with the highest average score.
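The selection rule can be illustrated with a few lines of Python. This is a sketch under stated assumptions, not BlibFilter's implementation: it assumes each spectrum has already been reduced to an intensity vector over a shared set of m/z bins, and it uses a normalized dot product, whereas the binning and normalization BlibFilter actually applies are not described here. The names normalized_dot and pick_representative are hypothetical.

```python
import math


def normalized_dot(a, b):
    """Dot product of two equal-length intensity vectors, scaled to [0, 1]."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def pick_representative(spectra):
    """Return (index, average score) of the spectrum most similar to the others."""
    if not spectra:
        raise ValueError("no spectra given")
    if len(spectra) == 1:
        return 0, 0.0  # no pairs to compare; the score is a placeholder
    best_index, best_avg = 0, -1.0
    for i, spectrum in enumerate(spectra):
        scores = [
            normalized_dot(spectrum, other)
            for j, other in enumerate(spectra)
            if j != i
        ]
        avg = sum(scores) / len(scores)
        if avg > best_avg:
            best_index, best_avg = i, avg
    return best_index, best_avg


# Three toy spectra for one peptide ion; the middle one overlaps with both others.
toy = [[1.0, 0.0, 0.0], [1.0, 1.0, 0.0], [0.0, 1.0, 0.0]]
print(pick_representative(toy))
```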

Usage

BlibFilter [options] <redundant-library> <filtered-library>

Input

  • <redundant-library> – A library file with multiple spectra for all or some peptide ions.
  • <filtered-library> – The name to be given to the resulting library.

Output

A library of spectra for the same peptides as the initial library, but with only one spectrum per peptide ion.

Options

  • -m [ --memory-cache ] <size> – SQLite memory cache size in Megs. Default 250M.
  • -n [ --min-peaks ] <num> – Only include spectra with at least this many peaks. Default 20.
  • -s [ --min-score ] <score> – Best spectrum must have at least this average score to be included. Default 0.
  • -p [ --parameter-file ] <file> – File containing search parameters. Command line values override file values.
  • -v [ --verbosity ] <level> – Control the level of output to stderr. (silent, error, status, warn, debug, detail, all) Default status.
  • -h [ --help ] – Print help message.



BlibSearch


Description

Search a spectrum library for matches to query spectra.

Usage

BlibSearch [options] <spectrum filename> <library filename>[+]

Input

  • <spectrum filename> – A file containing spectra to search. File formats accepted are .ms2, .cms2, .mzXML, .mzML, .MGF, and .wiff (Windows only).
  • <library filename> – The library to be searched for matches to the query. Libraries may be filtered (the output of BlibFilter) or redundant (the output of BlibBuild). More than one library can be listed on the command line.

Output

Results are printed to a report file (tab-delimited text). The file may be named with the --report-file option; by default it is named after the spectrum file with the extension replaced with .report. A separate report file is written for any decoy spectra searched. An optional sqlite .psm file may also be produced.
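The default naming rule is simple enough to reproduce when a script needs to locate the report afterwards. A minimal sketch, assuming the rule is a plain replacement of the final extension; default_report_name is a hypothetical helper, not part of BiblioSpec.

```python
from pathlib import Path


def default_report_name(spectrum_file):
    """Replace the final extension of the spectrum file name with .report."""
    return Path(spectrum_file).with_suffix(".report")


print(default_report_name("demo.ms2"))
print(default_report_name("run1.mzXML"))
```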

Options

  • -c [ --clear-precursor ] <true|false> – Remove the peaks in an X m/z window around the precursor from the query and library spectra. Default true.
  • --topPeaksForSearch <num> – Use this many of the highest intensity peaks. Default 100.
  • -w [ --mz-window ] <size> – Compare query to library spectra with precursor m/z +/- size. Default 3.
  • -L [ --low-charge ] <charge> – Search only spectra with charge no less than this. Default 1.
  • -H [ --high-charge ] <charge> – Search only spectra with charge no higher than this. Default 5.
  • -m [ --report-matches ] <num> – Return this number of the best matches for each query. Use -1 to report all. Default 5.
  • --psm-result-file <name> – Return results in a .psm file of the given name. Default no .psm file.
  • -R [ --report-file ] <name> – Return results in a report file of the given name. Default is the spectrum file name with its extension replaced by .report.
  • --preserve-order – Search spectra in the order they appear in the file. Default to search as sorted by precursor m/z.
  • -p [ --parameter-file ] <name> – File containing search parameters. Command line values override file values.
  • -v [ --verbosity ] <level> – Control the level of output to stderr. (silent, error, status, warn, debug, detail, all) Default status.
  • -h [ --help ] – Print help message.
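To make the effect of --topPeaksForSearch, --mz-window, --low-charge and --high-charge concrete, the sketch below applies the documented defaults to toy data. It is an illustration only: the function names, the dictionary keys precursor_mz and charge, the inclusive boundaries, and the choice to apply the charge limits to library entries are all assumptions rather than a description of BlibSearch internals.

```python
def top_peaks(peaks, count=100):
    """Keep the `count` most intense (m/z, intensity) peaks, returned in m/z order."""
    strongest = sorted(peaks, key=lambda peak: peak[1], reverse=True)[:count]
    return sorted(strongest)


def candidates(query_mz, library, mz_window=3.0, low_charge=1, high_charge=5):
    """Select library entries whose precursor m/z lies within query_mz +/- mz_window
    and whose charge lies between low_charge and high_charge (inclusive)."""
    return [
        entry
        for entry in library
        if abs(entry["precursor_mz"] - query_mz) <= mz_window
        and low_charge <= entry["charge"] <= high_charge
    ]


toy_library = [
    {"precursor_mz": 636.34, "charge": 2},
    {"precursor_mz": 745.30, "charge": 2},
    {"precursor_mz": 637.00, "charge": 6},
]
print(candidates(636.0, toy_library))
print(top_peaks([(100.0, 5.0), (200.0, 50.0), (150.0, 20.0)], count=2))
```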



BlibToMS2


Description

Write an MS2 file that contains all spectra in a library.

Usage

BlibToMS2 [options] <library>

Input

  • <library> – a spectrum library file, filtered or redundant.

Output

The spectra are printed to a file named <library>.ms2 in the MS2 format. The scan number is replaced with the library ID number. Two 'D' lines contain the peptide sequence with and without modifications.

Options

  • -f [ --file-name ] <ms2 file> – Use this name for the output MS2 file rather than the default name, <library>.ms2.
  • -m [ --mz-precision ] <num> – Write the peak m/z values with this many digits of precision. Default 2.
  • -i [ --intensity-precision ] <num> – Write the peak intensity values with this many digits of precision. Default 1.
  • -p [ --parameter-file ] <file> – Specify parameters in a separate file. Command line values override the file.
  • -v [ --verbose ] <silent|error|status|warn> – Set the verbosity level of the output to stderr. The default level is status.
  • -h [ --help ] – Print the help message.



LibToSqlite3


Description

Converts a BiblioSpec 1.0 library to a 2.0 library in sqlite3 format.

Usage

LibToSqlite3 <old version lib> <new lib name>

Input

  • <old version lib> – A BiblioSpec 1.0 library file.
  • <new lib name> – The name to be given to the converted library.

Output

A spectrum library in sqlite3 format.




BiblioSpec input and output file formats


BiblioSpec makes use of several file formats for input and output. Below are descriptions of these along with links to additional information.

Database search result files

In most cases libraries are built from database search result files. Supported formats are listed on the BlibBuild page.

BlibBuild .ssl file

For peptide or small molecule identifications that do not come from one of the supported database searches, BiblioSpec supports a generic tab-delimited text file format referred to as ssl (spectrum sequence list). A small example file is given under "Example .ssl file". An ssl file must end with the '.ssl' extension and have a header line with the following column names (the score-type, score, and retention-time columns are optional):

file       scan    charge  sequence        score-type      score   retention-time

Additional columns for small molecule use may be included (the sequence column should be omitted for small molecule libraries; see "Example .ssl file for small molecules"):

adduct     chemicalformula moleculename    inchikey        otherkeys

Each of the following lines contains information for one spectrum. The first column contains a full or relative path to a file containing the spectrum. The second column has an id for that spectrum, typically a scan number or index number. The third column is the charge state of the spectrum. The fourth column contains the peptide sequence, with the addition of any modifications given as a mass shift (the difference between the modified and unmodified residue) following the modified residues. For example,

TASEFDC[+57.0]SAIO[+16.0]AQDK

Peptides with N-terminal modifications should have the mass shift follow the first residue.
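The bracket notation is easy to take apart with a regular expression. The following sketch splits a modified sequence into the bare residues and a list of (1-based position, mass shift) pairs; parse_modified_sequence is a hypothetical helper, and it assumes upper-case one-letter residues with at most one bracketed shift per residue.

```python
import re

# One upper-case residue, optionally followed by a signed mass shift in brackets.
MOD_PATTERN = re.compile(r"([A-Z])(?:\[([+-]\d+(?:\.\d+)?)\])?")


def parse_modified_sequence(sequence):
    """Return (unmodified sequence, [(1-based position, mass shift), ...])."""
    residues, mods, offset = [], [], 0
    while offset < len(sequence):
        match = MOD_PATTERN.match(sequence, offset)
        if match is None:
            raise ValueError(f"unexpected text at offset {offset} in {sequence!r}")
        residues.append(match.group(1))
        if match.group(2) is not None:
            mods.append((len(residues), float(match.group(2))))
        offset = match.end()
    return "".join(residues), mods


print(parse_modified_sequence("TASEFDC[+57.0]SAIO[+16.0]AQDK"))
print(parse_modified_sequence("C[+57.0]GPSQPLK"))
```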

The score-type column can be any of the following:

  • UNKNOWN
  • PERCOLATOR QVALUE
  • PEPTIDE PROPHET SOMETHING
  • SPECTRUM MILL
  • IDPICKER FDR
  • MASCOT IONS SCORE
  • TANDEM EXPECTATION VALUE
  • PROTEIN PILOT CONFIDENCE
  • SCAFFOLD SOMETHING
  • WATERS MSE PEPTIDE SCORE
  • OMSSA EXPECTATION SCORE
  • PROTEIN PROSPECTOR EXPECTATION SCORE
  • SEQUEST XCORR
  • MAXQUANT SCORE

The score column is a floating point value representing the spectrum's score of that type. The retention-time column can be used to specify retention times in minutes; otherwise the values from the spectrum file will be used. Scores fall into three categories: probability that the identification is correct, probability that the identification is incorrect, or not a probability score. This information can be found in the ScoreTypes table.

Library files

BiblioSpec library files are in the sqlite3 format, usually with a ".blib" filename extension. Each library is a small database that you can search and manipulate with standard SQL commands using, for example, the sqlite3 command line tools or SQLite Expert Personal.
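Python's standard sqlite3 module is another way to look inside a library. The sketch below lists the tables of a database by querying SQLite's built-in sqlite_master catalog, so it makes no assumption about the .blib schema. To stay runnable without a real library it uses an in-memory stand-in with a made-up table named Example; list_tables is a hypothetical helper.

```python
import sqlite3


def list_tables(connection):
    """Return the names of all tables in an open SQLite database."""
    rows = connection.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
    ).fetchall()
    return [name for (name,) in rows]


# Stand-in database; for a real library use sqlite3.connect("demo.blib") instead.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Example (id INTEGER PRIMARY KEY, label TEXT)")  # made-up table
print(list_tables(conn))
conn.close()
```

Once the table names are known, ordinary SELECT statements can be run through the same connection.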

BiblioSpec does not require that you know any SQL. Should you be interested in using these files outside of the BiblioSpec context, the BlibBuild -d option documents the .blib format by writing the corresponding SQLite commands to a file or to stdout.




Example .ssl file


file   scan    charge  sequence
demo.ms2        8       3       VGAGAPVYLAAVLEYLAAEVLELAGNAAR
demo.ms2        1806    2       LAESITIEQGK
demo.ms2        2572    2       ELAEDGC[+57.0]SGVEVR
demo.ms2        3088    2       TTAGAVEATSEITEGK
demo.ms2        3266    2       DC[+57.0]EEVGADSNEGGEEEGEEC[+57.0]
demo.ms2        9734    3       IWELEFPEEAADFQQQPVNAQ[-17.0]PQN
demo.ms2        20919   3       VHINIVVIGHVDSGK
../elsewhere/spec.mzXML 00497   2       LKEPAQNTADNAK
../elsewhere/spec.mzXML 00680   2       ALEGPGPGEDAAHSENNPPR
../elsewhere/spec.mzXML 00965   2       FFSHEAEQK
../elsewhere/spec.mzXML 01114   2       C[+57.0]GPSQPLK
../elsewhere/spec.mzXML 01382   2       AVHVQVTDAEAGK
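A file like the one above can be produced with Python's csv module. This is a minimal sketch, assuming only the four required columns are written; write_ssl and REQUIRED_COLUMNS are hypothetical names, and the output goes to an in-memory buffer so the example has no side effects.

```python
import csv
import io

REQUIRED_COLUMNS = ["file", "scan", "charge", "sequence"]


def write_ssl(rows, handle, columns=REQUIRED_COLUMNS):
    """Write a header line and one tab-delimited line per row to handle."""
    writer = csv.DictWriter(
        handle, fieldnames=columns, delimiter="\t", lineterminator="\n"
    )
    writer.writeheader()
    writer.writerows(rows)


rows = [
    {"file": "demo.ms2", "scan": 1806, "charge": 2, "sequence": "LAESITIEQGK"},
    {"file": "demo.ms2", "scan": 2572, "charge": 2, "sequence": "ELAEDGC[+57.0]SGVEVR"},
]
buffer = io.StringIO()
write_ssl(rows, buffer)
print(buffer.getvalue(), end="")
```

To write a real file, open it with open("demo.ssl", "w", newline="") and pass the handle instead of the buffer; the name must end in .ssl. Optional columns such as score-type and score can be added by passing a longer columns list and including those keys in every row.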



Example .ms2 file


H      CreationDate    Mon Apr 12 15:12:14 2010
H       Extractor       BlibToMs2
H       Library /home/me/research/search/demo.blib
S       1       1       636.34
Z       2       1253.36
D       seq     FKNGFQTGSASK
D       modified seq    FKNGFQTGSASK
187.40  12.5
193.10  19.5
194.30  13.7
198.30  29.8
199.10  12.2
208.30  23.1
208.90  11.4
210.30  11.8
213.00  3.3
214.50  4.3
216.10  32.8
219.10  11.2
221.00  14.3
222.10  64.0
225.10  16.6
226.00  31.6
228.30  7.2
229.10  8.5
230.50  58.2
231.20  236.1
232.20  75.8
233.60  2.4
234.20  51.4
235.10  5.6
236.30  30.2
239.70  14.4
241.30  34.8
242.30  14.2
244.30  9.0
S       2       2       745.3
Z       2       1471.7
D       seq     NFLETVELQVGLK
D       modified seq    NFLETVELQVGLK
1224.60 7.9
1228.70 468.9
1230.40 658.5
1231.50 144.2
1240.00 11.7
1242.70 45.9
1243.80 16.8
1253.80 17.2
1255.00 7.9
1255.80 14.4
1259.70 15.5
1273.10 5.9
1275.90 10.5
1277.10 7.8
1283.30 4.7
1296.50 19.2
1299.50 13.0
1307.40 6.1
1308.40 21.3
1313.00 1.7
1313.80 5.5
1315.40 3.6
1316.80 22.3
1323.90 1.5
1325.50 40.5
1326.30 75.9
S       3       3       732.1
Z       2       1444.7
D       seq     NEVSAMPTLLLFK
D       modified seq    NEVSAMPTLLLFK
209.00  62.5
210.30  12.8
216.00  87.0
220.10  58.0
224.90  4.9
226.10  418.2
227.00  68.3
227.90  46.7
229.20  13.3
231.10  12.7
238.10  209.1
239.20  15.0
244.10  953.8
245.20  90.0
245.90  20.4
252.30  8.8
255.30  38.8
260.20  9.4
262.10  35.0
270.00  10.9
275.80  21.8
277.40  6.3
279.10  12.7
280.20  49.8
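The layout of the example can be read back with a short parser. The sketch below handles only the line types that appear above (H, S, Z, D and peak lines) and skips anything else; parse_ms2 and the dictionary keys it produces are illustrative choices, not part of BiblioSpec.

```python
def parse_ms2(lines):
    """Yield one dict per spectrum from an iterable of MS2 text lines."""
    spectrum = None
    for raw in lines:
        fields = raw.split()
        if not fields:
            continue
        tag = fields[0]
        if tag == "S":  # S <first scan> <last scan> <precursor m/z>
            if spectrum is not None:
                yield spectrum
            spectrum = {
                "id": int(fields[1]),
                "precursor_mz": float(fields[3]),
                "charges": [],
                "info": {},
                "peaks": [],
            }
        elif spectrum is None:
            continue  # header (H) lines and anything else before the first S line
        elif tag == "Z":  # Z <charge> <mass>
            spectrum["charges"].append((int(fields[1]), float(fields[2])))
        elif tag == "D":  # D <key words...> <value>
            spectrum["info"][" ".join(fields[1:-1])] = fields[-1]
        elif tag[0].isdigit():  # peak line: <m/z> <intensity>
            spectrum["peaks"].append((float(fields[0]), float(fields[1])))
    if spectrum is not None:
        yield spectrum


sample = """H\tExtractor\tBlibToMs2
S\t1\t1\t636.34
Z\t2\t1253.36
D\tseq\tFKNGFQTGSASK
D\tmodified seq\tFKNGFQTGSASK
187.40\t12.5
193.10\t19.5
S\t2\t2\t745.3
Z\t2\t1471.7
D\tseq\tNFLETVELQVGLK
D\tmodified seq\tNFLETVELQVGLK
1224.60\t7.9
"""
for spec in parse_ms2(sample.splitlines()):
    print(spec["id"], spec["info"]["seq"], len(spec["peaks"]))
```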



Example .ssl file for small molecules


Note that these are tab separated fields, and the otherkeys field itself is tab separated.

file       scan    charge  adduct  inchikey        chemicalformula moleculename    otherkeys
dexcaf_051017.mzML      01369   -1      [M-H]   ZXPLRDFHBYIQOX-BTBVOZEKSA-N     C24H44O21N0     Glc04Reduced
dexcaf_051017.mzML      01639   -1      [M-H]   NBVGBCYERZIRIP-JAMOUWTMSA-N     C30H54O26N0     Glc05Reduced
dexcaf_051017.mzML      01855   -1      [M-H]   PNHJKLJIDNHXFR-ZGJYWSOBSA-N     C36H64O31N0     Glc06Reduced
dexcaf_051017.mzML      02029   -1      [M-H]   NVKJDLBVRSXYRE-BMFDHOHESA-N     C42H74O36N0     Glc07Reduced
dexcaf_051017.mzML      02179   -1      [M-H]   YMRGEPQWJZHXFF-MGQBKJSVSA-N     C48H84O41N0     Glc08Reduced
dexcaf_051017.mzML      01079   -1      [M-H]   RYYVLZVUVIJVGH-UHFFFAOYSA-N     C8H10N4O2       Caffeine        "InChI:1S/C8H10N4O2/c1-10-4-9-6-5(10)7(13)12(3)8(14)11(6)2/h4H,1-3H3    HMDB:01847      CAS:58-08-2     SMILES:Cn1cnc2n(C)c(=O)n(C)c(=O)c12"
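Because the otherkeys field contains tabs of its own, a naive split on tabs would break the last row apart. The sketch below relies on the double quotes around that field, as shown in the Caffeine row, and lets Python's csv module keep it together before splitting it into key:value pairs. Treating the quotes this way is an assumption about the file, and read_small_molecule_ssl is a hypothetical helper.

```python
import csv
import io


def read_small_molecule_ssl(handle):
    """Yield one dict per row, with otherkeys expanded into its own dict."""
    reader = csv.DictReader(handle, delimiter="\t", quotechar='"')
    for row in reader:
        other = row.get("otherkeys") or ""  # rows without the field give None
        row["otherkeys"] = dict(
            item.split(":", 1) for item in other.split("\t") if ":" in item
        )
        yield row


sample_ssl = (
    "file\tscan\tcharge\tadduct\tinchikey\tchemicalformula\tmoleculename\totherkeys\n"
    "dexcaf_051017.mzML\t01369\t-1\t[M-H]\tZXPLRDFHBYIQOX-BTBVOZEKSA-N\tC24H44O21N0\tGlc04Reduced\n"
    "dexcaf_051017.mzML\t01079\t-1\t[M-H]\tRYYVLZVUVIJVGH-UHFFFAOYSA-N\tC8H10N4O2\tCaffeine\t"
    '"HMDB:01847\tCAS:58-08-2\tSMILES:Cn1cnc2n(C)c(=O)n(C)c(=O)c12"\n'
)
for record in read_small_molecule_ssl(io.StringIO(sample_ssl)):
    print(record["moleculename"], record["otherkeys"])
```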



Download and build


Download

The BiblioSpec source code is available as part of the ProteoWizard project on GitHub:

ProteoWizard git repository

A Mascot Parser license should also be requested if you intend to parse Mascot .dat files; once you receive the download instructions:

  • On Windows, the libraries should be installed to C:\Program Files (x86)\Matrix Science\Mascot Parser (32-bit) or C:\Program Files\Matrix Science\Mascot Parser (64-bit)
  • On Linux, the libraries should be installed to /usr/local/msparser/gnu

You will need Visual Studio 2013 to build.

Build

  1. Download (or clone with git) the trunk/pwiz directory from the ProteoWizard git repository.
  2. Open a command prompt with administrative rights (Windows), or a terminal (Linux) and cd to the root of the downloaded source files (e.g. %HOMEDIR%\Documents\pwiz)
  3. Optional: Run clean.bat (Windows) or clean.sh (Linux) to remove files from a previous build.
  4. Run either quickbuild.bat (Windows) or quickbuild.sh (Linux) with the arguments: -j<number of threads to build with> --hash optimization=space address-model=<32|64> pwiz_tools/BiblioSpec > path-to-build-log-file. For example, a 64-bit Windows build using four threads may be done with the command "quickbuild.bat -j4 --hash optimization=space address-model=64 pwiz_tools\BiblioSpec > build.log". Note that "optimization=space" may be omitted for a debug build.

The resulting build will be located in the build-<os>-<architecture>/pwiz_tools/BiblioSpec directory.

Pre-compiled binaries

64-bit Windows binaries (BlibBuild.exe, BlibFilter.exe and BlibToMs2.exe) are available from the ProteoWizard automated build and test system. (If you encounter a "Log in to TeamCity" page, just click the "log in as guest" link to proceed.) Note that this download does not provide a proper installer: it is intended for updating existing installations only. It likely will not function if unzipped to a bare directory.