guest 2023-03-25 |
BiblioSpec makes use of several file formats for input and output. Below are descriptions of these along with links to additional information.
In most cases libraries are built from database search result files. Supported formats are listed on the BlibBuild page.
For peptide or small molecule identifications that do not come from one of the supported database searches, BiblioSpec supports a generic tab-delimited text file format referred to as ssl (spectrum sequence list). Here is a small example file. An ssl file must end with the '.ssl' extension and have a header line containing the following column names (the score-type, score, and retention-time columns are optional):
file scan charge sequence score-type score retention-time
Additional columns for small molecule use may be included (the sequence column should be omitted for small molecule libraries; here is a small example file):
adduct chemicalformula moleculename inchikey otherkeys
Each of the following lines contains information for one spectrum. The first column contains a full or relative path to a file containing the spectrum (e.g. vendor formats like .raw, .wiff, etc. or .ms2, .mzML, .mzXML, .mgf).
In an .ms2 file there are four types of lines. Lines beginning with 'H' are header lines and contain information about how the data was collected as well as comments. They appear at the beginning of the file. Lines beginning with 'S' are followed by the scan number and the precursor m/z. Lines beginning with 'Z' give the charge state followed by the mass of the ion at that charge state. Lines beginning with 'D' contain information relevant to the preceding charge state. BlibToMs2's output will include D-lines with the sequence and modified sequence. The file is arranged with these S, Z and D lines for one spectrum followed by a peak list: one pair of values per peak, giving the peak's m/z and intensity. Here is an example file.
The second column has an id for that spectrum, typically a scan number or index number. The third column is the charge state of the spectrum. The fourth column contains the peptide sequence, with the addition of any modifications given as a mass shift (the difference between the modified and unmodified residue) following the modified residues. For example,
TASEFDC[+57.0]SAIO[+16.0]AQDK
Peptides with N-terminal modifications should have the mass shift follow the first residue.
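The modified-sequence notation above can be split back into a plain sequence and a set of mass shifts. The following is a minimal sketch, not part of BiblioSpec: the function name `parse_modified_sequence` and the returned structure are hypothetical, and it assumes every modification is written as a signed number in square brackets directly after the residue it modifies.

```python
import re

# One uppercase residue letter, optionally followed by a bracketed signed
# mass shift such as [+57.0] or [-17.0]. This pattern is an assumption
# based on the examples in this document.
_RESIDUE = re.compile(r"([A-Z])(?:\[([+-]\d+(?:\.\d+)?)\])?")


def parse_modified_sequence(modified):
    """Return (plain_sequence, shifts) for an ssl-style modified sequence.

    shifts maps the 1-based residue position to its mass shift.
    """
    residues = []
    shifts = {}
    index = 0
    while index < len(modified):
        match = _RESIDUE.match(modified, index)
        if match is None:
            raise ValueError(
                f"unexpected text at offset {index}: {modified[index:]!r}"
            )
        residues.append(match.group(1))
        if match.group(2) is not None:
            # Position of the residue that was just appended.
            shifts[len(residues)] = float(match.group(2))
        index = match.end()
    return "".join(residues), shifts


plain, shifts = parse_modified_sequence("TASEFDC[+57.0]SAIO[+16.0]AQDK")
```

Because an N-terminal modification is written after the first residue, it appears in this sketch as a shift at position 1 and cannot be told apart from a modification of that residue's side chain.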
The score-type column can be any of the following:
UNKNOWN
PERCOLATOR QVALUE
PEPTIDE PROPHET SOMETHING
SPECTRUM MILL
IDPICKER FDR
MASCOT IONS SCORE
TANDEM EXPECTATION VALUE
PROTEIN PILOT CONFIDENCE
SCAFFOLD SOMETHING
WATERS MSE PEPTIDE SCORE
OMSSA EXPECTATION SCORE
PROTEIN PROSPECTOR EXPECTATION SCORE
SEQUEST XCORR
MAXQUANT SCORE
and the score column is a floating point value representing the spectrum's score of that type. The retention time column can be used to specify retention times in minutes; otherwise the values from the spectrum file will be used. Scores fall into three categories: probability that identification is correct, probability that identification is incorrect, or not a probability score. This information can be found in the ScoreTypes table.
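As an illustration of the column layout described above, an ssl file can be produced with Python's standard `csv` module. This is a sketch under stated assumptions: `write_ssl` and `SSL_COLUMNS` are hypothetical names, the score and retention-time values are invented purely for illustration, and the result has not been checked against BlibBuild.

```python
import csv
import io

# Column names from the ssl header line, including the optional
# score-type, score, and retention-time columns.
SSL_COLUMNS = [
    "file", "scan", "charge", "sequence",
    "score-type", "score", "retention-time",
]


def write_ssl(rows, handle):
    """Write dictionaries keyed by SSL_COLUMNS as tab-delimited ssl text."""
    writer = csv.DictWriter(
        handle, fieldnames=SSL_COLUMNS, delimiter="\t", lineterminator="\n"
    )
    writer.writeheader()
    writer.writerows(rows)


# Illustrative rows: the scores and retention times (in minutes) are
# made up for this example.
example_rows = [
    {"file": "demo.ms2", "scan": 1806, "charge": 2,
     "sequence": "LAESITIEQGK", "score-type": "PERCOLATOR QVALUE",
     "score": 0.01, "retention-time": 12.5},
    {"file": "demo.ms2", "scan": 2572, "charge": 2,
     "sequence": "ELAEDGC[+57.0]SGVEVR", "score-type": "PERCOLATOR QVALUE",
     "score": 0.002, "retention-time": 18.3},
]

buffer = io.StringIO()
write_ssl(example_rows, buffer)
ssl_text = buffer.getvalue()
```

To write to disk instead of a string buffer, pass a file opened with `open("demo.ssl", "w", newline="")`; the file name must end in '.ssl'.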
Library files
BiblioSpec library files are in the sqlite3 format, usually with a ".blib" filename extension. Each library is a small database that you can search and manipulate with standard SQL commands using, for example, the sqlite3 command line tools or SQLite Expert Personal.
Details on the BiblioSpec SQLite schema can be found here.
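Since a library is an ordinary SQLite database, it can also be opened from Python's built-in `sqlite3` module. The sketch below only lists the tables in a file by querying SQLite's own `sqlite_master` catalog, so it assumes nothing about the BiblioSpec schema; `list_tables` is a hypothetical helper name and `demo.blib` in the usage note is a placeholder path.

```python
import sqlite3
from contextlib import closing


def list_tables(library_path):
    """Return the names of all tables in a SQLite file, sorted by name."""
    # Note: sqlite3.connect creates an empty database if the path does not
    # exist, so check the path first when that matters.
    with closing(sqlite3.connect(library_path)) as conn:
        rows = conn.execute(
            "SELECT name FROM sqlite_master "
            "WHERE type = 'table' ORDER BY name"
        ).fetchall()
    return [name for (name,) in rows]
```

Calling `list_tables("demo.blib")` on a real library should show which tables are available to query, such as the ScoreTypes table mentioned earlier.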
file	scan	charge	sequence
demo.ms2	8	3	VGAGAPVYLAAVLEYLAAEVLELAGNAAR
demo.ms2	1806	2	LAESITIEQGK
demo.ms2	2572	2	ELAEDGC[+57.0]SGVEVR
demo.ms2	3088	2	TTAGAVEATSEITEGK
demo.ms2	3266	2	DC[+57.0]EEVGADSNEGGEEEGEEC[+57.0]
demo.ms2	9734	3	IWELEFPEEAADFQQQPVNAQ[-17.0]PQN
demo.ms2	20919	3	VHINIVVIGHVDSGK
../elsewhere/spec.mzXML	00497	2	LKEPAQNTADNAK
../elsewhere/spec.mzXML	00680	2	ALEGPGPGEDAAHSENNPPR
../elsewhere/spec.mzXML	00965	2	FFSHEAEQK
../elsewhere/spec.mzXML	01114	2	C[+57.0]GPSQPLK
../elsewhere/spec.mzXML	01382	2	AVHVQVTDAEAGK
H CreationDate Mon Apr 12 15:12:14 2010
H Extractor BlibToMs2
H Library /home/me/research/search/demo.blib
S 1 1 636.34
Z 2 1253.36
D seq FKNGFQTGSASK
D modified seq FKNGFQTGSASK
187.40 12.5
193.10 19.5
194.30 13.7
198.30 29.8
199.10 12.2
208.30 23.1
208.90 11.4
210.30 11.8
213.00 3.3
214.50 4.3
216.10 32.8
219.10 11.2
221.00 14.3
222.10 64.0
225.10 16.6
226.00 31.6
228.30 7.2
229.10 8.5
230.50 58.2
231.20 236.1
232.20 75.8
233.60 2.4
234.20 51.4
235.10 5.6
236.30 30.2
239.70 14.4
241.30 34.8
242.30 14.2
244.30 9.0
S 2 2 745.3
Z 2 1471.7
D seq NFLETVELQVGLK
D modified seq NFLETVELQVGLK
1224.60 7.9
1228.70 468.9
1230.40 658.5
1231.50 144.2
1240.00 11.7
1242.70 45.9
1243.80 16.8
1253.80 17.2
1255.00 7.9
1255.80 14.4
1259.70 15.5
1273.10 5.9
1275.90 10.5
1277.10 7.8
1283.30 4.7
1296.50 19.2
1299.50 13.0
1307.40 6.1
1308.40 21.3
1313.00 1.7
1313.80 5.5
1315.40 3.6
1316.80 22.3
1323.90 1.5
1325.50 40.5
1326.30 75.9
S 3 3 732.1
Z 2 1444.7
D seq NEVSAMPTLLLFK
D modified seq NEVSAMPTLLLFK
209.00 62.5
210.30 12.8
216.00 87.0
220.10 58.0
224.90 4.9
226.10 418.2
227.00 68.3
227.90 46.7
229.20 13.3
231.10 12.7
238.10 209.1
239.20 15.0
244.10 953.8
245.20 90.0
245.90 20.4
252.30 8.8
255.30 38.8
260.20 9.4
262.10 35.0
270.00 10.9
275.80 21.8
277.40 6.3
279.10 12.7
280.20 49.8
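The four .ms2 line types (H, S, Z, D) and the peak list can be read with a small line-oriented parser. This is a rough sketch, not the reader BiblioSpec itself uses: `parse_ms2` and the dictionary keys are hypothetical, fields are assumed to be whitespace-separated, and the S line is assumed to start with a scan number and end with the precursor m/z, as in the example file above.

```python
def parse_ms2(lines):
    """Parse .ms2 text lines into (header_lines, spectra).

    Each spectrum is a dict with keys scan, precursor_mz, charges,
    details, and peaks.
    """
    header = []
    spectra = []
    current = None
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        fields = line.split()
        tag = fields[0]
        if tag == "H":
            # Header lines: keep everything after the leading 'H'.
            header.append(line[1:].strip())
            continue
        if tag == "S":
            # Assumption: first field after 'S' is the scan number and
            # the last field is the precursor m/z.
            current = {
                "scan": int(fields[1]),
                "precursor_mz": float(fields[-1]),
                "charges": [],
                "details": [],
                "peaks": [],
            }
            spectra.append(current)
            continue
        if current is None:
            raise ValueError(f"line found before first S line: {line!r}")
        if tag == "Z":
            # Charge state followed by the ion mass at that charge state.
            current["charges"].append((int(fields[1]), float(fields[2])))
        elif tag == "D":
            current["details"].append(" ".join(fields[1:]))
        else:
            # Peak line: m/z and intensity.
            current["peaks"].append((float(fields[0]), float(fields[1])))
    return header, spectra


sample = """H CreationDate Mon Apr 12 15:12:14 2010
S 1 1 636.34
Z 2 1253.36
D seq FKNGFQTGSASK
187.40 12.5
193.10 19.5
"""
header, spectra = parse_ms2(sample.splitlines())
```

To parse a file on disk, pass an open file object, for example `parse_ms2(open("demo.ms2"))`, where the file name is a placeholder.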
Note that these are tab-separated fields, and the otherkeys field is itself tab-separated.
file	scan	charge	adduct	inchikey	chemicalformula	moleculename	otherkeys
dexcaf_051017.mzML	01369	-1	[M-H]	ZXPLRDFHBYIQOX-BTBVOZEKSA-N	C24H44O21N0	Glc04Reduced
dexcaf_051017.mzML	01639	-1	[M-H]	NBVGBCYERZIRIP-JAMOUWTMSA-N	C30H54O26N0	Glc05Reduced
dexcaf_051017.mzML	01855	-1	[M-H]	PNHJKLJIDNHXFR-ZGJYWSOBSA-N	C36H64O31N0	Glc06Reduced
dexcaf_051017.mzML	02029	-1	[M-H]	NVKJDLBVRSXYRE-BMFDHOHESA-N	C42H74O36N0	Glc07Reduced
dexcaf_051017.mzML	02179	-1	[M-H]	YMRGEPQWJZHXFF-MGQBKJSVSA-N	C48H84O41N0	Glc08Reduced
dexcaf_051017.mzML	01079	-1	[M-H]	RYYVLZVUVIJVGH-UHFFFAOYSA-N	C8H10N4O2	Caffeine	"InChI:1S/C8H10N4O2/c1-10-4-9-6-5(10)7(13)12(3)8(14)11(6)2/h4H,1-3H3	HMDB:01847	CAS:58-08-2	SMILES:Cn1cnc2n(C)c(=O)n(C)c(=O)c12"