BiblioSpec input and output file formats

BiblioSpec makes use of several file formats for input and output. Below are descriptions of these along with links to additional information.

Database search result files

In most cases, libraries are built from database search result files. Supported formats are listed on the BlibBuild page.

BlibBuild .ssl file

For peptide or small molecule identifications that do not come from one of the supported database searches, BiblioSpec supports a generic tab-delimited text file format referred to as ssl (spectrum sequence list). Here is a small example file. An ssl file must end with the '.ssl' extension and have a header line containing the following column names (the score-type, score, and retention-time columns are optional):

file       scan    charge  sequence        score-type      score   retention-time

Additional columns for small molecule use may be included (the sequence column should be omitted for small molecule libraries - here is a small example file):

adduct     chemicalformula moleculename    inchikey        otherkeys

Each of the following lines contains information for one spectrum. The first column contains a full or relative path to a file containing the spectrum. The second column has an id for that spectrum, typically a scan number or index number. The third column is the charge state of the spectrum. The fourth column contains the peptide sequence, with the addition of any modifications given as a mass shift (the difference between the modified and unmodified residue) following the modified residues. For example,

TASEFDC[+57.0]SAIO[+16.0]AQDK

For peptides with N-terminal modifications, the mass shift should follow the first residue.
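The sequence notation described above can be generated programmatically. The following is a minimal Python sketch, not part of BiblioSpec: the function name format_modified_sequence and the 1-based position mapping are hypothetical, and writing shifts with one decimal place is an assumption taken from the example in the text.

```python
def format_modified_sequence(sequence, mass_shifts):
    """Return the sequence with each mass shift written after its residue.

    mass_shifts maps a 1-based residue position to a mass shift
    (modified mass minus unmodified mass). This helper is hypothetical
    and only illustrates the notation described in the text.
    """
    parts = []
    for position, residue in enumerate(sequence, start=1):
        parts.append(residue)
        if position in mass_shifts:
            # Signed, one decimal place (an assumption), e.g. [+57.0]
            parts.append(f"[{mass_shifts[position]:+.1f}]")
    return "".join(parts)


# Reproduces the example from the text: shifts on residues 7 and 11.
example = format_modified_sequence("TASEFDCSAIOAQDK", {7: 57.0, 11: 16.0})
print(example)  # TASEFDC[+57.0]SAIO[+16.0]AQDK
```

An N-terminal modification is passed as position 1, so its mass shift lands after the first residue as required.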

The score-type column can be any of the following:

UNKNOWN

PERCOLATOR QVALUE

PEPTIDE PROPHET SOMETHING

SPECTRUM MILL

IDPICKER FDR

MASCOT IONS SCORE

TANDEM EXPECTATION VALUE

PROTEIN PILOT CONFIDENCE

SCAFFOLD SOMETHING

WATERS MSE PEPTIDE SCORE

OMSSA EXPECTATION SCORE

PROTEIN PROSPECTOR EXPECTATION SCORE

SEQUEST XCORR

MAXQUANT SCORE

The score column is a floating point value representing the spectrum's score of that type. The retention-time column can be used to specify retention times in minutes; otherwise the values from the spectrum file will be used. Scores fall into three categories: the probability that the identification is correct, the probability that the identification is incorrect, or not a probability score. The category for each score type can be found in the ScoreTypes table.
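As an illustration of the layout described above, an ssl file can be written with any tool that produces tab-delimited text. The sketch below uses Python's csv module. The write_ssl helper, the spectrum file name run1.mzXML, and the scan, score, and retention-time values are all made up for the example; the use of plain newline line endings is also an assumption.

```python
import csv
import os
import tempfile

# Column names from the header line described in the text.
SSL_COLUMNS = ["file", "scan", "charge", "sequence",
               "score-type", "score", "retention-time"]


def write_ssl(path, rows):
    """Write rows (dicts keyed by SSL_COLUMNS) as a tab-delimited ssl file."""
    if not path.endswith(".ssl"):
        raise ValueError("an ssl file must end with the '.ssl' extension")
    with open(path, "w", newline="") as handle:
        writer = csv.DictWriter(handle, fieldnames=SSL_COLUMNS,
                                delimiter="\t", lineterminator="\n")
        writer.writeheader()
        writer.writerows(rows)


# Hypothetical identification; every value here is illustrative only.
rows = [{
    "file": "run1.mzXML",
    "scan": 1510,
    "charge": 2,
    "sequence": "TASEFDC[+57.0]SAIO[+16.0]AQDK",
    "score-type": "PERCOLATOR QVALUE",
    "score": 0.01,
    "retention-time": 25.3,
}]

ssl_path = os.path.join(tempfile.mkdtemp(), "example.ssl")
write_ssl(ssl_path, rows)
```

Because the score-type, score, and retention-time columns are optional, a reduced file could be produced by trimming SSL_COLUMNS and the row dictionaries to the first four columns.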

Library files

BiblioSpec library files are in the sqlite3 format, usually with a ".blib" filename extension. Each library is a small database that you can search and manipulate with standard SQL commands using, for example, the sqlite3 command line tools or SQLite Expert Personal.
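Because a library is an ordinary SQLite file, it can also be opened from a script. The sketch below uses Python's built-in sqlite3 module to list the tables in a file by querying SQLite's sqlite_master catalog, so it makes no assumptions about the library schema. The list_tables helper is hypothetical, and the demo file with its DemoTable is a throwaway stand-in created so the example runs without a real library; point list_tables at a real ".blib" path to inspect an actual library.

```python
import os
import sqlite3
import tempfile


def list_tables(library_path):
    """Return the names of all tables in an SQLite file, sorted by name."""
    connection = sqlite3.connect(library_path)
    try:
        cursor = connection.execute(
            "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
        )
        return [row[0] for row in cursor]
    finally:
        connection.close()


# Stand-in file with one hypothetical table; not a real BiblioSpec library.
demo_path = os.path.join(tempfile.mkdtemp(), "demo.blib")
connection = sqlite3.connect(demo_path)
connection.execute("CREATE TABLE DemoTable (id INTEGER PRIMARY KEY, label TEXT)")
connection.commit()
connection.close()

print(list_tables(demo_path))  # ['DemoTable']
```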

BiblioSpec does not require that you know any SQL, but should you be interested in using these files outside of the BiblioSpec context the example parameter file.