How Skyline Builds Spectral Libraries

2024-04-15

Skyline builds spectral libraries using a separate program called BiblioSpec, which has two main components: BlibBuild builds the redundant library, and BlibFilter then filters it to create the non-redundant library. The BlibBuild page lists the supported search engines, along with their file formats and the scores that are compared against the cut-off value specified in Skyline.

BlibFilter selects the spectrum with the best score in each group. If several spectra tie for the best score, it selects the one with the highest total ion current (TIC). In the past, BlibFilter chose the spectrum with the highest average dot product when compared to all other spectra within the same group, but this method occasionally produced poor results. A similar method, computing a consensus spectrum and its dot product against the related spectra, also produced inferior results because it sometimes selected high-noise spectra.
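The selection rule described above can be sketched in a few lines of Python. This is only an illustration, not BlibFilter's actual code: the `Spectrum` class, its field names, and the `pick_best` function are hypothetical, and grouping by peptide and charge is an assumption. Because some scores are better when higher (probabilities) and others when lower (q-values, expectation values), the direction is passed in as a parameter.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Spectrum:
    peptide: str   # modified sequence (hypothetical field name)
    charge: int
    score: float   # search engine score
    tic: float     # total ion current


def pick_best(spectra, higher_is_better=True):
    """Return one spectrum per (peptide, charge) group.

    The best score wins; ties on score are broken by the highest TIC.
    """
    groups = defaultdict(list)
    for s in spectra:
        groups[(s.peptide, s.charge)].append(s)

    # Flip the sign so that max() also works for lower-is-better scores.
    sign = 1.0 if higher_is_better else -1.0
    return {
        key: max(members, key=lambda s: (sign * s.score, s.tic))
        for key, members in groups.items()
    }
```

For a q-value based library, the call would be `pick_best(spectra, higher_is_better=False)`.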

Skyline with BiblioSpec supports building libraries from the outputs of the following peptide-spectrum matching pipelines:

| Database search | Peptide ID file extension | Spectrum file extension | Score used | Notes |
|---|---|---|---|---|
| Generic SSL | .ssl | | score column | A generic format for encoding spectrum library entries. |
| ByOnic | .mzid | .MGF, .mzXML, .mzML | AbsLogProb | |
| Comet/SEQUEST/Percolator | .perc.xml, .sqt | .cms2, .ms2, .mzXML | q-value | Percolator v1.17 does not include sequence modification information, so the .sqt file from the SEQUEST search must be present in the same directory, the directory containing the cms2/ms2 spectrum files, or the current working directory. |
| DIA-NN | .speclib | | none | No separate spectrum file. In the current implementation, no score is imported from the library, so all spectra are imported. |
| IDPicker | .idpXML | .mzXML, .mzML | FDR | The name(s) of the spectrum file(s) are given in the .idpXML file. |
| MS Amanda | .pep.xml, .pepXML | .mzML, .mzXML, .MGF, RAW* | q-value | |
| MSFragger | .pep.xml, .pepXML | .mzML, .mzXML, .MGF, RAW* | q-value | |
| MSGF+ | .mzid, .pepXML | .mzML, .mzXML, .MGF, RAW* | expectation value | |
| Mascot | .dat | | expectation value | No separate spectrum file. |
| MaxQuant Andromeda | msms.txt + evidence.txt + mqpar.xml + modifications.xml | .mzML, .mzXML, .MGF, RAW* | PEP | It is possible to use peaks embedded in the msms.txt, but external spectra files are preferred because the embedded peaks are charge deconvoluted. mqpar.xml must be located in the grandparent, parent, or same directory. A custom modifications.xml, modifications.local.xml, or modification.xml can be placed in the same directory as the search results (or specified using the -x option). |
| Morpheus | .pep.xml, .pepXML | .mzXML, .mzML | q-value | The names of the .mzXML files are given in the .pep.xml file and may be in the parent or grandparent directory. Spectra are looked up by index, which is calculated using (scan number - 1). |
| OMSSA | .pep.xml, .pepXML | .mzXML, .mzML | expectation value | The names of the .mzXML files are given in the .pep.xml file and may be in the parent or grandparent directory. |
| OpenSWATH | .tsv | | m_score column | No separate spectrum file. |
| PEAKS DB | .pep.xml, .pepXML | .mzXML, .mzML | confidence score | The names of the .mzXML files are given in the .pep.xml file and may be in the parent or grandparent directory. |
| PLGS MSe | final_fragment.csv | | score column | There need not be a . before 'final_fragment'. |
| PRIDE | .pride.xml | | various | No separate spectrum file. |
| PeptideProphet/iProphet | .pep.xml, .pepXML | .mzML, .mzXML, .MGF, RAW* | probability score | The names of the .mzXML files are given in the .pep.xml file and may be in the parent or grandparent directory. |
| PeptideShaker | .mzid | .MGF | confidence score | |
| Protein Pilot | .group.xml | | confidence score | No separate spectrum file. |
| Protein Prospector | .pep.xml, .pepXML | .mzML, .mzXML, .MGF, RAW* | expectation value | |
| Proteome Discoverer | .msf, .pdResult | | q-value | No separate spectrum file. Libraries cannot be built from databases that do not contain q-values, unless a cutoff score of 0 is explicitly specified. |
| Proxl XML | .proxl.xml | .mzML, .mzXML, .MGF, RAW* | q-value | |
| Scaffold | .mzid | .MGF, .mzXML, .mzML | peptide probability | |
| Spectronaut | .csv | | none | Spectronaut Assay Library export. No separate spectrum file. |
| Spectrum Mill | .pep.xml, .pepXML | .mzXML, .mzML | expectation value | The names of the .mzXML files are given in the .pep.xml file and may be in the parent or grandparent directory. |
| X! Tandem | .xtan.xml | | expectation value | No separate spectrum file. |

*RAW includes vendor formats like RAW, WIFF, .D, etc.
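Several notes above say that the spectrum files named in a .pep.xml file may sit in the parent or grandparent directory. A quick way to check where a given spectrum file would be found before building a library is sketched below. The function name `find_spectrum_file`, the search order (same directory first, then parent, then grandparent), and the default extension list are assumptions made for illustration; BlibBuild's real lookup may differ.

```python
from pathlib import Path
from typing import Optional, Sequence


def find_spectrum_file(
    id_file: Path,
    base_name: str,
    extensions: Sequence[str] = (".mzXML", ".mzML"),
) -> Optional[Path]:
    """Look for base_name + extension near a peptide ID file.

    Assumed search order: the directory holding the ID file,
    then its parent, then its grandparent.
    """
    start = id_file.resolve().parent
    search_dirs = [start, start.parent, start.parent.parent]
    for directory in search_dirs:
        for ext in extensions:
            candidate = directory / (base_name + ext)
            if candidate.is_file():
                return candidate
    return None  # nothing found in any of the three directories
```

A return value of `None` for any spectrum file referenced in the search results suggests the file needs to be moved or copied before building.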

 

Importing Existing Spectral Libraries

Skyline can also directly read existing spectral libraries (without using BlibBuild) including:

  • SpectraST (.sptxt)
  • theGPM X! Hunter (.hlf)
  • Shimadzu (.mlb)
  • Golm Metabolome Database (.msp)
  • NIST (.msp)
  • EncyclopeDIA (.elib)

Working with NIST files

If your library contains spectra for multiple instruments and conditions (e.g. various collision energy (CE) values), it is important to use the NIST-supplied filtering tools to produce a subset of spectra appropriate to your experimental conditions. Each molecule+adduct (or peptide+charge) pair can appear in a .blib file only once, and without thoughtful filtering you will almost certainly produce a .msp file that Skyline cannot use because it contains more than one instance of a molecule+adduct (or peptide+charge) pair.
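Before importing a filtered .msp file, it can help to confirm that no pair occurs more than once. The sketch below counts entries per key. It rests on assumptions about the file layout: that each entry starts with a `Name:` line, that peptide entries carry the charge in the name (for example `AAAK/2`), and that small-molecule entries give the adduct in a `Precursor_type:` field. The function name `duplicate_entries` is hypothetical, and the way Skyline itself keys entries may differ.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple


def duplicate_entries(lines: Iterable[str]) -> Dict[Tuple[str, str], int]:
    """Return {(name, adduct): count} for keys that occur more than once."""
    counts: Counter = Counter()
    name = None
    adduct = ""
    for raw in lines:
        field, sep, value = raw.strip().partition(":")
        if not sep:
            continue  # peak lines and blank lines have no "field: value" form
        field = field.strip().lower()
        value = value.strip()
        if field == "name":
            if name is not None:
                counts[(name, adduct)] += 1  # close the previous entry
            name, adduct = value, ""
        elif field == "precursor_type":
            adduct = value
    if name is not None:
        counts[(name, adduct)] += 1  # close the final entry
    return {key: n for key, n in counts.items() if n > 1}
```

Calling it as `duplicate_entries(open("filtered.msp", encoding="utf-8"))` (the file name is a placeholder) should return an empty dictionary for a file that has been filtered down to one spectrum per pair.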