How Skyline Builds Spectral Libraries

Skyline builds spectral libraries using a separate program called BiblioSpec, which has two main components. BlibBuild builds the redundant library, which BlibFilter then filters to create the non-redundant library. The BlibBuild page lists the supported search engines, their file formats, and the score that each one compares against the cut-off value specified in Skyline.
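The two-stage flow can be sketched as a pair of command lines. This is only an illustration: the executable names come from the text above, but the positional argument order (input files followed by the output library) and all file names are assumptions, so check the BiblioSpec documentation before running anything like it.

```python
def build_commands(search_results, redundant_lib, filtered_lib):
    """Return the two command lines for the build-then-filter flow.

    Assumption: each tool takes its inputs first and its output
    library last. No options or flags are shown.
    """
    # Stage 1: BlibBuild turns search results into a redundant library.
    build_cmd = ["BlibBuild", *search_results, redundant_lib]
    # Stage 2: BlibFilter reduces it to a non-redundant library.
    filter_cmd = ["BlibFilter", redundant_lib, filtered_lib]
    return build_cmd, filter_cmd


# Hypothetical file names, for illustration only.
build_cmd, filter_cmd = build_commands(
    ["search.pep.xml"], "redundant.blib", "filtered.blib"
)
```

The resulting lists could be passed to `subprocess.run` one after the other; the second command consumes the file that the first one produces.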

BlibFilter chooses the best spectrum within a group by simply using the one with the best score. If multiple spectra are tied for the best score, the one with the highest total ion current (TIC) is selected. In the past, BlibFilter chose the spectrum with the highest average dot product when compared to all other spectra within the same group, but this method occasionally produced poor results. A similar method, computing a consensus spectrum and its dot product against the related spectra, also produced inferior results because it sometimes selected high-noise spectra.
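The current selection rule can be sketched in a few lines of Python. This is not BlibFilter's code: the `Spectrum` type, the grouping by peptide and charge, and the `higher_is_better` switch are assumptions made for illustration (some scores, such as q-values, are better when lower, while probabilities are better when higher).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Spectrum:
    peptide: str   # hypothetical grouping field
    charge: int    # hypothetical grouping field
    score: float   # search-engine score
    tic: float     # total ion current


def pick_best(group, higher_is_better=True):
    """Best score wins; ties on score go to the highest TIC."""
    def rank(s):
        score = s.score if higher_is_better else -s.score
        return (score, s.tic)
    return max(group, key=rank)


def filter_redundant(spectra, higher_is_better=True):
    """Keep one representative spectrum per (peptide, charge) group."""
    groups = {}
    for s in spectra:
        groups.setdefault((s.peptide, s.charge), []).append(s)
    return {key: pick_best(g, higher_is_better) for key, g in groups.items()}
```

Ranking on the tuple `(score, tic)` means TIC is only consulted when scores are exactly equal, which matches the tie-break described above.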

Skyline with BiblioSpec supports building libraries from the following peptide spectrum matching pipeline outputs. In the spectrum file column, RAW* includes vendor formats like RAW, WIFF, .D, etc.

Database search	Peptide ID file extension	Spectrum file extension	Score Used	Notes
Generic SSL	.ssl		score column	A generic format for encoding spectrum library entries.
ByOnic	.mzid	.MGF, .mzXML, .mzML	AbsLogProb
Comet/SEQUEST/Percolator	.perc.xml, .sqt	.cms2, .ms2, .mzXML	q-value	Percolator v1.17 does not include sequence modification information, so the .sqt file from the SEQUEST search must be present in the same directory, in the directory containing the .cms2/.ms2 spectrum files, or in the current working directory.
DIA-NN	.speclib and .tsv or .parquet		Global.Q.Value	No separate spectrum file, but results for individual runs are read from a TSV or Parquet file in the same directory as the speclib.
IDPicker	.idpXML	.mzXML, .mzML	FDR	The name(s) of the spectrum file(s) are given in the .idpXML file.
MS Amanda	.pep.xml, .pepXML	.mzML, .mzXML, .MGF, RAW*	q-value
MSFragger	.pep.xml, .pepXML	.mzML, .mzXML, .MGF, RAW*	q-value
MSGF+	.mzid, .pepXML	.mzML, .mzXML, .MGF, RAW*	expectation value
Mascot	.dat		expectation value	No separate spectrum file.
MaxQuant Andromeda	msms.txt + evidence.txt + mqpar.xml + modifications.xml	.mzML, .mzXML, .MGF, RAW*	PEP	It is possible to use peaks embedded in the msms.txt, but external spectra files are preferred because the embedded peaks are charge deconvoluted. `mqpar.xml` must be located in the grandparent, parent, or same directory. A custom `modifications.xml`, `modifications.local.xml`, or `modification.xml` can be placed in the same directory as the search results (or specified using the `-x` option).
Morpheus	.pep.xml, .pepXML	.mzXML, .mzML	q-value	The names of the .mzXML files are given in the .pep.xml file and may be in the parent or grandparent directory. Spectra are looked up by index, which is calculated using (scan number - 1).
OMSSA	.pep.xml, .pepXML	.mzXML, .mzML	expectation value	The names of the .mzXML files are given in the .pep.xml file and may be in the parent or grandparent directory.
OpenSWATH	.tsv		m_score column	No separate spectrum file.
PEAKS DB	.pep.xml, .pepXML	.mzXML, .mzML	confidence score	The names of the .mzXML files are given in the .pep.xml file and may be in the parent or grandparent directory.
PLGS MS^e	final_fragment.csv		score column	The file name need not have a '.' before 'final_fragment'.
PRIDE	.pride.xml		various	No separate spectrum file.
PeptideProphet/iProphet	.pep.xml, .pepXML	.mzML, .mzXML, .MGF, RAW*	probability score	The names of the .mzXML files are given in the .pep.xml file and may be in the parent or grandparent directory.
PeptideShaker	.mzid	.MGF	confidence score
Protein Pilot	.group.xml		confidence score	No separate spectrum file.
Protein Prospector	.pep.xml, .pepXML	.mzML, .mzXML, .MGF, RAW*	expectation value
Proteome Discoverer	.msf, .pdResult		q-value	No separate spectrum file. Libraries cannot be built from databases that do not contain q-values, unless a cutoff score of 0 is explicitly specified.
Proxl XML	.proxl.xml	.mzML, .mzXML, .MGF, RAW*	q-value
Scaffold	.mzid	.MGF, .mzXML, .mzML	peptide probability
Spectronaut	.csv		none	Spectronaut Assay Library export. No separate spectrum file.
Spectrum Mill	.pep.xml, .pepXML	.mzXML, .mzML	expectation value	The names of the .mzXML files are given in the .pep.xml file and may be in the parent or grandparent directory.
X! Tandem	.xtan.xml		expectation value	No separate spectrum file.

Importing Existing Spectral Libraries

Skyline can also directly read existing spectral libraries (without using BlibBuild) including:

SpectraST (.sptxt)
theGPM X! Hunter (.hlf)
Shimadzu (.mlb)
Golm Metabolome Database (.msp)
NIST (.msp)
EncyclopeDIA (.elib)

Working with NIST files

If your library contains spectra for multiple instruments and conditions (e.g., various CE values), it is important to use the NIST-supplied filtering tools to produce a subset of spectra appropriate to your experimental conditions. Each molecule+adduct (or peptide+charge) pair can appear in a .blib file only once. Without thoughtful filtering, you will almost certainly produce a .msp file that Skyline cannot use because it contains more than one instance of a molecule+adduct (or peptide+charge) pair.
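A quick way to confirm that a filtered subset satisfies the once-per-pair constraint is to count the pairs before importing. The sketch below assumes the molecule and adduct (or peptide and charge) of each entry have already been extracted into tuples; parsing the .msp file itself is not shown, and the function name and example entries are hypothetical.

```python
from collections import Counter


def find_duplicates(pairs):
    """Return every (molecule, adduct) pair that occurs more than once."""
    counts = Counter(pairs)
    return {pair: n for pair, n in counts.items() if n > 1}


# Hypothetical entries, e.g. the same pair recorded at two CE values.
entries = [
    ("Caffeine", "[M+H]+"),
    ("Caffeine", "[M+H]+"),
    ("Caffeine", "[M+Na]+"),
]
duplicates = find_duplicates(entries)
```

An empty result means every pair is unique; any pair that is reported still needs further filtering.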

MacCoss Lab Software
