Issue 956: Support for importing results from the Sage search engine

Status:open
Assigned To:Brian Pratt
Type:Defect
Area:Skyline
Priority:3
Milestone:4.3
Opened:2023-05-17 15:13 by Brian Pratt
Changed:2023-05-17 15:13 by Brian Pratt
Resolved:
Resolution:
Closed:
2023-05-17 15:13 Brian Pratt
From https://skyline.ms/announcements/home/support/thread.view?rowId=61119


Support for importing results from the Sage search engine    mlazear    2023-05-16
 
Is it possible to add support for importing search results from Sage into Skyline? Sage is an open-source, free, and faster implementation of the fragment indexing approach popularized by MSFragger.

Sage writes a simple TSV-based format. A couple of notes (happy to provide more info if you decide to add support):

The filename column may contain ".gz" or ".gzip" at the end: Sage can directly search mzML files packaged in gzip, as exported by MSConvert and similar tools (for example "b1906_293T_proteinID_01A_QE3_122212.mzML" or "b1906_293T_proteinID_01A_QE3_122212.mzML.gz"). Sage only supports the mzML format at this time.
Search results may be concatenated from multiple files.
Peptide sequences follow the ProForma notation: N-terminal modifications are written as "[+42.0]-PEPT[+57.0]IDE" and C-terminal modifications as "PEPTIDE-[+42.0]".
Unless supplied with a FASTA file that already contains decoy entries, Sage generates decoys by reversing the interior amino acids (SAMPLER becomes SELPMAR).
Currently, Sage outputs unfiltered results, so decoys will be present (label = -1). It uses a built-in linear discriminant analysis PSM rescorer: spectrum_q, peptide_q, and protein_q are the Q-values at each aggregation level, and "posterior_error" is log10(PEP) at the spectrum level.
I have uploaded an example results file to https://skyline.ms/_webdav/home/support/file sharing/%40files/results.sage.tsv
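
To make the notes on peptide notation, decoys, and file names concrete, here is a minimal Python sketch. It is not Sage's or Skyline's code. The function names (parse_peptide, reverse_interior, strip_gzip_suffix) are hypothetical, and the parser assumes only the simple bracketed mass-shift forms quoted above, not the full ProForma grammar.

```python
import re

# One bracketed mass shift such as "[+57.0]" or "[-17.0265]".
MOD = re.compile(r"\[([+-]?\d+(?:\.\d+)?)\]")


def parse_peptide(seq):
    """Split a modified peptide string into (bare sequence, mods).

    mods maps "n" (N-terminus), "c" (C-terminus), or a 0-based residue
    index to a mass shift. Assumption: only the forms shown in the notes.
    """
    mods = {}
    # N-terminal modification, e.g. "[+42.0]-PEPTIDE".
    m = re.match(r"\[([+-]?\d+(?:\.\d+)?)\]-", seq)
    if m:
        mods["n"] = float(m.group(1))
        seq = seq[m.end():]
    # C-terminal modification, e.g. "PEPTIDE-[+42.0]".
    m = re.search(r"-\[([+-]?\d+(?:\.\d+)?)\]$", seq)
    if m:
        mods["c"] = float(m.group(1))
        seq = seq[:m.start()]
    bare = []
    pos = 0
    while pos < len(seq):
        m = MOD.match(seq, pos)
        if m:
            # A bracket modifies the residue immediately before it.
            mods[len(bare) - 1] = float(m.group(1))
            pos = m.end()
        else:
            bare.append(seq[pos])
            pos += 1
    return "".join(bare), mods


def reverse_interior(seq):
    """Reverse everything except the first and last residue."""
    if len(seq) < 3:
        return seq
    return seq[0] + seq[1:-1][::-1] + seq[-1]


def strip_gzip_suffix(name):
    """Drop a trailing ".gz" or ".gzip" so the name matches the mzML file."""
    for suffix in (".gzip", ".gz"):
        if name.lower().endswith(suffix):
            return name[: -len(suffix)]
    return name
```

For example, parse_peptide("[+42.0]-PEPT[+57.0]IDE") separates the bare sequence from an N-terminal shift and a shift on the fourth residue, and reverse_interior("SAMPLER") reproduces the decoy given in the notes.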

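As a rough illustration of how an importer might consume such a file, the sketch below reads a tab-separated table, drops decoys (label = -1), and keeps rows under a spectrum-level Q-value cutoff. The column names filename, label, spectrum_q, and posterior_error come from the notes above. The 0.01 cutoff and the function names are assumptions made for illustration, not Sage or Skyline defaults.

```python
import csv
from collections import defaultdict


def confident_target_psms(handle, max_q=0.01):
    """Return rows that are not decoys and pass the spectrum-level cutoff.

    handle is any open text file or file-like object holding the TSV.
    max_q is an illustrative threshold, not a value taken from Sage.
    """
    reader = csv.DictReader(handle, delimiter="\t")
    kept = []
    for row in reader:
        if int(float(row["label"])) == -1:
            continue  # decoy entry
        if float(row["spectrum_q"]) > max_q:
            continue  # fails the Q-value cutoff
        kept.append(row)
    return kept


def psms_by_file(rows):
    """Group rows by the filename column, since results from several
    searched files may be concatenated into one table."""
    grouped = defaultdict(list)
    for row in rows:
        grouped[row["filename"]].append(row)
    return dict(grouped)


def pep_from_posterior_error(value):
    """Convert the log10(PEP) stored in posterior_error back to a PEP."""
    return 10.0 ** float(value)
```

Taking a file-like object rather than a path keeps the sketch easy to try on an in-memory string before pointing it at a real results file.
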
The raw data corresponds to b1906_293T_proteinID_01A_QE3_122212.mzXML from the paper "An Ultra-tolerant Database Search Identifies more than 100,000 Modified Peptides" (PXD001468). I can upload the mzML to S3 if needed.

Please let me know if there is more information, files, etc. that I can provide!
Best,
Mike