Mike Riffle has many years of experience in bioinformatics, data science, and statistics, and has worked with several leading academic proteomics laboratories in the US. He is a proponent of open and reproducible science and has produced several open source tools for proteomics data visualization, including Proxl (protein cross-linking) and, more recently, Limelight (generalized proteomics data visualization and sharing). He holds a bachelor's degree in Molecular and Cellular Biology from the University of Washington and a master's degree in Computer Science from the University of Illinois Urbana-Champaign.
Typically, data-independent acquisition (DIA) proteomics searches require a priori knowledge of the expected peptide sequences to identify peptides in a sample. This is a practical limitation in fields such as environmental metaproteomics and microbiome analysis, where the expected proteomes are not known.
Researchers may attempt to overcome this limitation by searching large, non-specific databases that may or may not include relevant sequences (e.g., UniProt), or by bearing the expense of generating a metagenome from which to build a metaproteomic database. Cascadia, a recently published deep learning model built on a transformer architecture, performs a de novo DIA search that requires no a priori knowledge of expected peptides. Here we present a computational workflow written in Nextflow that fully automates all steps of the Cascadia de novo workflow: raw file conversion, analysis by Cascadia, generation of a Skyline document, and optional upload to PanoramaWeb. The workflow greatly simplifies installing and running Cascadia, requiring only the installation of Nextflow, Docker, and Skyline (for visualization).