Questions and Answers: /home/software/Skyline/events/2017 Webinars/Webinar 14

Q: Is it possible to export the raw data as PDF format?

Ans: Skyline does not currently have an option to export to PDF. Skyline does have a lot of visuals, most of which can be individually exported (or copy-pasted) to raster images, scalable metafiles or even a data table, which you can use to generate your own graphs in R, Excel or your favorite graphing program. It also has the option to upload your data files to PanoramaWeb.org, which makes it possible to get PNG files for many of the same images using HTTP requests to the website, which others have used for more programmatic access to these images. If you are interested in seeing a PDF report in the future, it would be useful to know what you would want to see in this PDF, since it is not at all clear from the description "raw data", which we tend to associate with the instrument vendor raw data files.

Q: I am having problems with shoulders of peaks getting quantified. Can I get Skyline to do centroiding and deisotoping or do I need to rely on other software packages?

Ans: At present, you can specify that you want to import centroided data into Skyline, and Skyline will use a vendor-supplied centroiding algorithm where one is available to us. This was discussed in the webinar. It does not, however, perform deisotoping, as you can see by clicking on the chromatograms in Skyline, which will bring up the Full-Scan graph and show the m/z region from which the fragment ions were extracted. You will see the trailing isotope peaks. SCIEX provides an mzML converter which I think does deisotoping, but its two modes are described as: 1) quantitative - no deisotoping and 2) search optimized - with deisotoping. So, I would recommend confirming that deisotoping does not negatively impact your quantification. It may well improve it, as we found centroiding did, and we would love to hear about that.

Summary: Skyline has very nice support for importing raw data files from Thermo, SCIEX, Bruker and Agilent with their centroided spectra. For Waters we do not have a vendor supplied centroiding algorithm. The SCIEX software library we have provides centroided spectra about 9x slower than profile spectra. So, I recommend considering conversion to centroided mzML or mz5 before import, if you expect to do more than one import, using ProteoWizard (a well tested workflow). If you want deisotoping, you will have to rely on tools outside Skyline to produce the mzML you desire.

Q: We routinely identify >1mio PSMs with our DDA approach. What can I do to minimize memory consumption when matching this into DIA runs?

Ans: I think I misinterpreted this during the webinar. I thought this was asking about identifying 1 x 10^6 peptide precursor targets, which would produce at least 6 x 10^6 targeted transitions, about 8 times as large as anything I have ever seen, even counting the global human library from the Aebersold lab. Now, I see that these are PSMs, which likely means a considerably smaller non-redundant library. The most memory efficient processing can be achieved using SkylineRunner.exe (provided on the Skyline installation pages as an attachment below the installation button). Because SkylineRunner.exe does not need to maintain a UI or multilevel Undo, it is a great option for bulk processing, consuming noticeably less memory, and completing the processing more quickly because of the steps it can streamline without needing to update the UI.
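
As a rough sketch of what such bulk processing could look like, the Python snippet below builds one SkylineRunner.exe command line per results file. The file names are hypothetical placeholders, and the --in, --import-file and --save arguments should be checked against the Skyline command-line documentation for your installed version before relying on them.

```python
import subprocess

def build_import_commands(runner, document, result_files):
    """Return one SkylineRunner argument list per results file."""
    commands = []
    for path in result_files:
        # Open the document, import a single results file, then save.
        commands.append([runner, f"--in={document}", f"--import-file={path}", "--save"])
    return commands

# Hypothetical file names, for illustration only.
commands = build_import_commands("SkylineRunner.exe", "dia_targets.sky",
                                 ["run1.mz5", "run2.mz5"])
for cmd in commands:
    print(" ".join(cmd))
    # On a machine with Skyline installed, uncomment to run:
    # subprocess.run(cmd, check=True)
```

Running the imports one file at a time keeps each invocation simple to log and to retry if a single file fails.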

Q: I used DDA for peptide ID and to generate a spectral library to import into Skyline. I then imported the DIA data and asked Skyline to pick at least 5 product ions, but only 3 product ions are selected for all peptides, even the most intense ones. What setting could I have wrong?

Ans: This sort of question will be easiest to answer through a request to the Skyline support board (http://skyline.ms/support.url) with a post that contains screenshots of what you are doing and seeing.

Q: Does Skyline support any way to calculate protein FDR in addition to the normal PSM FDR for DIA data? This was shown at last year's ASMS to be a huge issue for large DIA datasets.

Ans: It does not yet, but members of the MacCoss lab were co-authors on a manuscript from the Aebersold lab in review covering this subject. And we agree on its usefulness for exploratory DIA. We expect to implement something in 2017.

Q: What is the 'co-elution count' in the Prophet score? 'Co-elution' with what?

Ans: Co-elution of the targeted fragment ions. It is more or less a count of the targeted fragments that appear to be co-eluting. Chromatogram peaks with an apex poorly aligned with the dominant peak or less than 1% area of the dominant peak get dropped from this count. This, combined with a log-intensity score, was the original score in Skyline for SRM data processing. Now even the default score integrates components such as spectrum dot-product and delta from expected retention time, but it is gratifying to see that the original Skyline score is usually given considerable weight in the mProphet models.
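
A minimal Python sketch of that counting rule follows. It is not Skyline's implementation: the 1% area threshold comes from the description above, while the apex tolerance (0.1 minutes here), the input format and the example peaks are assumptions made for illustration.

```python
def coelution_count(peaks, min_area_fraction=0.01, apex_tolerance_min=0.1):
    """Count fragment peaks that appear to co-elute with the dominant peak.

    peaks: list of (apex_time_minutes, area) tuples, one per targeted fragment.
    apex_tolerance_min is an assumed value, not a Skyline setting.
    """
    if not peaks:
        return 0
    # The dominant peak is the one with the largest area.
    dominant_time, dominant_area = max(peaks, key=lambda p: p[1])
    count = 0
    for apex_time, area in peaks:
        aligned = abs(apex_time - dominant_time) <= apex_tolerance_min
        large_enough = area >= min_area_fraction * dominant_area
        if aligned and large_enough:
            count += 1
    return count

# Hypothetical peaks: the third is poorly aligned, the fourth is under 1% area.
example = [(25.30, 1.0e6), (25.32, 4.0e5), (25.90, 3.0e5), (25.31, 5.0e3)]
print(coelution_count(example))  # 2
```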

Q: I've seen Mascot, TPP, sequest, but is it possible to build the library directly from a ProteinPilot result (.group)?

Ans: Yes. Skyline has support for over 20 peptide search workflows, and ProteinPilot is one of them. You can see the full list at https://skyline.ms/wiki/home/software/Skyline/page.view?name=building_spectral_libraries. Most now supply Skyline with useful retention time information, but you should check your libraries after they are built, as described here for Mascot, which can have difficulties with certain MGF generation tools: https://skyline.ms/wiki/home/software/Skyline/page.view?name=mascot_missing_rt. We have been seeing very good results with MaxQuant on data sets we are exploring with Bruker.

Q: How much would data/DIA analysis profit from raw file formats providing random access to the data (e.g. the novel one from Bruker)?

Ans: I am not sure. Skyline is now very well optimized for processing these large scale DIA runs sequentially with 1 or 2 passes through each file. In our testing, this has made processing data on SSD only 10% to 20% faster than on much cheaper and larger spinning HD. Our original solution for these experiments with >100,000 transitions turned out to be too dependent on random access to the temporary data store we built during chromatogram extraction. I would expect a solution dependent on random access to a data file to degrade similarly in the face of large extraction jobs. But, it could provide huge benefit on smaller jobs, with fewer targets or when updating the Skyline document as new targets are added. We have been working with SCIEX on calling their interface directly for chromatograms, and for small documents it appears much faster than direct extraction from spectra. However, at present, in using it we lose the information we collect ourselves when extracting from spectra, currently mass error and an ID for the spectrum from which each point was extracted, in support of the Full-Scan graph and the ability to just click on the chromatogram to see the spectrum from which a point was extracted.

Q: Would you recommend recording DIA directly in centroid mode instead of profile mode on Thermo instruments?

Ans: This is the direction mass spectrometrists at the MacCoss lab are going. Some are now only saving centroid mode data, because our testing shows that we get better extracted chromatograms and better quantitative results from these spectra, and centroid-only files are so much smaller.

Q: When you added Decoys, you ended up doubling the number of peptides - when you export this report, decoys will be present - are we to exclude these peptides because they are simply used for the modeling part?

Ans: The custom reports in Skyline are extremely flexible and they include the option to set filters. In the report I used for the webinar, I had a filter that excluded both the decoys and the iRT standards, so that only true targets would get counted. See the video response to this question for a visual explanation of how this was achieved.

Q: Can you read spectral libraries made in Spectronaut?

Ans: Yes. Skyline reads what are called "Assay Libraries" (a tabular format: an extended transition list with relative ion abundance and normalized retention time (iRT) information added) from any of the producers we know of (Spectronaut, Aebersold lab TPP & OpenSWATH pipeline, PeakView). This is explained in the following Tip: https://skyline.ms/_webdav/home/software/Skyline/%40files/tutorials/ImportingAssayLibraries-2_6.pdf. If you encounter any issues or a source of assay libraries we do not support, please let us know on the Skyline support board.

Q: What is the best strategy to generate the spectral library? Can we start from MaxQuant search results?

Ans: Yes. We have been having very good success with MaxQuant and data sets we are exploring with Bruker. We hope that the best way will be through the Skyline workflow demonstrated in this webinar, but this is an active area of research for us, and it seemed during the webinar that the Aebersold lab TPP & OpenSWATH pipeline for generating "assay libraries" achieves slightly better results. I am digging into this now, and believe that I may be able to make a Skyline-daily release in the near future that does as well as or better than the pipeline that produced the assay library for the Navarro paper. Then there should be no need to venture elsewhere for any of the 20 search pipelines that Skyline supports.

Q: We get a spectral library upload filter score (0.95) - is this similar to q-score setting from Percolator node (in PD when using Sequest)?

Ans: Both are probability-based scores. For each search pipeline Skyline supports, we work with its developers to determine the best probability score to use for a cut-off. Many of those scores, such as q values and adjusted p values, have zero as the best possible score and 1 as the worst. However, Skyline started with supporting pepXML scored with PeptideProphet, which uses a posterior probability where 1 is best and zero is worst. It is the probability that the match is true. So a match with a score of 0.9 has a 1 in 10 chance of being wrong, and a match with a score of 0.95 has a 1 in 20 chance. This is different from a q value, which estimates the false discovery rate when all matches as good as or better than the current match are accepted. Also, in Skyline the cut-off is always 1 best and zero worst, so for a q value it applies to 1 - q value. So, 0.99 will provide a library with an estimated 1% FDR. Since the Prophet probabilities do not provide an inherent FDR estimate, we need to use other techniques to estimate FDR in choosing the cut-off. During the webinar I used simple decoy counting to estimate both protein and peptide level FDR. Reviewing the Navarro, Nature Biotech. 2016 paper, I found that they used MAYU to estimate protein-level FDR separately in the 3 organisms (E. coli, Yeast, Human) and applied the following cut-offs respectively to build the assay library we used: 0.319349, 0.92054 and 0.995832. In the webinar I used the Skyline default (0.95), which usually does pretty well at achieving 1-2% protein level FDR.
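
To make the score direction concrete, here is a small Python sketch of the arithmetic described above. The function names are made up for illustration and are not part of Skyline.

```python
def cutoff_from_q_value(max_q_value):
    """Convert a maximum acceptable q value to a 1-is-best cut-off (1 - q)."""
    if not 0.0 <= max_q_value <= 1.0:
        raise ValueError("q value must be between 0 and 1")
    return 1.0 - max_q_value

def chance_wrong(probability_true):
    """Chance that a single match is wrong, given its probability of being true."""
    if not 0.0 <= probability_true <= 1.0:
        raise ValueError("probability must be between 0 and 1")
    return 1.0 - probability_true

print(cutoff_from_q_value(0.01))     # cut-off for an estimated 1% FDR
print(round(chance_wrong(0.9), 6))   # a score of 0.9 means a 1 in 10 chance of error
```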

Q: Why aren't you promoting de-isotoping? Wouldn't you get a huge performance increase in avoiding selecting wrong isotopes?

Ans: Mostly because I have no tested evidence to show that it improves both the ability to detect peptides and to quantify them in this data. It sounds like it would be worth a test to see if it does improve the results, and the data set I just presented would be a good place to start. If you get there first, let us know what you find. Otherwise, if it is better for this application, we will let others know once we have shown that to our satisfaction.

Q: Are you planning to support TIMS-TOF (Bruker) data analysis?

Ans: Yes, we are in discussions with Bruker on this subject now, and those seem promising. We believe the work will benefit greatly from our experience and work over the past 2-3 years on IMS with Waters and Agilent.

Q: How did you manage to observe 1% FDR? Is this after spectral library upload followed by decoy generation? There was a slide when you had I think 76 decoy proteins out of 7000+ proteins? Whenever I perform decoy peptide generation, my protein/peptide list doubles.

Ans: This was for 1% FDR at the protein level for the peptide search (in this case performed with Mascot and Comet, with iProphet to combine the two searches). Indeed the search FASTA contains reverse protein decoys, effectively doubling the search space. I then made the estimate based on how many proteins starting with reverse_ ended up as targets in my Skyline document. I used the Skyline status bar to make this count. As I have noted above, the manuscript authors used a more stringent method, employing a tool called MAYU, and determined cut-off scores independently for 3 separate samples covering the 3 organisms (E. coli, Yeast, Human) from the experiment.
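
For anyone who wants to do the same count outside the Skyline status bar, the Python sketch below shows simple decoy counting on a list of protein names. The reverse_ prefix comes from the answer above. Dividing decoys by non-decoy proteins is one common convention and is an assumption here, as is the example list.

```python
def decoy_fdr(protein_names, decoy_prefix="reverse_"):
    """Estimate FDR as (decoy count) / (non-decoy count) by simple decoy counting."""
    decoys = sum(1 for name in protein_names if name.startswith(decoy_prefix))
    targets = len(protein_names) - decoys
    if targets == 0:
        raise ValueError("no non-decoy proteins in the list")
    return decoys / targets

# Hypothetical document contents: 990 real proteins and 10 reverse decoys.
proteins = [f"PROT{i}" for i in range(990)] + [f"reverse_PROT{i}" for i in range(10)]
print(f"{100 * decoy_fdr(proteins):.2f}% estimated protein-level FDR")
```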

Q: What are best MSconvert settings to convert from Sciex wiff to mz5?

Ans: I showed this on the webinar, using MSconvertGUI (which is installed with ProteoWizard). In summary, I use the default settings with a Peak Picking filter using vendor software (i.e. not instrument independent centroiding coded into ProteoWizard by academic programmers).

Q: Is it possible to manually substitute individual spectra in a given spectral library without altering anything else in that library (i.e. if a higher quality spectrum for a particular peptide becomes available)?

Ans: Skyline currently does not provide manual editing features for spectral libraries. On this large a scale, we highly recommend sticking with search statistics from a well tested search pipeline. For smaller, more targeted experiments, you may indeed be drawing from synthetic standards and things like SRM-triggered MS/MS to achieve high individual spectrum quality. It is possible to order the libraries you use so that earlier libraries get priority over later ones. In this way you could have a statistically generated library as your default, while placing a more specific library ahead of it for certain targets in which you have more confidence in your measurements. You can also build these libraries using the SSL format, which gives you ultimate flexibility at the expense of requiring you to generate this tabular format for the spectra yourself.
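
The priority idea can be pictured with a short Python sketch. This only illustrates "first library that has the peptide wins" and is not how Skyline stores or searches libraries. The library names, peptides and spectrum labels are all hypothetical.

```python
def find_spectrum(peptide, libraries):
    """Return (library_name, spectrum) from the first library containing the peptide.

    libraries: list of (name, {peptide: spectrum}) pairs in priority order.
    """
    for name, spectra in libraries:
        if peptide in spectra:
            return name, spectra[peptide]
    return None

# A small curated library is listed first, so it overrides the default library.
libraries = [
    ("curated_standards", {"PEPTIDER": "high-quality spectrum"}),
    ("search_results", {"PEPTIDER": "search spectrum", "ELVISK": "search spectrum"}),
]
print(find_spectrum("PEPTIDER", libraries))  # taken from curated_standards
print(find_spectrum("ELVISK", libraries))    # falls back to search_results
```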

Q: How do you export a transition list/ and assay library generated in skyline to generate a .tsv file that can be imported again later on?

Ans: We don't recommend this, as you would likely just lose useful information by exporting to a less information-rich format (assay library vs. Skyline document). You can even copy and paste between Skyline documents or import one document into another with File > Import > Document. There is really no reason to export to TSV just to re-import back into Skyline. If you wanted to export to TSV to use with other tools like Spectronaut or OpenSWATH, that should be possible with the Skyline custom report export format, and maybe a bit of search and replace in your favorite text editor. You will get a CSV, but it should be relatively easy to transform into a TSV. Biognosys provides a Skyline report template on their site for this under Support Materials for Spectronaut (also attached below).
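
If search and replace feels fragile (for example, when a field itself contains a comma), a few lines of Python using the standard csv module can do the CSV to TSV step. This is only a sketch, and the column names in the example are invented, not a real Skyline report layout.

```python
import csv
import io

def csv_to_tsv(csv_text):
    """Convert CSV text to tab-separated text, respecting quoted fields."""
    reader = csv.reader(io.StringIO(csv_text))
    out = io.StringIO()
    writer = csv.writer(out, delimiter="\t", lineterminator="\n")
    writer.writerows(reader)
    return out.getvalue()

# Hypothetical report content; the quoted field contains a comma.
report = 'Peptide,Protein\nPEPTIDER,"P1, isoform 2"\n'
print(csv_to_tsv(report))
```

For files on disk, the same reader and writer can be pointed at files opened with newline="" instead of the in-memory strings used here.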

Q: I don't have a premade library like the one you just demonstrated, but I do have a bunch of HILIC fractions from DDA runs with iRTs. How can I turn this into a library?

Ans: This is a question from the afternoon session where only starting from the assay library was shown. Starting from the raw DDA data was shown in the morning. Both sessions have been edited together in the video to cover both workflows (starting from assay library and starting from DDA search results). Hopefully, this will explain both for anyone willing to watch the video. Sorry for not getting it all into one webinar.

Q: With a SSD, is converting Thermo raw files to mz5, then importing the fastest strategy? Any downsides?

Ans: I probably would not put the time into converting from Thermo or Bruker into mz5 or mzML, unless I planned on doing a lot of processing of these same files. These formats import fairly quickly for centroided spectra, because they contain these spectra already processed and do no on-the-fly centroid calculation, unlike SCIEX WIFF files, which only store profile spectra in TOF data, and calculate centroided spectra as they are requested, slowing down import considerably. I am still a little torn between mz5 and mzML, because mzML scales as expected during multi-file parallel import, but for single-file import mz5 is faster and the files are smaller. If you want to convert, do your own testing between mz5 and mzML on your own hardware.

Q: When I train a model, my decoys are very similar to targets, how do I achieve better separation - I'm using a public library (Rosenberger)?

Ans: Using a model generated from your own data (as described in this webinar) will always do better than a public library from elsewhere. One of the final processing steps in the Navarro, 2016 paper was to analyze a larger library assembled by Biognosys that included some of the peptides that DIA-Umpire found in the data, but which were not in the assay library from the ETH DDA runs. It turned out that the chromatography of this "off-column" (different chromatography) library was considerably different from the chromatography used in generating the DIA files. This resulted in an S-shaped correlation between iRT scores and the times at which best peaks were found, and it required a 20-minute window (as opposed to the 10-minute window used with the "on-column" library) to fully capture the target peaks. This difference both reduced the value of delta-RT as a score, and required the widening of the RT extraction range. Similarly, as discussed in the webinar, initial settings for Skyline used a 10,000 resolving power, which was far too wide for the data. So, there are a number of things that could be causing your poor mProphet model, from poor data quality itself, to less optimal settings, to just the difference between your system and the one that acquired the library. I would suggest starting with acquiring your own DDA to build your library and gain some familiarity before moving to libraries generated by others.
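
To see why a low resolving power setting means a wide extraction, recall that resolving power is m/z divided by the peak width (FWHM). The Python sketch below just applies that definition. How Skyline turns the setting into an actual extraction window depends on the instrument type and is not modeled here, and the 30,000 value and the m/z of 500 are arbitrary comparison points, not recommendations.

```python
def peak_width_fwhm(mz, resolving_power):
    """Peak width (FWHM, in m/z units) implied by a resolving power at a given m/z."""
    if resolving_power <= 0:
        raise ValueError("resolving power must be positive")
    return mz / resolving_power

for rp in (10_000, 30_000):
    width = peak_width_fwhm(500.0, rp)
    print(f"resolving power {rp}: about {width:.4f} m/z wide at m/z 500")
```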

Q: How to make a report type/matrix within skyline for MS stats for technical and biological replicates (e.g. 50 x sample each injected 3 x)?

Ans: Install the MSstats external tool from the Skyline Tool Store (Tools > Tool Store from within Skyline). The necessary report template comes with the installation of the tool. For further support, contact the MSstats team.

Q: Can we use peptide abundance (MS1) for quantification rather than product ion abundance (MS2) using Skyline?

Ans: Yes, you can. The Skyline reports have a value Total Area MS1 (and also Total Area Fragment) that allow you to separate out MS1 area, if you extract both MS1 and MS/MS chromatograms. Similarly the Group Comparison support in Skyline allows you to choose between either value as the basis for its differential comparison. However, you should think carefully about the selectivity of each method before choosing which to use, and maybe perform a replicate experiment to measure CVs of each. There are several papers that report increased dynamic range using peak areas from MS/MS with prior DIA/SWATH isolation. Especially the methods that narrow the precursor isolation windows further from the original 25 m/z, like the 64-windows variable isolation scheme used in the Navarro paper, are likely to allow better quantification than using MS1. Of course, this also depends on the complexity of your data. If the complexity is low enough then the benefit of increased selectivity declines and you may start to see MS1 outperforming MS/MS. As sensitivity in these measurements is based on signal-to-noise, increased selectivity in a complex sample can help you decrease noise (the original reason for turning to SRM), but in a low noise/interference environment then MS1 may deliver more ions to the detector for an increase in signal.
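
If you do run such a replicate experiment, the CV comparison is easy to script. The Python sketch below computes percent CV from replicate peak areas. The area values are invented purely to show the calculation and say nothing about which method performs better.

```python
import statistics

def percent_cv(areas):
    """Coefficient of variation (sample standard deviation / mean) as a percentage."""
    mean = statistics.mean(areas)
    if mean == 0:
        raise ValueError("mean area is zero, CV is undefined")
    return 100.0 * statistics.stdev(areas) / mean

# Invented replicate areas for one peptide, e.g. from the Total Area MS1 and
# Total Area Fragment report columns.
ms1_areas = [100.0, 110.0, 90.0]
fragment_areas = [100.0, 102.0, 98.0]
print(f"MS1 CV: {percent_cv(ms1_areas):.1f}%")
print(f"MS/MS CV: {percent_cv(fragment_areas):.1f}%")
```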

Q: What is the best way to normalize data? Is it possible to get Navarros R file for the quantification and statistics?

Ans: Hard to say what type of normalization this question is referring to, and it may be better to consult a statistician for the answer to this question. It is possible to get the Navarro R files. Directions to them should be included in the paper (Nature Biotech. 2016) and the copies used in the webinar will be posted with the webinar data set. However, they are not generally useful for your own quantitative experiments, but are instead specialized for analysis of the 3-organism experiment of the paper used to compare quantitative effectiveness of various tools and settings in exploratory proteomics experiments. For your own quantitative experiments, we recommend exploring other tools like MSstats and seeking the help of the statisticians building them.

Q: How to incorporate SIS in DIA? I spiked in 25 SIS peptides into my DIA experiment.

Ans: Not totally clear what is being asked here, but there are certainly ways to incorporate stable isotope-labeled standards (SIS) in DIA experiments. They can be used for retention time standards for one, or normalization standards. The MacCoss lab frequently uses a labeled protein to assess digestion efficiency. The Broad Institute is measuring a panel of 100 phosphorylated peptides across thousands of DIA runs, using SIS paired reference peptides to maintain comparability over the years it will take to collect the data.

Q: The report says precursor peak area scored - are precursors included for quantification or should only b and y ions be included?

Ans: You can include precursor ion transitions from MS1 in your transition targets in Skyline. That can be achieved by setting MS1 extraction settings in the Transition Settings - Full-Scan tab where the DIA isolation scheme was set in this webinar, and adding the "p" ion type in the Transition Settings - Filter tab. Unfortunately, we have not done enough experimentation with this and mProphet models combined to say for sure whether it would improve or harm the mProphet modeling results. It can definitely be useful for visual confirmation, but our research has also shown that there can be times when MS1 signal does a poor job at detecting peptides we know are present, compared to the MS/MS signal. Both being present is stronger confirmation, but MS1 not being present does not necessarily preclude detection and even quantification of a peptide.

Q: When I do DIA on human plasma, what subset of the Rosenberger assay library should I use?

Ans: Please consult the Aebersold lab to see what they think. You may also want to try generating your own DDA library to follow the workflow presented in the webinar, with an "on-column" DDA library, which will nearly always produce improved iRT scores for better retention time prediction, which can provide a 10-20% improvement in recall.

	Attached Files

spectrodivetransitionlist.skyr

MacCoss Lab Software
