Dear Skyline Users,
During our first Skyline Tutorial Webinar we received many great questions from the participants -- here are the answers!
Q: How do you display the different charged states for all the peptides at once?
Ans: Hard to remember the context for this question. There are many answers to this question, depending on the context. If you want to see this information in the Targets view, you could use the Edit > Expand All > Peptides to show the various precursors and their charge states. In the Document Grid, you could also use Views > Precursors to get a grid view that you could customize showing peptides and charge states in tabular format. Both the Peak Areas and Retention Times views have displays that will show all peptides and charge states, using View > Peak Areas > Peptide Comparison and View > Retention Times > Peptide Comparison.
Q: What cutoff did you use for MASCOT searches (dat files)? Many people use an ion score cutoff >20; so should I use a Skyline cut-off of 0.99 as well (is it more or less the same)? And in your test set you used a cut-off of 0.99 for the import of a TPP search, but, if I am not mistaken, an iProphet score of 0.9 equals an FDR of 0.01. If that is true, why did you use 0.99 (please correct me if I am wrong)? In general, I guess the community would appreciate a list with cut-off scores for each import format.
Ans: If you hover your mouse cursor over the cut-off score field, Skyline will show a tip with an explanation of the cut-off score. It was implemented at a time when I (Brendan MacLean) had been working with Peptide Prophet a lot, and so it is in the Peptide Prophet scale where 1 is best and 0 is worst. For every search engine we have identified what we think is the most useful probability-based score. Many of these, however, are based on p values, q values and expectation values where 0 is best. In these cases, we are using (1 - score) against the cut-off. In the case of Mascot, we were recently corrected by John Cottrell in our use of "Identity Score". He recommended we use the "Homology Score" instead. We have made that change in the latest Skyline 2.6 patch, but earlier versions will still use the "Identity Score".
Q: How do you get the quantitation data out of Skyline? For ex. if you wanted to export to a spreadsheet the intensity of a given peptide across 6 samples?
Ans: Use the menu item File > Export > Report, or use View > Document Grid and then click the Export button in the Document Grid toolbar. There is an extensive (40+ pages) tutorial on working with custom reports in Skyline. This is an area where we think Skyline really shines, once you understand the full depth of what is possible.
Q: How reliable would it be to use low res traps to do label-free quantitation?
Ans: We published a paper (Sherod, et al. J. Proteome Res, 2012) showing that label-free quantification is quite possible on low res ion traps using PRM (a.k.a. pseudo-SRM or targeted MS/MS). Often, however, in the proteomics field the term "label-free" is used as a synonym for chromatography-based quant using chromatogram extraction from the MS1 scans in DDA. In this case, I would caution that with low res data this method may have selectivity problems in a complex mixture. Again, PRM and quant based on product ion chromatogram extraction can greatly increase selectivity to allow chromatography-based quant on low resolution ion traps.
Q: Could we resolve interference peaks if we used centroid data?
Ans: You might be able to. Perhaps in many cases you could. But it is important to remember that your instrument resolution still plays an important role, and you may also be simply pushing the source of quantitative error into your centroiding algorithm. When two peaks cannot be resolved in either the Time or m/z dimension, then even the best centroiding algorithm will not be able to separate them cleanly. And this will affect your quantitative numbers, potentially more drastically than a resolution-based filtering algorithm, because a small shift in m/z of the centroided peak could result in downstream algorithms deciding no intensity is present at a given time, where in fact interference has simply caused the centroiding algorithm to shift the peak outside the range considered to be reasonable mass accuracy.
Q: How does Skyline compare with Progenesis?
Ans: Skyline uses a targeted or peptide-centric approach, where it queries specific target peptides, while Progenesis uses a data-centric approach, where it tries to derive what peptides might be present given the data. Both are useful approaches. Historically, data-centric approaches tend to be used in what is often called "discovery" proteomics, while peptide-centric approaches are gaining popularity through targeted proteomics. The two tools may complement each other very well, with Progenesis used for "discovery" and Skyline used for further validation of potential candidates.
Q: Is there a limit on the number of files you can compare?
Ans: (Birgit) I have imported many files, up to 50 or 100. In such cases you can display tiled views in groups, etc. Skyline files can get large with many files. However, as Brendan showed, upon initial import we are importing the full-scan data only within 5 minutes of the MS/MS identifications (Transition Settings - Full-Scan), which greatly reduces Skyline file size.
Q: How do you deal statistically with technical duplicates, triplicates, etc.?
Ans: Statistical algorithms actually implemented in Skyline are still quite limited. Generally, we recommend that you use Skyline External Tools like MSstats (differential statistics) and QuaSAR (response curves), which deal quite well with technical replicates, or you export a custom report and use tools like R or Excel where it is really up to you how to deal with technical replicates.
Q: Will it be possible to analyse triple SILAC experiments with Skyline?
Ans: Yes. Skyline has a type of modification in the Peptide Settings - Modifications tab called an isotope modification, where the modification is expected to be only an isotopic change, usually with matching retention time to the monoisotopic molecule (though this is also configurable). By default Skyline allows a single "heavy" labeled form, but any number of labeled forms can be added. In targeted experiments, this has been used to monitor both injected AQUA peptides (with SILAC-like labeling) and peptides from injected 15N labeled proteins in the same run. And there is no limit in Skyline. You could easily also monitor other labeling strategies on the same peptides. In Skyline 2.6, we finally improved SILAC support by making it possible to specify that none of your isotope labeled forms are internal standards, which was really the final step to supporting true SILAC experiments as opposed to targeted measurement of peptides with SID reference peptides in the sample.
Q: Is the cutoff score specific to the type of search engine used for spectral matching?
Ans: Yes. For each search engine Skyline supports, we have made an effort to choose the most appropriate probability-based score against which to use the cut-off. To some extent, then, we have attempted to normalize them to where 1 is good and 0 is bad by using (1 - prob) when the probability score is one where 0 is good (e.g. p value, q value, PEP, expect). However, not all probability scores are equal. So, some care must still be taken to choose a cut-off that is appropriate to the source of the spectrum matching results.
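As a rough illustration of the (1 - score) normalization described above, here is a minimal Python sketch. It is not Skyline's actual code: the function names, the score-type labels, and the use of >= for the comparison are all assumptions made for the example.

```python
# Hypothetical sketch of putting search-engine scores on one scale.
# The score-type labels below are assumed names, not Skyline identifiers.
LOWER_IS_BETTER = {"p_value", "q_value", "pep", "expect"}


def normalized_score(score, score_type):
    """Map a probability-like score to a scale where 1 is best and 0 is worst."""
    if score_type in LOWER_IS_BETTER:
        return 1.0 - score
    return score


def passes_cutoff(score, score_type, cutoff):
    """Return True if the normalized score meets the cut-off (assumed >=)."""
    return normalized_score(score, score_type) >= cutoff


# A q value of 0.001 becomes 0.999 and passes a 0.99 cut-off,
# while a q value of 0.05 becomes 0.95 and does not.
print(passes_cutoff(0.001, "q_value", 0.99))
print(passes_cutoff(0.05, "q_value", 0.99))
```

Even on a common scale, a cut-off of 0.99 does not mean the same thing for every score type, which is why the choice still depends on the search engine.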
Q: Is it possible to analyse data which were acquired using SIM-Scan only or does Skyline always need also a Full-Scan?
Ans: Yes, it is possible and has been done successfully by many.
Q: What do you mean by selective?
Ans: By selective I mean the ability to measure the analyte of interest without interference. This is a useful paper on the topic (http://www.degruyter.com/view/j/pac.2001.73.issue-8/pac200173081381/pac200173081381.xml).
Q: How do I build a spectral library?
Ans: If you are processing DDA data through the MS1 filtering protocol described in the webinar, then you would just use the "Import DDA Peptide Search" action on the Start page, or the File > Import > Peptide Search menu item, as described in the MS1 Full-Scan Filtering tutorial, and the spectral library will be built during the steps of the wizard. If you want to build a spectral library for other reasons, then you would use the Build button on the Peptide Settings - Library tab, which is described in the Targeted Method Editing and Spectral Library Explorer tutorials.
Q: Do you have any online webinars or tutorials which focus on how to run label-free quantitative proteomic experiments?
Ans: The tutorials and webinars we have are focused on analyzing the resulting data with Skyline. If by "label-free" you mean by using chromatograms extracted from the MS1 scans in DDA data, then the most appropriate material is this webinar (Webinar #1) and the MS1 Full-Scan Filtering tutorial. Other tutorials do focus on processing label-free data in the sense that no isotopically labeled peptides are used.
Q: In your tutorial, were these data acquired with a set dynamic exclusion?
Ans: I am quite sure they were, since 18,000 peptides were matched, and in reviewing the data, one can often see long gaps between MS/MS IDs that can span either side of a peptide elution, rarely sampling the apex. That said, I did not acquire the data myself, and I am unable to report what dynamic exclusion range was actually used.
Q: I have my data downloaded on Scaffold 4.0 through the MS facility and I want to do label-free quantification analysis of my data on Skyline. What is the workflow of how to import the files from Scaffold?
Ans: Skyline does support spectrum matching results from Scaffold. I am not that familiar with Scaffold, but I know there is a way to export mzIdentML and MGF files that can be imported into Skyline and used in the way that was demonstrated in this webinar. Maybe contact Scaffold support for the details of how to perform the necessary export.
Q: Is there a way to export my data out of Skyline to analyze statistical change in protein abundance or can this be done in Skyline?
Ans: We are working on this for the next release of Skyline. At present, we recommend using the MSstats Skyline External Tool. You can install it from the Tool Store using the Tools > Tool Store menu item. The tool details page on the Skyline web site has access to a small tutorial and a support board for MSstats. You can also use Skyline custom reports to export tabular data that can be used in R or other tools for your own statistical calculations. See the Custom Reports tutorial for details on creating your own reports.
Q: Is there any way to add custom enzyme specificity in the peptide settings? For example, can semi-Trypsin specificity (as available in Mascot) be included?
Ans: You can customize the enzyme used by Skyline in the Peptide Settings - Digestion tab. Skyline allows the definition of a wide variety of protease cleavage, including bi-directional when protease mixtures were used. Skyline does not support semi-tryptic cleavage of protein sequences, but it does support both targeting peptides based on cleavage of a protein sequence as well as completely flexible peptide lists, where you can paste in any peptide sequence you like, whether it is semi-cleaved or totally unspecified cleavage. Also, if you have built a spectral library from spectrum matching results that identified semi-cleaved or unspecified cleavage peptides, you can use the Add button in the Spectral Library Explorer (View > Spectral Libraries) to add these to the document in a peptide list. Skyline still will not match these to a protein for you, but you can certainly target and measure them, if you believe they are of interest.
Q: Is it possible to use identifications from Andromeda? What file is needed?
Ans: Yes. Skyline requires the msms.txt, and may also require the modifications.xml. The msms.txt file is the file you add when asked for spectrum matching results for library building.
Q: Do you recommend any particular resource for learning more about basic proteomics? I am new to it and am more or less teaching myself.
Ans: I have been told that the Skyline tutorials are actually useful for learning something about quantitative proteomics. One PI told me she gives them to people starting in her lab. There are a number of good review papers now. I also highly recommend the workshops and short courses at conferences like ASMS, ABRF, US HUPO, HUPO, etc. And, if you are in Europe, I taught at the EU Proteomics Summer School, which seemed like a great place to learn the basics.
Q: While integrating m/z peaks, if the mass error is high, the colored shaded area (integration window) may only cover part of the peak. Is there any way to modify the integration window (of m/z)?
Ans: Yes. You can always make the m/z filter window wider. You do this by lowering the Resolving power values in the Transition Settings - Full-Scan tab. The lower the resolving power the wider the windows. In the future, we may add a way to globally shift all extraction in some direction by a number of PPM, since this would allow one to compensate for runs on a poorly calibrated system, where everything is systematically shifted to one side. For now, the only option is to use wider windows, which in most cases should have a much smaller impact than resolution itself.
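To make the relationship between resolving power and window width concrete, here is a simplified Python sketch. It assumes the full extraction width is simply m/z divided by resolving power (the textbook definition R = m / delta m), which is not necessarily the exact calculation Skyline performs for each mass analyzer type. The function names are hypothetical.

```python
def extraction_window(mz, resolving_power):
    """Return (low, high) m/z bounds, assuming full width = mz / resolving_power."""
    width = mz / resolving_power
    return mz - width / 2.0, mz + width / 2.0


def ppm_error(observed_mz, theoretical_mz):
    """Mass error in parts per million (positive means observed is too high)."""
    return (observed_mz - theoretical_mz) / theoretical_mz * 1e6


# Halving the resolving power doubles the width of the window.
for rp in (10000, 5000):
    low, high = extraction_window(500.0, rp)
    print(rp, round(high - low, 4))

# A peak observed at 500.005 for a theoretical m/z of 500.0 is about 10 ppm off.
print(round(ppm_error(500.005, 500.0), 2))
```

Under this assumption, a systematic shift of 10 ppm at m/z 500 moves the peak by 0.005, which is a sizable fraction of a 0.05-wide window; that is the situation where widening the window helps.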
Q: Is it possible to automatically narrow the borders of all peak areas to exclude interference peaks?
Ans: You can narrow your extraction ranges in the m/z dimension by increasing the Resolving power values in the Transition Settings - Full-Scan tab. As long as your data has enough resolving power to support this, you will gain more selective (i.e. less interference) data this way. If not, you may end up extracting from the apexes of your peaks in the m/z dimension, which can cause much higher variability in your measurements, since small shifts in mass accuracy will have a more pronounced effect on your intensities than when you are extracting from ranges that cover most of a fully resolved peak.
Q: How do you get intensity vs RT for fragment ions?
Ans: In the context of DDA data, you really can't get quantitatively useful fragment ion chromatograms, because the MS/MS scans are not collected for the same precursors systematically over time. You can set up Skyline to extract product ion chromatograms even in DDA data by using the MS/MS settings in the Transition Settings - Full-Scan tab, but you will end up with very strange looking spike-shaped peaks within your MS1 peaks where an MS/MS scan was collected for the precursor. This is not a useful exercise. To get chromatograms for fragment ions you should use a method which collects MS/MS spectra for your precursors of interest systematically over time (e.g. SRM, PRM or DIA). For the methods involving full MS/MS spectra, you can then use the MS/MS settings in the Full-Scan tab to have Skyline extract the chromatograms.
Q: In Skyline does it matter whether you use centroid versus profile mode?
Ans: Skyline can handle either. Profile mode may be preferable, since it avoids the potential error from a centroiding algorithm, but Skyline can extract chromatograms from either type of data. Skyline was explicitly designed not to require centroiding.
Q: In the peak area windows, how is the expected peak area determined?
Ans: In the tutorial for this webinar the "Expected" peak areas are derived by calculating an expected isotope distribution. The exact algorithm is relatively complex, even more so for isotopically labeled peptides, which involves an isotope labeling enrichment, which can be set in Skyline. But, it is a well defined problem that software can certainly do as well as it can calculate the expected m/z values for your peptides and fragment ions, given chemical composition, atomic masses and relative abundances of isotopes.
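For readers curious about what such a calculation looks like, here is a deliberately simplified Python sketch. It is not Skyline's algorithm: it works at nominal-mass resolution (M, M+1, M+2, ...), uses approximate natural abundances, ignores isotope labeling enrichment, and the function names and example composition are made up for illustration.

```python
# Approximate natural isotope abundances, indexed by nominal mass offset
# (+0, +1, +2, ...). Values are rounded and intended for illustration only.
ISOTOPES = {
    "C": [0.9893, 0.0107],
    "H": [0.999885, 0.000115],
    "N": [0.99636, 0.00364],
    "O": [0.99757, 0.00038, 0.00205],
    "S": [0.9499, 0.0075, 0.0425, 0.0, 0.0001],
}


def convolve(a, b, max_len):
    """Multiply two abundance polynomials, keeping at most max_len terms."""
    out = [0.0] * min(len(a) + len(b) - 1, max_len)
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            if i + j < len(out):
                out[i + j] += x * y
    return out


def isotope_distribution(composition, max_peaks=5):
    """Relative abundances of M, M+1, ... for a dict like {"C": 50, "H": 80}."""
    dist = [1.0]
    for element, count in composition.items():
        for _ in range(count):
            dist = convolve(dist, ISOTOPES[element], max_peaks)
    total = sum(dist)
    return [p / total for p in dist]


# Arbitrary example composition, not taken from the webinar data.
peptide = {"C": 50, "H": 80, "N": 14, "O": 16, "S": 1}
print([round(p, 3) for p in isotope_distribution(peptide, max_peaks=4)])
```

The result is normalized over the peaks that are kept, so it gives the expected proportions among the first few isotope peaks rather than absolute probabilities.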
Q: Does Skyline do any normalization? For instance, what if more peptides were present in one sample preparation than another (possibly due to digestion efficiency)?
Ans: Skyline does use normalization in a number of cases. It certainly calculates ratios between analyte and standards, whether these standards are globally injected peptides or proteins, or reference peptides or proteins. In the case you describe, it sounds like the standard would need to be based on a protein or proteins.
Q: Is it possible to match several files as technical replicate and get combined data (e.g. mean of the area under the curve) in a exported result file?
Ans: For this purpose, we recommend the MSstats Skyline external tool. Skyline has a great deal of flexibility, and some nice features for grouping replicates and viewing the results in the Skyline Peak Areas graph. At the moment, its reports do not directly do this kind of grouped averaging, though that may be coming soon.
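If you prefer to do the averaging yourself on an exported report, a minimal Python sketch might look like the following. The column names ("Peptide", "Replicate", "Area"), the replicate names, and the grouping of replicates into conditions are all assumptions for the example; use whatever columns your own custom report contains.

```python
import csv
import io
from collections import defaultdict
from statistics import mean


def mean_area_by_group(report_file, replicate_to_group):
    """Average peak areas per (peptide, group) from a CSV report.

    replicate_to_group maps each replicate name to the group it belongs to.
    """
    areas = defaultdict(list)
    for row in csv.DictReader(report_file):
        group = replicate_to_group[row["Replicate"]]
        areas[(row["Peptide"], group)].append(float(row["Area"]))
    return {key: mean(values) for key, values in areas.items()}


# Made-up report contents standing in for an exported CSV file.
example_report = io.StringIO(
    "Peptide,Replicate,Area\n"
    "PEPTIDEK,ctrl_1,100.0\n"
    "PEPTIDEK,ctrl_2,110.0\n"
    "PEPTIDEK,treated_1,200.0\n"
    "PEPTIDEK,treated_2,220.0\n"
)
groups = {"ctrl_1": "control", "ctrl_2": "control",
          "treated_1": "treated", "treated_2": "treated"}
result = mean_area_by_group(example_report, groups)
print(result)
```

This only computes a plain mean per group. It does nothing about missing values, normalization, or significance testing, which is where a tool like MSstats is the better choice.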
Q: Can you provide guidelines/suggestions on how to go from extracted peak areas of peptides to generate quantitative values on the protein level?
Ans: Again, MSstats in the Skyline External Tool Store (Tools > Tool Store in Skyline) is a great place to start for this.
Q: Are there any plans to incorporate MaxQuant search results into Skyline?
Ans: Yes. Skyline already supports importing Andromeda search results. You simply point the Skyline library builder at the msms.txt file, and you may need to put the modifications.xml file where Skyline can find it. Many labs are already using this pipeline. We will continue working with the MaxQuant team to improve integration between the two projects, as this may be another great combination for moving through discovery approaches to targeted validation of potential peptide/protein candidates.
Q: Is it really best to use profile data?
Ans: It is at least open for discussion, and not well proved either way. The data is what it is. Using a centroiding algorithm can make analysis simpler for downstream tools, but it also produces error which cannot be removed or easily investigated in downstream tools. With profile data and the new Full-Scan graph in Skyline 2.6, you can get a really good idea of what the data coming out of the mass spec looked like, and whether or not there might be interference. Much of this information would be lost with centroided data, though the same interference would still be adding error to the measurements.
Q: Would it be possible to suggest some MS1 extraction parameters (resolving power, etc.) for some of the more commonly used instruments? I am most interested in an ABSciex 5600 run in high sensitivity mode, but a table for a variety of instruments would be very useful.
Ans: Agreed. This probably requires input from people with a lot of experience with each instrument in question.
Q: Is it possible to analyze peaks from arbitrary isotope distributions such as metabolically 50% 15N labeled peptides?
Ans: It is. If you want to have Skyline suggest an expected distribution, then you would need to specify the isotope labeling enrichment in the Transition Settings - Full-Scan tab. This is probably not a great option when the enrichment may be anything, and you are trying to derive it from the distribution. Skyline does not really provide much help with that task.
Q: How is protein quantification in Skyline when compared with MaxQuant?
Ans: If you know the proteins you want to measure, then a targeted approach will tend to do better, as you can use prior knowledge of what you are targeting to reduce error in your measurements. For an initial discovery approach to your DDA data, then MaxQuant is likely a great choice. Once you narrow your list to a targeted set of candidates, Skyline is a great tool for hypothesis validation over many replicates. I hope the tools are complementary.
Q: The most challenging part of using Skyline for me is getting my data into the program. I am using an older AB Sciex Qstar with a Mascot workflow. But every time I try to get my data in, I run into some technical glitch that hangs up the program. Could your team provide some step-by-step examples of how data from different platforms can be imported into Skyline?
Ans: I think the best answer to this is, please use the Skyline support board (Help > Support from within Skyline). Many people have been helped to a workflow that works for them this way. Honestly, we have also improved the software a lot to handle various types of data we neither have on hand ourselves nor could even predict without being given an example. The workflows Skyline started out handling were quite limited compared to today. Without people requesting help and supplying sample data, we could never have gotten this far. It is our hope that for each individual helped like this there are many others that benefit by simply finding Skyline working for their input data.
Q: If redundant peptides have different intensities, how does Skyline choose the one to keep? How does this affect quantitation?
Ans: I don't understand this question, but Skyline doesn't attempt to choose peptides for you. You could use Edit > Refine > Remove Duplicate Peptides, which would remove all peptides in the document that appear more than once. Or you could use Edit > Refine > Remove Repeated Peptides, which would remove the additional occurrences of a peptide after its first appearance in your document. But within Skyline no attempt is made to quantify proteins, and Mike MacCoss, when he is teaching, frequently illustrates the many difficulties of accurately quantifying proteins through peptide measurements. The exercise of determining the best peptides to measure a target protein is largely left up to the Skyline user. Though, Skyline does have many features requested by people who have spent a lot of time on this. At present, the MacCoss lab does a lot with recombinant protein expression from cDNA libraries (Stergachis, et al. Nature Methods, 2011). If you want broad, discovery-level quantitative estimates of as large a set of proteins as possible, then Skyline is not really the right tool. If you want to target and refine your quantitative measurements of a specific set of peptides, and even prove that they can be used to measure your proteins of interest, then Skyline is likely one of the best tools you could use for this.
Q: Is there a way to add the M-1 transition for every precursor, without having to select it from the drop down for every single precursor?
Ans: Unfortunately, there is not. Sorry about that. This suggestion will be considered for future inclusion. The M-1 chromatogram is always extracted. So, you can always add it fairly easily for any peptide, but you do have to do this manually for each peptide where you want to see it.
Q: Is it better to use smoothed curve for quantitation? Or there’s no difference?
Ans: Skyline uses both 2nd-derivative and Savitzky-Golay smoothing in its algorithm to detect the boundaries of a peak, but it always uses straight trapezoidal summation of the raw data in calculating the peak areas. It is my understanding from a number of people I have discussed this with that this produces areas with the least variance, compared with smoothed areas, but I don't have a citation or an experiment of my own to show this.
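For anyone unfamiliar with trapezoidal summation, here is a minimal Python sketch of the idea. It only illustrates the summation over raw points between two peak boundaries; it is not Skyline's code, it leaves out details such as background subtraction, and the function name is hypothetical.

```python
def trapezoid_area(times, intensities, start, end):
    """Sum trapezoids between consecutive raw points within [start, end]."""
    points = [(t, i) for t, i in zip(times, intensities) if start <= t <= end]
    area = 0.0
    for (t0, i0), (t1, i1) in zip(points, points[1:]):
        area += (i0 + i1) / 2.0 * (t1 - t0)
    return area


# Made-up chromatogram points (retention time in minutes, raw intensity).
times = [0.0, 1.0, 2.0, 3.0]
intensities = [0.0, 10.0, 10.0, 0.0]
print(trapezoid_area(times, intensities, 0.0, 3.0))
```

Because the sum runs over the raw intensities, smoothing affects only where the boundaries are placed, not the values that are integrated.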
Q: What is the minimal number of peptides per protein for confident MS1 quantification?
Ans: If you have done the necessary method refinement work, it could certainly be one. The field still relies heavily on antibody assays that bind to a single site on a protein. Again this question highlights the difference between a targeted approach and a broader discovery or survey-type approach, where one expects to have very little prior knowledge of the proteins in question beyond their sequences recorded in a FASTA file. Targeted proteomics allows you to gain in depth knowledge of your targeted proteins, derive "figures of merit" like linear range, LOQ, LOD, best responding peptides, optimal measurement parameters like CE, etc. and inject standard peptides or proteins for more confident measurement by confirming co-elution with a reference peptide. You won't necessarily start off like this with DDA data, but confident quantification of proteins from DDA data may not be the end goal. Perhaps quantitative results from MS1 scans in DDA data will require a great deal more verification in more controlled targeted experiments.
Q: You mentioned that around 5% of the selected peaks will require manual curation. Is there an automated workflow that identifies the peptides that require manual curation?
Ans: Not yet. Something we will continue to work on. Though, Skyline does have visual displays that can make this kind of review go a lot more quickly than the same task in other targeted proteomics tools. It is still pretty unwieldy, though, if your aim is to process all 20,000 or so spectrum matched peptides in a modern DDA workflow. Narrow the field and start with a hypothesis to validate. Many have felt the work at this stage is worth it. Probably equally many have balked at doing it for tens of thousands of peptides.
Q: Is there a possibility that Skyline will allow recalibration of individual runs? If the mass error is consistently out for certain samples, peak integration will miss the side of the peak in profile data. I could decrease the resolution, however that introduces more noise.
Ans: Great question. Certainly something to consider for a future release.
Q: What are IDs?
Ans: (Birgit) IDs are MS/MS (tandem mass spectrometry) matches by a search engine for the identification of a peptide. This information is initially imported when building the spectral libraries from the database searches.
Q: How do we smooth out peaks a bit more in Skyline?
Ans: (Brian) They are what they are - this is what Skyline has to work with. Improved chromatography is the answer.
Q: Will you be adding Peaks to spectral search?
Ans: Skyline already supports pepXML and mzXML exported from Peaks spectrum matching software.
Q: Can Skyline handle pepXML file that contains results from multiple raw spectral files? Or does it have to be one pepXML per RAW file?
Ans: Yes, Skyline can handle pepXML files with search results from multiple mass spec runs. It does require spectrum files for each of the runs to be present, usually in the mzXML format, but the initial pepXML file specified to Skyline can be a single pepXML file for many mzXML files.
Q: What are the optimal Skyline options/settings that need to be set prior to importing the peptide search -- noting that some of this will depend on my experimental conditions (sample prep/MS instrument, etc.)?
Ans: Exactly. Very dependent on your experiment. The "Import DDA Peptide Search" option on the Start page does attempt to show a limited set of options that are most important for getting things set up correctly. Focus on those first.
Q: How important is it to have high resolution instruments such as Orbitrap/Q-TOF to do relative quantification of proteins in the discovery stage using Skyline software? Why don't Skyline RT adjustments depend heavily on ID using MS/MS? If the ID is based on only MS/MS with high confidence, what difference does it make even if m/z was not measured accurately to low ppm?
Ans: First, the higher the resolution in your MS1 scans (from which you are extracting chromatograms) the more selective your chromatograms will be. That is, the more likely that you have measured your analyte of interest without including interference. Taken quantitatively over a large number of analytes, this means your measurements will contain less error, and they will experience a wider dynamic range. The same can be said for SRM versus MS1 quantification on even the highest resolution instruments.
Skyline RT alignment for runs with MS/MS IDs does depend entirely on performing a linear regression on the retention times of the MS/MS spectra matched to peptides. In the case where there are multiple identifications for the same peptide, the first is always used as the retention time for the peptide, as it is assumed this will yield less variance, given dynamic exclusion and the possibility of identifying MS/MS taken at the end of a gradient.
Again, you may be able to confidently state that your peptide was eluting at a given time based on the MS/MS ID, but that does not at all mean that you can get a precise quantitative measurement based on a chromatogram extracted from MS1 scans. There are many new potential sources of error going from a confident MS/MS ID to the peak area for a chromatogram, not least of which is interference from molecules other than the one you are trying to measure. By using low resolution MS1 scans you increase the impact of this interference on your measurements on the global scale.
You may still be able to get good measurements in a sample or part of your gradient with low complexity, but as complexity increases, the precision and dynamic range of your measurements will decrease with the selectivity of your chosen method.
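The retention time alignment mentioned in this answer can be sketched in Python as an ordinary least-squares line fit. This is only an illustration of the idea, not Skyline's implementation: the function names and data layout are hypothetical, and "first" identification is assumed here to mean the earliest retention time.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of ys = slope * xs + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def align_runs(ids_a, ids_b):
    """Fit run B times against run A times using peptides identified in both.

    ids_a and ids_b map a peptide to the list of retention times of its
    MS/MS IDs; only the earliest ID of each peptide is used (an assumption).
    """
    shared = sorted(set(ids_a) & set(ids_b))
    xs = [min(ids_a[p]) for p in shared]
    ys = [min(ids_b[p]) for p in shared]
    return fit_line(xs, ys)


# Made-up MS/MS ID times (minutes) for two runs.
run_a = {"PEPA": [10.0, 10.5], "PEPB": [20.0], "PEPC": [30.0]}
run_b = {"PEPA": [12.0], "PEPB": [22.0, 23.0], "PEPC": [32.0], "PEPD": [5.0]}
slope, intercept = align_runs(run_a, run_b)
print(slope, intercept)
```

With a fitted line in hand, a time from run A can be mapped to an expected time in run B as slope * time + intercept, which is useful for peptides that were identified in only one of the runs.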
Q: What types of questions can be answered by doing MS1 filtering? For acetylation, for instance, would one need to know the acetylation sites (and occupancy) for predicted tryptic peptides, and then monitor these by MS1 filtering? Why would this be better than designing an SRM experiment?
Ans: A data independent method has the benefit of requiring less up front method development and being able to query a larger number of analytes without necessarily requiring a specific hypothesis before measurements can be made. Results may be less precise and ultimately less conclusive, but these methods can help bridge what was once a yawning chasm between discovery and targeted proteomics.