Questions and Answers

Webinar 8

Dear Skyline Users,

Our eighth Skyline Tutorial Webinar took us back to our roots, to the topic of our very first webinar: DDA. This time, we looked at how to use group comparisons to find abundance differences in our data set of 30,000 peptide IDs, and then explored techniques for refining our list to begin targeted validation. Naturally, this advanced topic prompted many great questions from our live audience, so here are the answers:

Q: I missed the first 10 min, but I can't seem to find "group comparisons". Do I have to do something to make this show up in the menu bar?

Ans: Please be sure to upgrade to the latest version of Skyline (3.1.0.7382). You can find the version number in Skyline in the About form (Help > About). If you have an earlier version, you will need to upgrade manually from the Skyline web site.

Q: How does Skyline roll up protein intensities from peptides? Does Protein fold-change use average of peptide intensities?

Ans: Skyline uses the sum of peptide intensities, not the average, as the protein intensity value. Currently Skyline does not supply this raw value in its reports, but we intend to make this available in the future.
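
As a rough illustration of a sum-based roll-up, here is a minimal Python sketch. The (protein, peptide, area) row layout and the function name are hypothetical, chosen only for this example; this is not Skyline code.

```python
from collections import defaultdict

def protein_intensities(peptide_rows):
    """Sum peptide peak areas per protein.

    peptide_rows: iterable of (protein, peptide, area) tuples.
    This layout is an assumption made for the example.
    """
    totals = defaultdict(float)
    for protein, _peptide, area in peptide_rows:
        totals[protein] += area
    return dict(totals)

# Made-up values, for illustration only.
rows = [
    ("ProteinA", "PEPTIDEK", 1200.0),
    ("ProteinA", "ANOTHERPEPR", 800.0),
    ("ProteinB", "SAMPLEK", 500.0),
]
totals = protein_intensities(rows)
print(totals)
```

With a sum, a protein's value is dominated by its most intense peptides, whereas an average would weight every peptide equally.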

Q: Is there a way to check the quality of the peak integration/ matching globally i.e. before we have filtered the data in terms of differential expression?

Ans: In the case presented in this tutorial, where you are starting with tens of thousands of peptides, I would recommend against further data quality assessment before applying some initial filter. You can make your initial filtering more or less stringent, but with tens of thousands of targets you really have too many to assess and correct each one prior to filtering. You could use various statistics within Skyline to assess peak quality, but for DDA data there is no single automatic value to find peak quality issues.

Q: Does missing the 4-fold change protein in some settings indicate that it is hard to pick up small-fold-change candidates?

Ans: This is by no means a well-designed quantitative assessment of ability to detect candidates at a given fold-change. There are a lot of factors at play in coming up with the set of “significant changes” shown in this tutorial. One is clearly the very low number of replicates (3) for any of the groups we are comparing. This is just barely enough to perform a t-test. A compensatory factor is that these are all technical replicates, without any biological variance and very limited process variance. This means variance is far lower than we would expect in a more complex experiment including multiple subjects. Next, I chose to use adjusted p values, which attempt to control false discovery rate by increasing unadjusted p values, based on multiple-hypothesis tests. This is an important difference between a discovery experiment, like the one in the tutorial, where you start out without any real hypothesis on what you are trying to test, and a targeted experiment where you are able to limit what you target based on a hypothesis. The latter will have greater ability to detect small fold-changes in your targeted peptides.

Q: Does Skyline 3.1 replace Skyline daily?

Ans: No, it does not. No public release will ever replace Skyline-daily. Skyline-daily is the beta release of Skyline, which is always ahead of the Skyline public release. The two are nearly identical just after a public release is made, but then Skyline-daily continues to be updated, while public releases happen only every 6 months or so. If you are not familiar with Skyline-daily, anyone can now request access to it through the Skyline web site. There is a green button for this on the Skyline installation page.

Q: What are the light blue lines in the chromatograms?

Ans: The light blue lines were the aligned peptide IDs from other runs. In the tutorial, they were shown through a right-click on the chromatogram graph, Peptide ID Times > Aligned. This causes Skyline to use the linear equations shown in View > Retention Times > Aligned to take ID times from the runs in which they occur and map them into other runs.

Q: When you do the group comparison, does Skyline take care of the total intensity normalization automatically?

Ans: If you want total intensity normalization in the Skyline group comparisons, you need to specify that in the group comparison settings. Currently, Skyline offers only two options for normalization: ratio to heavy and ratio to global standard. Since this was a label-free experiment, ratio to heavy was not appropriate. I could have defined global standard peptides, by right-click > Standard Type > Normalization, and then used those global standards for normalization in the group comparison. Skyline does not, however, allow normalization by total ion current or median peptide peak area, which have been used in discovery experiments, where the majority of peptides are expected not to be changing. This may have been true of the tutorial data set. We just felt this was a dangerous assumption to make in a tool like Skyline, where many of the experiments it processes expect to see change in the majority of targets.

Q: How can you correct for small differences in column loading/ overall sample peptide conc. by taking into account normalization for BPC or TIC integrals for Skyline statistics?

Ans: Skyline does not currently offer this option, though maybe it should. Currently you would correct by normalizing against a global standard or set of global standard peptides, as described above.
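
To make the idea of a ratio to global standard concrete, here is a minimal Python sketch. It assumes that each target area is divided by the summed area of the standard peptides in the same replicate; how Skyline combines multiple standards internally is not described here, so treat the summing step, the data layout, and the function name as assumptions.

```python
def normalize_to_global_standard(areas, standard_peptides):
    """Divide each target peptide area by the summed standard area per replicate.

    areas: {replicate: {peptide: area}} (hypothetical layout).
    standard_peptides: set of peptides marked as global standards.
    """
    normalized = {}
    for replicate, peptide_areas in areas.items():
        standard_total = sum(peptide_areas[p] for p in standard_peptides)
        if standard_total <= 0:
            raise ValueError(f"No standard signal in replicate {replicate}")
        normalized[replicate] = {
            peptide: area / standard_total
            for peptide, area in peptide_areas.items()
            if peptide not in standard_peptides
        }
    return normalized

# Made-up values: the second run was loaded with twice as much sample.
areas = {
    "run1": {"STANDARDK": 100.0, "TARGETR": 50.0},
    "run2": {"STANDARDK": 200.0, "TARGETR": 100.0},
}
ratios = normalize_to_global_standard(areas, {"STANDARDK"})
print(ratios)
```

In this example the loading difference cancels out, because the standard scales with the load in the same way the target does.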

Q: About time alignment to transfer MS/MS IDs: is this feature similar to MaxQuant's "match between runs"? In MQ, alignment takes two steps: first a broad alignment and re-calibration of retention time (~20 min maximum time window by default), and then a more stringent time window to transfer MS/MS IDs (~2 min). How is this done in Skyline?

Ans: Yes, the features are similar. Skyline does its retention time alignment only in a single step, based on the retention times of MS/MS spectra that get matched to peptides by whatever peptide search engine you choose to use. Skyline supports 20+ different peptide search pipelines, including MaxQuant Andromeda.

Q: How do you make sure you are importing peptides or proteins at a specific FDR (1% at peptide or protein)? In my case I would upload search results from PD 2.0.

Ans: Skyline has a field “Cut-off score” that ranges between 1 and 0. In the tutorial, I used 0.99, which corresponds to a 1% FDR with a pipeline that generates a q value score, like PD 2.0 with Percolator. Skyline uses whatever probability metric each of its 20+ supported peptide search pipelines produces, including q value, posterior error probability, expectation value, etc. When the score is a probability of being false (like q value), then Skyline uses (1 – score) to convert all scores to where 1 is best and 0 worst.
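
The score conversion described above can be sketched in a few lines of Python. The function names are hypothetical, and whether the cut-off comparison is inclusive is an assumption of this sketch, not something stated above.

```python
def to_unified_score(score, is_error_probability):
    """Convert a search-engine score so that 1 is best and 0 is worst.

    When the score is a probability of being false (e.g. a q value),
    use 1 - score; otherwise keep the score as-is.
    """
    return 1.0 - score if is_error_probability else score

def passes_cutoff(score, is_error_probability, cutoff=0.99):
    """Return True if the converted score meets the cut-off (assumed inclusive)."""
    return to_unified_score(score, is_error_probability) >= cutoff

# A q value of 0.005 converts to 0.995 and passes a 0.99 cut-off;
# a q value of 0.05 converts to 0.95 and does not.
print(passes_cutoff(0.005, is_error_probability=True))
print(passes_cutoff(0.05, is_error_probability=True))
```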

Q: What is the main difference between Chorus and Panorama?

Ans: From a purely Skyline-centric view, Chorus is the raw data repository and Panorama is a processed data repository, in a workflow summarized as collect data -> store raw in Chorus -> analyze with Skyline -> store results in Panorama. The end goal of Chorus is to also support various other analyses, like peptide searches and label-free analysis, directly in the cloud. The existing infrastructure for storing and analyzing processed relational data is far more developed in Panorama (running on LabKey Server) than in Chorus, which is more file centric.

Q: How do you get your Skyline-daily to update? I've tried opening and closing many times, but it won't go to version 3.1.

Ans: When automatic update fails, you should first try installing the latest version manually over the top of your current version. When that fails, you should consult the broken installation web page on the Skyline web site: https://skyline.gs.washington.edu/labkey/wiki/home/software/Skyline/page.view?name=tip_recover_install

Q: Is the time alignment always linear in Skyline?

Ans: Yes. Skyline always uses linear regression for retention time alignment.
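
For readers who want to see what a linear alignment amounts to, here is a minimal Python sketch: fit a least-squares line through the retention times of peptides identified in both runs, then use it to map an ID time from one run into the other. The function names and the numbers are made up for illustration; this is not Skyline's implementation.

```python
def fit_line(x, y):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("Need at least two paired retention times")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("All x values are identical; slope is undefined")
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

def map_time(time, slope, intercept):
    """Map a retention time from the source run into the target run."""
    return slope * time + intercept

# Retention times (minutes) of peptides identified in both runs (made-up values).
run_a = [10.0, 20.0, 30.0, 40.0]
run_b = [11.5, 22.5, 33.5, 44.5]
slope, intercept = fit_line(run_a, run_b)

# A peptide identified only in run A at 25.0 min, mapped into run B.
aligned = map_time(25.0, slope, intercept)
print(slope, intercept, aligned)
```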

Q: In the group comparisons, you have adjusted p-values. How are you generating the adjusted values?

Ans: Skyline uses the Benjamini-Hochberg algorithm for adjusting p values.
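
For reference, the standard Benjamini-Hochberg adjustment can be sketched in Python as follows. This is a textbook version of the procedure, not Skyline's own code, and the function name is hypothetical.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p values, in the input order.

    Each p value is multiplied by m / rank (rank 1 = smallest p), and a
    running minimum taken from the largest p downward keeps the adjusted
    values monotonic and no greater than 1.
    """
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):
        index = order[rank - 1]
        running_min = min(running_min, p_values[index] * m / rank)
        adjusted[index] = running_min
    return adjusted

# Made-up p values, for illustration only.
raw = [0.01, 0.04, 0.03, 0.20]
adjusted = benjamini_hochberg(raw)
print(adjusted)
```

Because each p value is scaled by the number of tests divided by its rank, a discovery experiment with tens of thousands of peptides pays a much larger adjustment than a targeted experiment with a short list.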

Q: Can you filter your peptide list based on a minimal number of IDs per replicate (say 2)?

Ans: This is not possible in Skyline. I am not sure how desirable it would be either. I would expect many peptides to have fewer than 1 ID per replicate, i.e. they are identified once in some subset of the replicates, and not at all in the others. I did once use identification in all replicates as a quality criterion for choosing peptide peaks from DDA data I imported from PeptideAtlas in order to add retention times to an iRT library, under the assumption that this would give me higher confidence in the times being added. I have added an issue for this idea to the issues list: https://skyline.gs.washington.edu/labkey/issues/home/issues/details.view?issueId=420

I can’t promise when it might get implemented.

Q: How exactly are singletons (proteins that appear uniquely in one sample and not in the other) reported? More specifically, can they be among those reported as NaN?

Ans: I would expect that groups with only a single measurement, and therefore no standard deviation on which to base variance estimates, would appear as #N/A in the group comparison grid. We are currently seeing NaN for anything where a single replicate has a measured area of zero, which is not ideal. It did not affect this tutorial, but we are planning to follow the example of MSstats and convert these zero values to 1 for the purpose of these group comparisons.