Cannot replicate group comparison statistical test in R

kittinun l posted:  2026-01-28 18:42
 

I recently analyzed PRM data in Skyline and used the Group Comparison feature to detect differential abundance between experimental groups. However, when I exported the data and repeated the statistical testing in R, I was unable to reproduce the significance values reported by Skyline. I noticed that Skyline seems to exclude some intensity values from the Group Comparison analysis, but I have been unable to identify the exact filtering criteria or to export a table that reflects which data points were excluded. How can I determine these criteria and obtain an export that matches the data Skyline actually used for the Group Comparison?

 
 
Nick Shulman responded:  2026-01-28 19:00
If your group comparison is using the same normalization method as the one on the "Quantification" tab at "Settings > Peptide Settings", then the values that Skyline uses for the group comparison are the values from the "Normalized Area" column for a peptide-level group comparison, or from "Protein Abundance Strict" for a protein-level group comparison.

You can also see those values in the group comparison grid if you customize a report and add the "Abundance" column. There is some information about that in the "Showing abundances with the group comparison results" section of the recent Live Reports tutorial:
https://skyline.ms/tutorials/25-1/LiveReports/en/#Showing_abundances_with_the_group_comparison_results

-- Nick
 
kittinun l responded:  2026-01-28 19:39
Thank you, Nick. I can export the data now. However, the exported values do not exactly match what I see in Skyline. In the “Protein Abundance Strict” report, some entries are missing entirely (consistent with the replicate count used), while the values in the Group Comparison grid still show numerical intensities but are marked with an asterisk (the number of asterisks does not match the replicate count used). Could you clarify which of these values Skyline actually uses for its calculations?

I also tried running statistical tests on both exported datasets, but I still cannot reproduce the statistics shown in Skyline, or even get close to them (I tried both t.test(var.equal = T) and a linear model on log2-transformed values). Could you please let me know what statistical model and test Skyline uses internally in the Group Comparison feature (e.g., log transformation, handling of missing values, and multiple-testing correction)?
 
kittinun l responded:  2026-01-28 19:56
Sorry, I can now reproduce the values in the comparison grid, so only the question about the statistical analysis remains.
 
Nick Shulman responded:  2026-01-29 01:45
The values with asterisks are not used in the group comparison. The "Protein Abundance Strict" values are null when the "Protein Abundance" value has an asterisk next to it. If you look at the "Protein Abundance Message" column you can see why the value is not being included. For PRM data, the most common reason for a value to be rejected is that the chromatogram peak was "truncated", which in Skyline can happen if the retention time of the integration boundary is at the edge of the data acquisition window.
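A minimal sketch of applying the same exclusion to an exported report, assuming a CSV in which a null "Protein Abundance Strict" value appears as an empty cell. The "Replicate" column name, the numbers, and the message text below are hypothetical placeholders, not Skyline's actual output.

```python
import csv
import io

def usable_abundances(rows, strict_col="Protein Abundance Strict",
                      message_col="Protein Abundance Message"):
    """Split rows into those with a strict abundance and those without.

    Assumes a null strict value is exported as an empty cell. Rows without
    a value are returned with their message so you can see why they were left out.
    """
    kept, dropped = [], []
    for row in rows:
        value = (row.get(strict_col) or "").strip()
        if value:
            kept.append({**row, strict_col: float(value)})
        else:
            dropped.append((row.get("Replicate"), row.get(message_col, "")))
    return kept, dropped

# Hypothetical three-replicate export; a real report will have more columns.
example = io.StringIO(
    "Replicate,Protein Abundance,Protein Abundance Strict,Protein Abundance Message\n"
    "R1,1200.5,1200.5,\n"
    "R2,980.0,,truncated peak\n"
    "R3,1105.2,1105.2,\n"
)
kept, dropped = usable_abundances(csv.DictReader(example))
print([r["Replicate"] for r in kept])  # ['R1', 'R3']
print(dropped)                         # [('R2', 'truncated peak')]
```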

There's a bit of a description about how Skyline turns the logarithm of those abundances into a fold change here:
https://skyline.ms/announcements/home/support/thread.view?rowId=43007
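For reference, the generic way to get a fold change from log-transformed abundances is the difference of the group means on the log2 scale, which equals the log2 of the ratio of geometric means. This is a textbook sketch with made-up numbers, not necessarily the exact procedure described in the linked thread.

```python
import math
from statistics import mean

def log2_fold_change(numerator, denominator):
    """Difference of group means on the log2 scale (log2 of the ratio of geometric means)."""
    return mean(math.log2(x) for x in numerator) - mean(math.log2(x) for x in denominator)

pr = [4.0, 64.0]  # hypothetical abundances for one group
cr = [2.0, 8.0]   # hypothetical abundances for the other group
lfc = log2_fold_change(pr, cr)
print(lfc, 2 ** lfc)  # 2.0 4.0
```

With these numbers the ratio of the arithmetic means would be 6.8 instead of 4, which is one way a fold change computed on untransformed intensities can drift away from one computed on log2 values.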

The actual code to do the linear regression and T test is here:
https://github.com/ProteoWizard/pwiz/blob/master/pwiz_tools/Shared/Common/DataAnalysis/LinearModel.cs
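One point worth checking when comparing against that code: a Student t-test with pooled variance (R's t.test(var.equal = TRUE)) and an ordinary least-squares regression on a single two-level group indicator give the same t statistic with n - 2 degrees of freedom. The sketch below uses plain Python and hypothetical log2 values to compute both. If the two R approaches agree with each other but not with Skyline, the difference is more likely in the input values than in the test. It is not a port of LinearModel.cs.

```python
import math
from statistics import mean

def pooled_t(a, b):
    """Student's t statistic with pooled variance, as in R's t.test(var.equal = TRUE)."""
    na, nb = len(a), len(b)
    ma, mb = mean(a), mean(b)
    within_ss = sum((x - ma) ** 2 for x in a) + sum((x - mb) ** 2 for x in b)
    pooled_var = within_ss / (na + nb - 2)
    return (ma - mb) / math.sqrt(pooled_var * (1 / na + 1 / nb))

def ols_group_t(a, b):
    """t statistic of the slope in y = b0 + b1 * g, with g = 1 for group a and 0 for group b."""
    g = [1.0] * len(a) + [0.0] * len(b)
    y = list(a) + list(b)
    n = len(y)
    mg, my = mean(g), mean(y)
    sgg = sum((gi - mg) ** 2 for gi in g)
    sgy = sum((gi - mg) * (yi - my) for gi, yi in zip(g, y))
    slope = sgy / sgg              # equals mean(a) - mean(b) up to rounding
    intercept = my - slope * mg    # equals mean(b) up to rounding
    rss = sum((yi - intercept - slope * gi) ** 2 for gi, yi in zip(g, y))
    se_slope = math.sqrt(rss / (n - 2) / sgg)
    return slope / se_slope

# Hypothetical log2 abundances for one protein in two groups.
pr_log2 = [10.1, 10.4, 9.9, 10.6]
cr_log2 = [9.2, 9.5, 9.0]
t_student = pooled_t(pr_log2, cr_log2)
t_ols = ols_group_t(pr_log2, cr_log2)
print(t_student, t_ols)  # the two values agree up to rounding error
```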

If you send me your Skyline document and your R code I might be able to tell you why you are getting different answers.
In Skyline you can use the menu item:
File > Share
to create a .zip file containing your Skyline document and supporting files including extracted chromatograms.

Files which are less than 50MB can be attached to these support requests. You can always upload larger files here:
https://skyline.ms/files.url
-- Nick
 
kittinun l responded:  2026-01-29 03:42
Thank you, Nick. I have uploaded the Skyline document to https://skyline.ms/files.url
The metadata and R code are attached.
Meanwhile, I'll try to work through the Skyline code.

Regards,
Pete
 
Nick Shulman responded:  2026-01-29 04:27
What does the report look like that you are feeding into this R program?
The code seems to be looking for the "MS Level" column, which makes me think it's using data exported from the Group Comparison grid, but it also uses the "XxxStrict" columns, which are not columns that you would typically look at in the Group Comparison grid.

If you want the Protein Abundance values in Skyline to match the values that the Group Comparison uses, you must go to the "Quantification" tab at "Settings > Peptide Settings" and choose "2" for "MS Level".
Skyline only calculates one Protein Abundance value and it uses the settings from the Quantification tab.
The Group Comparison definition has its own setting for which MS Level to use and we recommend choosing "Default" so that it uses the same setting from the Quantification tab. When the Group Comparison is told to use both MS1 and MS2, it gives you separate rows for each MS Level, and the abundance values on those two rows are different.
If you are only interested in the fold changes calculated using the MS2 level, then you should make sure that the group comparison only gives you results for that MS Level since otherwise the adjusted p-values end up being worse because of the multiple testing problem.
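To illustrate the multiple testing effect, here is a Benjamini-Hochberg adjustment (the procedure behind R's p.adjust(method = "BH")). It is used here as an assumption, since this thread does not name the method Skyline uses. The raw p-values are invented: four MS2 rows alone, then the same four plus four hypothetical MS1 rows.

```python
def bh_adjust(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0  # also caps adjusted values at 1
    # Walk from the largest p-value down, keeping the adjusted values monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted

ms2 = [0.01, 0.02, 0.03, 0.2]  # hypothetical raw p-values, MS2 rows
ms1 = [0.5, 0.6, 0.7, 0.8]     # hypothetical raw p-values, extra MS1 rows
adj_ms2 = bh_adjust(ms2)
adj_all = bh_adjust(ms2 + ms1)
print([round(p, 3) for p in adj_ms2])      # [0.04, 0.04, 0.04, 0.2]
print([round(p, 3) for p in adj_all[:4]])  # [0.08, 0.08, 0.08, 0.4]
```

The same MS2 p-values double after adjustment once the extra MS1 rows are counted, because each p-value is scaled by the total number of tests divided by its rank.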
-- Nick
 
kittinun l responded:  2026-01-29 04:37
Sorry for the confusion, Nick.
This is the table before export. I applied the MS2 filter because I want to work at the MS2 level for the group comparison, and I exported both protein result types but only used the “strict” intensity in R.
Using these values, I performed the t‑tests/regression and then compared the results with the Skyline groupComparison output. I have already checked the p‑value adjustment method: when I take the raw p‑values from Skyline and adjust them in R, they match the adjusted p‑values reported by Skyline. This indicates that the discrepancy must come from the raw p‑values themselves, suggesting differences in either the underlying models or the intensity data used.
EDIT: the comparison is PR vs. CR
 
Nick Shulman responded:  2026-01-29 05:05
In the Group Comparison grid, don't bother looking at the "Protein Abundance" or "Protein Abundance Strict" columns. There is a different column with the name "Abundance" which is the actual value that is used for the fold change calculation. You can see that column if you choose "Clustered" from the "Reports" dropdown.

The "Protein Abundance Strict" column is what you would use if you wanted to calculate fold changes using numbers from the Document Grid. But, if you wanted to be able to duplicate the results that you are getting from the Group Comparison, you would need to make sure that the MS Level and Normalization Method settings on the Quantification tab in Peptide Settings match what you have in your group comparison.
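A minimal sketch of preparing a Clustered report export for a per-protein test. Only the "MS Level" and "Abundance" column names come from this thread; "Protein" and "Condition" are hypothetical stand-ins for whatever your export uses for the protein name and the grouping annotation, and "PR"/"CR" are the groups mentioned earlier. The sketch assumes the Abundance values are on a linear scale and takes log2, so check your own export before relying on that.

```python
import csv
import io
import math
from collections import defaultdict

def collect_ms2_abundances(rows, group_a="PR", group_b="CR"):
    """Group log2 "Abundance" values by protein and condition, keeping MS2 rows only."""
    data = defaultdict(lambda: {group_a: [], group_b: []})
    for row in rows:
        if row["MS Level"].strip() != "2":
            continue  # drop MS1 rows so they don't enter the comparison
        value = row["Abundance"].strip()
        condition = row["Condition"]
        if not value or condition not in (group_a, group_b):
            continue  # skip missing values and other groups
        data[row["Protein"]][condition].append(math.log2(float(value)))
    return dict(data)

# Hypothetical export with one MS1 row that should be ignored.
example = io.StringIO(
    "Protein,Condition,MS Level,Abundance\n"
    "P1,PR,2,4\n"
    "P1,PR,1,999\n"
    "P1,CR,2,1\n"
    "P1,CR,2,2\n"
)
data = collect_ms2_abundances(csv.DictReader(example))
print(data)  # {'P1': {'PR': [2.0], 'CR': [0.0, 1.0]}}
```

Each protein's two lists can then go into whichever two-group test you are comparing against Skyline.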
-- Nick
 
kittinun l responded:  2026-01-29 06:09
Thank you, Nick. After exporting the Clustered view, I was able to reproduce the statistical results. I can now finally rest in peace.

Sincerely,
Pete