SWATH without internal standards?

SWATH without internal standards? Matt Padula  2017-04-17 16:24
 
Hello. We were hoping that the recent webinar would answer this question, but we couldn't get any sound during it.

We went into SWATH a little bit blind and didn't realise the need for iRT or some other kind of internal standard. So we did our experiment and have tried to analyse it with Skyline. We see some huge fold changes but only one is considered statistically significant. The samples include a gene deletion compared to a wild-type, so I would have thought that would be considered significant. Maybe my knowledge of statistics is insufficient. We did our samples in technical triplicate and think we have arranged and analysed the data correctly. We have searched through the tutorials and not found a solution.

Is there a way to normalise my SWATH data without a standard spiked into the sample? I have imported my library and data, arranged into the appropriate biological and technical replicates, into Skyline, and am getting some great fold changes but no significance in my statistical analysis. Is there a way around this?

Cheers.
 
 
Brendan MacLean responded:  2017-05-07 11:14
Hi Matt,
I am just reading Kahneman's "Thinking, Fast and Slow" and am in fact in the middle of a section on how poorly the human mind assesses the sample sizes required to reduce the possibility of error in conclusions to an acceptable level. Definitely do not discount your statement, "Maybe my knowledge of statistics is insufficient." It really seems that without stringent adherence to statistical methods, all of our minds are prone to believing our results will be reproducible with far too little evidence. Even Kahneman himself says his study of this phenomenon (with Amos Tversky in the 1970s) came out of having fallen prey to it in his own work.

Having very little information on your experiment, I suspect your sample sizes are still too small for confident inference. There are actually statistical techniques supported by MSstats which can give you some insight into the sample sizes you would need to confidently assess changes beyond a certain amount, say 2- or 4-fold. (Note that technical replicates do not increase that sample size, but instead give you more precise individual measurements and some useful insight into your process variance. Without adequate normalization, they may even increase your variance by spreading the acquisition over a longer period of time.)
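To make that concrete, here is a rough back-of-the-envelope sketch of the idea (a normal-approximation power calculation in Python with a made-up variance, not the MSstats calculation itself):

from math import ceil, log2
from scipy.stats import norm

def subjects_per_group(fold_change, sd_log2, alpha=0.05, power=0.8):
    """Approximate n per group for a two-sided, two-sample comparison."""
    delta = log2(fold_change)            # effect size on the log2 scale
    z_alpha = norm.ppf(1 - alpha / 2)    # ~1.96 for alpha = 0.05
    z_beta = norm.ppf(power)             # ~0.84 for 80% power
    return ceil(2 * ((z_alpha + z_beta) * sd_log2 / delta) ** 2)

# Assuming a between-subject SD of ~0.7 on the log2 scale (roughly a 50% CV):
print(subjects_per_group(2.0, sd_log2=0.7))   # ~8 subjects per group for 2-fold
print(subjects_per_group(4.0, sd_log2=0.7))   # ~2 subjects per group for 4-fold

The exact numbers depend entirely on the variance you plug in, which is part of what an exploratory experiment can give you.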

Multiple hypothesis testing adds another dimension of error: the more things you test in your experiment that have no true change, the less sensitive each individual test for change becomes, because you must control for the increased probability of seeing a similar fold change by random chance.
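A generic Benjamini-Hochberg adjustment (just an illustration with made-up p-values, not necessarily the exact correction MSstats applies) shows how quickly that penalty adds up:

import numpy as np

def bh_adjust(pvalues):
    """Benjamini-Hochberg adjusted p-values for a set of raw p-values."""
    p = np.asarray(pvalues, dtype=float)
    m = len(p)
    order = np.argsort(p)
    scaled = p[order] * m / np.arange(1, m + 1)           # p * m / rank
    adjusted = np.minimum.accumulate(scaled[::-1])[::-1]  # enforce monotonicity
    out = np.empty(m)
    out[order] = np.clip(adjusted, 0.0, 1.0)
    return out

# One protein with raw p = 0.001 among 100 tested proteins, the rest at p = 0.2:
print(bh_adjust([0.001] + [0.2] * 99)[0])   # adjusted p = 0.1, no longer below 0.05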

So, it is actually not that surprising that you could set up an experiment without enough power to statistically confirm changes you feel quite certain of, e.g. "gene deletion compared to a wild-type". Just because you are confident those are truly changing doesn't necessarily mean that other things, of which you have less prior knowledge but which exhibit similar fold changes, are not the result of random chance, if you are performing enough tests with a small enough sample size.

Without knowing more, my advice is to use your current experiment as "exploratory" and take the results in which you have some increased confidence to a "confirmatory" validation experiment, where you greatly reduce the number of tests you make and ideally increase your sample size. In this next experiment, you should expect some of your prior results not to be reproducible. The more you relax your acceptance criteria from the first experiment, the more you should expect not to reproduce. But, you should be starting with higher "prior probability" than your first experiment and any increase in sample size should improve your error rate in detecting true change.

These are just my own evolving opinions (as noted, influenced even by my current reading). You might certainly do better with the opinions of a trained statistician. If you can see fold changes in Skyline that you find convincing but the statistics don't agree, then it is really the statistics and the experimental design that need to be considered closely.

Thanks for posting your question. If you are willing to share more details like the size of your test set (e.g. how many peptides or proteins you targeted) and the number of biological subjects in each group you are testing for fold-change, that may shed new light.

--Brendan
 
Matt Padula responded:  2017-05-15 01:01
Interestingly, yesterday on my drive home from some annual leave (5000 km in three weeks), I was listening to an old episode of Freakonomics Radio that was all about Kahneman and Tversky. Fascinating story, but their work has been appropriated by the economists.

First, some details. There are four samples of bacteria (wild type, acid treated, deletion of a gene, and deletion of the gene with acid treatment). Each of these samples was grown in biological triplicate, and each of these has 3 or 4 technical replicates. None of the samples had an internal or retention time standard added (which, in hindsight, I think would have fixed the problem). We created an ion library using HILIC fractionation which covers 95%+ of the proteome with one peptide and >65% with two or more peptides. We performed SWATH because we had little idea of which proteins were going to change, and DIA methods are 'advertised' as letting you return to a dataset with different hypotheses. In saying that, we know that acid treatment should upregulate the urease pathway, so we should see fold changes in those proteins.

So, in our opinion the experimental design is sound. Yes, you can always have more biological replicates, but you need to weigh this against the cost of instrument time and the time necessary to run the samples. We don't have the resources to run more samples for this project when we have numerous other projects demanding instrument time.

The other point that needs to be made here is that we are measuring the same thing, a single peptide, in each technical replicate. This measurement is relatively invariant, even allowing for changes in instrument performance, and if that were not true, then all quantitation of peptides (or any other molecule) by MS would be invalid and could not be used to answer any questions, since we rely on the same peptide in different samples ionising with the same efficiency. MSstats seems to consider each peptide separately (which is the right idea), so one would think that if a peptide is 2-10 fold different in abundance in three biological replicates of one sample (say WT) compared to the three biological replicates of another (say acid treated), this would be statistically significant and fall outside a confidence interval. We're not measuring whether something is occurring by random chance, but whether the difference between two values (made up of other values) is significant. Maybe we are setting up our parameters wrong, but we have tried following the tutorials and varying parameters both intuitively and randomly, getting the same fold changes but variation in the CI (although still not significant).

We simply don't understand how MSstats is deciding on what is statistically significant. Apologies if I am not making enough sense.

-Matt.
 
Brendan MacLean responded:  2017-05-15 08:25
Hi Matt,
In the end, MSstats is assessing how different the means of your measurements are, taking into account the standard errors of your sample set around those means. If by some miracle you had achieved exactly the same measurement in all of your samples of both compared types, then your standard errors would be zero and any difference in the means would be considered significant, no matter how small. Since the standard error shrinks with the square root of your number of "subjects", an infinite number of subjects would also give you a standard error of zero, and any difference would be considered significant.

If you are not so lucky to have either no variance or an infinite sample size, then statistical inference attempts to assess the probability of the data you actually have arising by chance. If that probability is sufficiently low, then you can make an argument for discarding this "null hypothesis".

Confidence intervals are great because they give you more visibility into the standard error than a simple p-value, since CIs are just a constant (depending on the desired confidence) times the standard error. A 95% CI, for instance (assuming a normally distributed mean), is ±1.96 * stderr around the estimate. If your CIs are not narrow enough to exclude your null hypothesis, then I think there are really only two remedies: 1) less variance in your independent measurements, or 2) more independent measurements.
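To put numbers on that, here is a minimal sketch (not MSstats itself, with made-up log2 abundances) of a two-group comparison with 3 biological subjects per group:

import numpy as np
from scipy import stats

# Hypothetical log2 protein abundances, 3 biological subjects per group
wt   = np.array([20.1, 21.3, 19.4])
acid = np.array([22.0, 23.5, 21.1])

diff = acid.mean() - wt.mean()    # estimated log2 fold change (~1.9, i.e. ~3.8-fold)
se = np.sqrt(wt.var(ddof=1) / len(wt) + acid.var(ddof=1) / len(acid))
df = len(wt) + len(acid) - 2      # equal group sizes, so this matches the pooled t-test
t_crit = stats.t.ppf(0.975, df)   # ~2.78 at 4 degrees of freedom
ci = (diff - t_crit * se, diff + t_crit * se)
t_stat, p = stats.ttest_ind(acid, wt)

print(f"log2 FC = {diff:.2f}, 95% CI = ({ci[0]:.2f}, {ci[1]:.2f}), p = {p:.3f}")
# log2 FC = 1.93, 95% CI = (-0.55, 4.41), p = 0.096
# A nearly 4-fold change, yet the CI still includes 0 (a fold change of 1),
# so it is not called significant with only 3 subjects per group.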

For a more in-depth look at your MSstats settings and advice from actual statisticians, you should try the MSstats support board, which you can reach by clicking the red "Support Board" button on the MSstats tool page:

https://skyline.ms/skyts/home/software/Skyline/tools/details.view?name=MSstats

I will suggest that 3 "subjects" is barely enough to get any insight into the "population variance", and that variance is systematically understated at such small sample sizes. Given limitations on mass spec time, I know Olga Vitek will tell you that you should have favored more independent measurements and given up on technical replicates, since technical replicates can give you insight into your technical variance but cannot really help you with your low sample number. Your confidence intervals might be roughly 1/3 their current width had you made 9 independent sample measurements (the standard error shrinks by a factor of sqrt(3), and the t critical value also drops sharply as the degrees of freedom increase). And that might be enough for them to exclude your null hypothesis.
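The rough arithmetic behind that "1/3" (assuming a simple per-group mean with n - 1 degrees of freedom):

from math import sqrt
from scipy import stats

def half_width_per_sd(n, conf=0.95):
    """CI half-width per unit of sample standard deviation for a group mean."""
    t_crit = stats.t.ppf(1 - (1 - conf) / 2, df=n - 1)
    return t_crit / sqrt(n)

w3, w9 = half_width_per_sd(3), half_width_per_sd(9)
print(round(w3, 2), round(w9, 2), round(w9 / w3, 2))   # 2.48 0.77 0.31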

Good luck with your experiment. You will really need a statistician and not me to sign off on any claims you make from this data. But, I hope the discussion was somewhat informative. Thanks for using Skyline.

--Brendan
 
Matt Padula responded:  2017-05-15 14:04
Thanks for the discussion, Brendan. I need to look into MSstats more, but you have given my student plenty to think on and numerous things to write about in the discussion for this part of her thesis.