DIA analysis; unable to remove precursors

narjis fatima  2020-12-25
 

Hi,
I recently ran a DIA analysis of the whole proteome of CLL cells after drug treatments, and I have not been able to resolve the precursor removal issue. I unchecked the include DIA precursor window option in Transition Settings > Filter and added "Start by precursor" in the Document Grid, as recommended by one of the experts here.
I am still getting the error shown below.
I am analyzing DIA results for the first time, so I think I am having some issues with processing the results. Can you let me know how to fix this?

I am also seeing both precursor and product ions in the target section. I saw you mention to someone that only precursor or only product ions should be there. Could you let me know whether this is the issue and, if so, how to fix it?
Thanks a lot.

** Loading the required statistical software packages in R .....

=======================================
** Reading the data for MSstats.....
** Peptides, that are used in more than one proteins, are removed.
** Truncated peaks are replaced with NA.
Error in SkylinetoMSstatsFormat(raw, removeProtein_with1Feature = TRUE, :
** Please check precursors information. If your experiment is DIA, please remove the precursors. If your experiments is DDA, please check the precursor information.

Can't finish analysis.

 
 
Nick Shulman responded:  2020-12-25
The best way to remove all of the Precursors from your Skyline document is to use the Document Grid.

1. In the Document Grid, choose the report called "Transitions".
2. Right-click on the Fragment Ion column and choose "Filter".
3. Set the filter to:
Starts With
precursor
4. Select all of the rows by clicking the square at the top left of the Document Grid.
5. Use the Red X in the Document Grid's toolbar to remove all of the Precursor transitions from your document.
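
For anyone who wants to double-check an exported report outside Skyline, the same "Starts With precursor" filter can be sketched in Python. This is only an illustration, not part of Skyline or MSstats: it assumes the Transitions report was exported to CSV with a "Fragment Ion" column, and the sample rows and the other column names ("Peptide", "Area") are made up. It does not modify the Skyline document; the Document Grid steps above are still what actually removes the transitions.

```python
import csv
import io


def drop_precursor_rows(rows, column="Fragment Ion"):
    """Keep only rows whose fragment ion name does not start with 'precursor'."""
    return [
        row for row in rows
        if not row[column].strip().lower().startswith("precursor")
    ]


# Hypothetical rows standing in for an exported Transitions report.
SAMPLE_CSV = """Peptide,Fragment Ion,Area
PEPTIDEK,precursor,1000
PEPTIDEK,precursor [M+1],400
PEPTIDEK,y7,250
PEPTIDEK,b3,120
"""

rows = list(csv.DictReader(io.StringIO(SAMPLE_CSV)))
kept = drop_precursor_rows(rows)
print([row["Fragment Ion"] for row in kept])  # ['y7', 'b3']
```

If the list printed for a real export is the same length as the input, there are no precursor rows left for MSstats to complain about.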

It is a little annoying that MSstats does not allow you to have precursors and product transitions in the same Skyline document.

If you do not need the advanced statistics that MSstats uses, you can use the Group Comparison feature that is built into Skyline.
You can get to the Group Comparison feature in Skyline by going to:
View > Other Grids > Group Comparison > Add
You can learn more about this feature in the Group Comparison tutorial:
https://skyline.ms/wiki/home/software/Skyline/page.view?name=tutorial_grouped

By the way, when you changed the Transition Settings Filter to not include precursors, that probably removed most of the precursor transitions from your Document, but would have missed any of the Precursors where you have manually chosen the children. Another thing that probably would have worked to remove the precursor transitions from your document after you changed your transition filter settings would have been to go to:
Refine > Advanced
and choose "Auto Select All Transitions".

-- Nick
 
narjis fatima responded:  2021-01-12
Hi Nick,
Thanks for that. I tried removing all the precursors manually and then ran the analysis again. I get this error now.
Can you have a look at my DIA_WT file, attached here? I am doing Skyline analysis for the first time and am therefore having a lot of trouble resolving these issues.


"C:\Users\nfat4617\Documents\R\R-3.6.1\bin\R.exe" -f "C:\Users\nfat4617\AppData\Local\Apps\2.0\Q8T163LM.9PM\XABHRLCJ.LXH\skyl..tion_e4141a2a22107248_0014.0002_2f1cb11a037aa924\Tools\MSstats\MSStatsGC.r" --slave --args "C:\Users\nfat4617\AppData\Local\Temp\MSstats_Group_Comparison_MSstats_Input.csv" "Drug treatments TP53ko" 1 FALSE FALSE FALSE -1

================================================================

** Loading the required statistical software packages in R .....

=======================================

** Reading the data for MSstats.....

** Peptides, that are used in more than one proteins, are removed.

** Truncated peaks are replaced with NA.

** 10732 features have all NAs or zero intensity values and are removed.

** 925 features have 1 or 2 intensities across runs and are removed.

** 58 proteins, which have only one feature in a protein, are removed among 5510 proteins.

Error in SkylinetoMSstatsFormat(raw, removeProtein_with1Feature = TRUE, :

** Please check annotation for Condition and BioReplicat column. There is missing information.

Can't finish analysis.
 
Nick Shulman responded:  2021-01-12
It sounds like it is telling you that you have not filled in the "Condition" and "BioReplicate" columns for all of the replicates in your document.

You can fill those in by going to:
View > Document Grid
and choosing "Replicates" from the Reports dropdown at the top of the Document Grid.

Make sure that each of your replicates has something filled in for "Condition" and "BioReplicate".
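
As a rough way to spot the gaps before running MSstats, here is a small Python sketch that lists replicates with an empty "Condition" or "BioReplicate". It assumes the Replicates report has been copied out into a list of records; the replicate names and values are invented for illustration, and the function is not part of Skyline.

```python
def missing_annotations(replicates, required=("Condition", "BioReplicate")):
    """Map each replicate name to the required annotations that are empty."""
    problems = {}
    for rep in replicates:
        empty = [key for key in required if not str(rep.get(key) or "").strip()]
        if empty:
            problems[rep["Replicate"]] = empty
    return problems


# Hypothetical replicate table; names and values are placeholders.
replicates = [
    {"Replicate": "run_01", "Condition": "DrugA", "BioReplicate": "1"},
    {"Replicate": "run_02", "Condition": "", "BioReplicate": "2"},
    {"Replicate": "run_03", "Condition": "DrugB", "BioReplicate": None},
]

print(missing_annotations(replicates))
# {'run_02': ['Condition'], 'run_03': ['BioReplicate']}
```

An empty result means every replicate has both annotations filled in.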
 
narjis fatima responded:  2021-01-12
Thanks Nick.

I filled in all the conditions as "Disease" because I don't have normal samples.
I am analyzing cancer cells with different drug treatments. I set up my table as shown in the attached file (skyline error, duplicates).
Now the error says that there is a list of proteins that are duplicated across conditions. Do you know how I can refine these?
Is this a problem with my table setup or with my data?

Narjis
 
Nick Shulman responded:  2021-01-12
You cannot have all of your replicates be the same condition. The Group Comparison is comparing values between two different groups.
You are not required to call your two conditions "Healthy" and "Diseased". You can change the annotation definition by going to:
Settings > Document Settings > Annotations > Edit List > Edit
and either type in some different values in the Value List or change the Annotation Type to "Text".
Then, make it so that the Condition of each replicate is the name of its drug treatment.

If you are getting errors about duplicate rows, one thing that you can do is go to:
Refine > Remove Repeated Peptides

This will make it so that your document contains only one copy of each peptide. This ends up removing more than is strictly necessary to get MSstats working, since MSstats can handle duplicate peptides so long as they belong to separate proteins.

If you would like, you could send us your Skyline document.
In Skyline you can use the menu item:
File > Share
to create a zip file containing your Skyline document and supporting files including extracted chromatograms.

If that .zip file is less than 50MB you can attach it to this support request.
Otherwise, you can upload it here:
https://skyline.ms/files.url

-- Nick
 
Nick Shulman responded:  2021-01-12
I see you uploaded a file containing your error message.

Note that each of your replicates needs to have a unique pair of "Condition" and "BioReplicate".
If you assign the same Condition and BioReplicate to two replicates in your Skyline document, you will get the error that you are seeing.

Also, if you assign the same "BioReplicate" value to two Replicates, then MSstats will assume you are doing a time course experiment. That is, MSstats will assume that you took two samples from each person, one sample at a time when they were healthy, and one sample from when they were sick. MSstats will do different statistics if it is a time course experiment (because each data point has its own control). If your samples are not intended to be paired with each other in this way, you should make sure that each Replicate in your Skyline document has a different BioReplicate value.
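
The two rules above can be checked with a short Python sketch. It is only an illustration under assumptions: the replicate annotations are taken to be available as a list of records, and the run names and drug names are made up. The first function reports Condition/BioReplicate pairs used by more than one replicate; the second reports BioReplicate values that are shared by more than one replicate, which is the situation described above as being treated like paired samples.

```python
from collections import defaultdict


def duplicate_pairs(replicates):
    """(Condition, BioReplicate) pairs assigned to more than one replicate."""
    groups = defaultdict(list)
    for rep in replicates:
        groups[(rep["Condition"], rep["BioReplicate"])].append(rep["Replicate"])
    return {pair: names for pair, names in groups.items() if len(names) > 1}


def reused_bioreplicates(replicates):
    """BioReplicate values assigned to more than one replicate."""
    groups = defaultdict(list)
    for rep in replicates:
        groups[rep["BioReplicate"]].append(rep["Replicate"])
    return {bio: names for bio, names in groups.items() if len(names) > 1}


# Hypothetical annotations; run and drug names are placeholders.
replicates = [
    {"Replicate": "run_01", "Condition": "DrugA", "BioReplicate": "1"},
    {"Replicate": "run_02", "Condition": "DrugA", "BioReplicate": "1"},
    {"Replicate": "run_03", "Condition": "DrugB", "BioReplicate": "2"},
    {"Replicate": "run_04", "Condition": "DrugC", "BioReplicate": "2"},
    {"Replicate": "run_05", "Condition": "DrugC", "BioReplicate": "3"},
]

print(duplicate_pairs(replicates))
# {('DrugA', '1'): ['run_01', 'run_02']}
print(reused_bioreplicates(replicates))
# {'1': ['run_01', 'run_02'], '2': ['run_03', 'run_04']}
```

For samples that are not meant to be paired, both results should come back empty.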
-- Nick
 
narjis fatima responded:  2021-01-13
Thank you so much Nick.
I have removed the peptides and adjusted the replicates. The analysis finishes now, but it looks like a lot of data is missing. The volcano plot shows nothing upregulated or downregulated.
I have attached the Word document "Volcanoplot_Skyline_comments_NARJIS", in which you can see the Immediate Window output and the volcano plot. I am also unable to open the Excel result file, 'dataprocessedData'; it gives me the error "file not loaded completely".

I uploaded my file as a zip file titled "DIA_WT data analysis_NARJIS" on 23-12-2020. It would be great if you could have a look at it. This file is from before the removal of precursors.
 
Nick Shulman responded:  2021-01-13
I think the reason that you are unable to find proteins that have a significant adjusted p-value is that you are looking at too many proteins at once.

That is, when you ask MSstats to calculate the fold changes for 5000 proteins, it is no longer enough to find a t-test where the p-value is less than .05; the p-value almost has to be 5000 times lower than that. (I'm not sure exactly how much lower the p-value really needs to be, but I know that it is adjusted using the Benjamini-Hochberg procedure:
https://en.wikipedia.org/wiki/False_discovery_rate#Benjamini%E2%80%93Hochberg_procedure
)
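
To make the size of that adjustment concrete, here is a minimal Python sketch of the textbook Benjamini-Hochberg adjustment. It is not taken from the MSstats source, so MSstats may differ in details, and the p-values are invented. Each sorted p-value is multiplied by (number of tests / rank), and the results are then made monotone from the largest p-value down.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value to the smallest so that each adjusted
    # value is never larger than the adjusted value of the next rank up.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical case: one protein stands out among 5000, the rest are noise.
borderline = benjamini_hochberg([0.0001] + [0.5] * 4999)
strong = benjamini_hochberg([0.000005] + [0.5] * 4999)

print(borderline[0] < 0.05)  # False: 0.0001 is adjusted to about 0.5
print(strong[0] < 0.05)      # True: 0.000005 is adjusted to about 0.025
```

When only one protein stands out, its raw p-value is multiplied by the full number of proteins. When many proteins have small p-values, the multiplier shrinks with rank, so the penalty is less severe than a flat factor of 5000.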

There are a few things that you can do about this.
One thing that you could do is look at the proteins that MSstats said had large fold changes, and then do your entire experiment again and in the second round of this experiment, only look at the fold changes of these proteins. It is very important that you collect all of the data for your experiment from scratch again, since none of the numbers in the second experiment are allowed to have been biased by the results in this first experiment.

A different thing that you could do is read the literature about these proteins and remove the proteins from your Skyline document that you would not expect to be affected by these drugs. If you go this route, it is important that your decision about what to remove is based only on things that you see in other experiments, and has nothing to do with what you see in this experiment.

-- Nick
 
narjis fatima responded:  2021-01-14
Thanks Nick.
I will try following the instructions.
When you say, "It is very important that you collect all of the data for your experiment from scratch again", does that mean I should do the analysis with all of the duplicate peptides? I am confused here.

Also, can you refer me to any manual that will help with the Skyline statistical analysis?
I have followed these
https://skyline.ms/files/home/software/Skyline/tools/_tool_MSstats_3.13.7/MSstats-SkylineExternalTool-InstallationAndUserGuide-v2.1.6.pdf,
https://www.stat.purdue.edu/~choi67/USHupo2016_short_course_material.pdf
 
Brendan MacLean responded:  2021-01-15
I highly recommend that you work through the DIA/SWATH tutorial and try it with a proteome-wide FASTA.

Here is the tutorial:

https://skyline.ms/tutorial_dia_swath.url

And here is a useful webinar that covers the proteome-wide analysis:

https://skyline.ms/webinar18.url

Hope these help.

--Brendan