More proteins shown than library in SWATH analysis

More proteins shown than library in SWATH analysis heyang  2021-06-16
 

Hi Skyline team,

I have just started learning to use Skyline to process SWATH data acquired on a Sciex 6600. The library was created from multiple DDA runs searched with ProteinPilot software. Following your instructions, I imported the FASTA files and the spectral library. Skyline shows more peptides and proteins than the library contains. How can this be explained?

thanks,
Heyi

 
 
Nick Shulman responded:  2021-06-16
Can you send us your Skyline document?
In Skyline you can use the menu item:
File > Share
to create a .zip file containing your Skyline document and supporting files including extracted chromatograms.

If that .zip file is less than 50MB you can attach it to this support request. Otherwise, you can upload it here:
https://skyline.ms/files.url

One thing that can cause the peptide numbers in your Skyline document to be higher than the numbers in your library is peptides which appear in more than one protein. You can use the menu item "Refine > Remove Repeated Peptides" to remove extra copies of peptides which appear more than once, leaving only the first instance of each peptide. That might bring your peptide numbers down closer to what is in your library.

It is also possible that your settings are telling Skyline to give you all of the Tryptic peptides in each protein, instead of only those peptides which appear in the library. Take a look at the "Pick peptides matching" setting at "Settings > Peptide Settings > Library". If that setting value is "Library" or "Library and Filter" then you will only be given peptides which can be found in the library. However, if the setting value is "Library or Filter" or "Filter", then you will have peptides in your document which are not necessarily found in the library.

-- Nick
 
heyang responded:  2021-06-16
Thank you Nick and I uploaded my file.

I also tried removing repeated peptides, but there was no change. The peptide setting is also set to pick peptides from the library.

Thanks,
Heyi
 
Nick Shulman responded:  2021-06-16
Thank you for sending your Skyline document.

Your Skyline document contains 6874 regular peptides and 5532 decoy peptides.

Decoy peptides are fake peptides with randomized sequences that get added to the end of the document when you use the menu item "Refine > Add Decoys". Decoy peptides are useful when you are training an mProphet peak scoring model. You can learn more about peak scoring models here:
https://skyline.ms/wiki/home/software/Skyline/page.view?name=tutorial_peak_picking

In Skyline, structural modifications such as Oxidized Methionine or Deamidation on a peptide sequence get treated as separate peptides in terms of the peptide count number displayed in the status bar. So, of those 6874 non-decoy peptides, there are actually only 5148 unique unmodified peptide sequences, which is the same as the number in your library.
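To illustrate the difference between the two counts, here is a minimal sketch in Python. It assumes modified sequences are written with bracketed annotations such as "M[+16]", and the peptide list is toy data, not taken from the document in this thread. The helper names are hypothetical and are not part of Skyline.

```python
import re


def strip_modifications(modified_sequence: str) -> str:
    """Drop bracketed modification annotations, e.g. 'AM[+16]K' -> 'AMK'."""
    return re.sub(r"\[[^\]]*\]", "", modified_sequence)


def count_peptides(modified_sequences):
    """Return (number of modified forms, number of unique unmodified sequences)."""
    modified_forms = set(modified_sequences)
    unmodified = {strip_modifications(s) for s in modified_forms}
    return len(modified_forms), len(unmodified)


# Toy data (assumption): three forms of one peptide plus one other peptide.
example = ["AMNK", "AM[+16]NK", "AMN[+1]K", "LVEQR"]
modified_count, unique_count = count_peptides(example)
print(modified_count, unique_count)  # 4 2
```

The first number corresponds to what a per-modified-form count would report, and the second to the number of distinct underlying sequences.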

At "Settings > Peptide Settings > Modifications" the variable modifications "Oxidation (M)" and "Deamidation (NQ)" both have a checkmark next to them. This means that Skyline is going to give you all possible permutations of these modifications where they can be applied. This might be responsible for the higher peptide count that you are seeing. If you want to learn more about explicit and implicit modifications this webinar might be helpful:
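As a rough illustration of how quickly these permutations add up, the sketch below enumerates every combination of modified sites for one sequence. This is not Skyline's implementation. The function name, the `max_mods` cap, and the toy sequence are assumptions made for the example.

```python
from itertools import combinations


def modified_forms(sequence, variable_mods, max_mods=3):
    """Yield each form as a tuple of (position, mod_name) pairs.

    variable_mods maps a modification name to the residues it can occur on.
    Assumes no residue is targeted by more than one modification.
    """
    sites = [
        (i, name)
        for i, residue in enumerate(sequence)
        for name, residues in variable_mods.items()
        if residue in residues
    ]
    # Choose 0, 1, ..., max_mods sites to modify at once.
    for k in range(min(max_mods, len(sites)) + 1):
        yield from combinations(sites, k)


mods = {"Oxidation": "M", "Deamidation": "NQ"}
forms = list(modified_forms("AMNQK", mods))
print(len(forms))  # 8: one unmodified form plus 7 modified forms
```

With three modifiable residues (M, N, Q) a single sequence already yields 8 forms, which is why variable modifications can raise the peptide count well above the number of unique sequences.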
http://skyline.ms/webinar10.url

Hope this helps. I'm not sure how many peptides you were expecting to see in your Skyline document. I am guessing that the difference has something to do with either decoy peptides or the permutations of variable modifications.

-- Nick
 
heyang responded:  2021-06-16
Thank you, Nick, for the explanation. However, as you can see, 1241 proteins are shown in Skyline, but only ~950 proteins were identified at the 95% confidence level by ProteinPilot. Why are the numbers so different? Also, how does Skyline control the confidence level for these 1241 proteins?

Thanks,
Heyi
 
Brendan MacLean responded:  2021-06-16
It is almost always necessary to choose between "Remove repeated peptides" and "Remove duplicate peptides". The former ensures each peptide appears only once in the document, and the latter ensures that your document contains only unique peptides. Then you may need to use Refine > Remove Empty Proteins to get rid of proteins without any peptides left. Or you can do this all in a single step using Refine > Advanced.
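The difference between the two options can be sketched in a few lines of Python. This is only an illustration on toy data, assuming "repeated" keeps the first instance of each peptide and "duplicate" drops every peptide that occurs more than once. The function names are hypothetical and are not Skyline calls.

```python
from collections import Counter


def remove_repeated(proteins):
    """Keep only the first instance of each peptide, in document order."""
    seen = set()
    result = {}
    for protein, peptides in proteins.items():
        kept = []
        for peptide in peptides:
            if peptide not in seen:
                seen.add(peptide)
                kept.append(peptide)
        result[protein] = kept
    return result


def remove_duplicates(proteins):
    """Drop every peptide that occurs more than once anywhere in the document."""
    counts = Counter(p for peptides in proteins.values() for p in peptides)
    return {
        protein: [p for p in peptides if counts[p] == 1]
        for protein, peptides in proteins.items()
    }


def remove_empty(proteins):
    """Drop proteins that have no peptides left."""
    return {protein: peps for protein, peps in proteins.items() if peps}


# Toy document (assumption): the peptide "LVR" is shared by P1 and P2.
doc = {"P1": ["AAK", "LVR"], "P2": ["LVR"], "P3": ["GGK"]}
print(remove_empty(remove_repeated(doc)))    # {'P1': ['AAK', 'LVR'], 'P3': ['GGK']}
print(remove_empty(remove_duplicates(doc)))  # {'P1': ['AAK'], 'P3': ['GGK']}
```

In both cases P2 ends up with no peptides, so the protein count only drops once the empty proteins are removed as well.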

I suspect this is at least part of your problem.

--Brendan
 
heyang responded:  2021-06-17
Thank you, Brendan, for your response. After I used "Remove repeated peptides" and "Remove duplicate peptides", the numbers of peptides and proteins did not change. Also, in Refine, the minimum transitions per precursor field is blank when I first open it. I enter a number, such as 6, and click OK, but when I open it again it is blank again. Is this normal? Also, for dotp and idotp, what are the smallest values I can use to ensure a real match? Do you have any other suggestions for SWATH data quality control?

Heyi
 
Brendan MacLean responded:  2021-06-17
Hi Heyi,
Sorry for the confusion. I just downloaded the .sky.zip file you uploaded and had a closer look. It appears that your spectral library .blib file was built yesterday (Wed Jun 16 10:38:09 2021), but I find it a bit confusing. I see that it contains search results from 52 files, but Skyline doesn't seem to know the score type for those files and lists all scores as zero.

I think we are going to need to back up and have a look at how your spectral library got built and why it contains no useful scoring information.

I assume it was built from ProteinPilot .group files? Can you please run the group2xml.exe converter provided with Protein Pilot, zip the resulting .group.xml files and upload that to the same location you used for your .sky.zip file? We may only need one .group.xml file to begin taking a closer look at why we are not seeing the expected score type "PROTEIN PILOT CONFIDENCE" but instead only "UNKNOWN".

Thanks for your patience and help in isolating this issue.

--Brendan
 
heyang responded:  2021-06-17
Hi Brendan,

Yes, the library was built with ProteinPilot. I uploaded the XML library, named 20210325 MB VB library C2, for you.

Heyi
 
Kaipo responded:  2021-07-28
Hi Heyi,

The score-related data is missing from the blib file you uploaded because File > Share with "Minimize libraries" generates a new blib file that does not preserve that information (if "Store everything" is selected instead then the library will be zipped as-is). This is probably a bug.

Returning to your original question, Skyline uses the "confidence" attribute of the "MATCH" elements in ProteinPilot group.xml files to decide what to include when building a spectral library. For example, if you build a library with a cut-off score of 0.95 (the default), any spectrum in the group.xml file without a match of 0.95 confidence or higher will be excluded.
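The cut-off rule described above can be sketched as follows. This is not the library builder's code. The "MATCH" element and "confidence" attribute come from the description above, but the surrounding layout (a "SPECTRUM" parent with an "id" attribute under a "RESULTS" root) and the values are assumptions made for this toy example.

```python
import xml.etree.ElementTree as ET

# Toy stand-in for a group.xml file (layout is an assumption, see above).
TOY_GROUP_XML = """
<RESULTS>
  <SPECTRUM id="1">
    <MATCH confidence="0.99" seq="AAK"/>
    <MATCH confidence="0.40" seq="AGK"/>
  </SPECTRUM>
  <SPECTRUM id="2">
    <MATCH confidence="0.80" seq="LVR"/>
  </SPECTRUM>
  <SPECTRUM id="3">
    <MATCH confidence="0.95" seq="GGK"/>
  </SPECTRUM>
</RESULTS>
"""


def spectra_passing_cutoff(xml_text, cutoff=0.95):
    """Return ids of spectra with at least one match at or above the cut-off."""
    root = ET.fromstring(xml_text.strip())
    kept = []
    for spectrum in root.iter("SPECTRUM"):
        confidences = [float(m.get("confidence")) for m in spectrum.iter("MATCH")]
        if any(c >= cutoff for c in confidences):
            kept.append(spectrum.get("id"))
    return kept


print(spectra_passing_cutoff(TOY_GROUP_XML))  # ['1', '3']
```

Spectrum 2 is excluded because its only match has a confidence of 0.80, below the 0.95 cut-off.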

Thanks,
Kaipo