Observed proteins depend on how many samples I search

kmp95  2021-10-08

Dear Skyline team,

I am working with DDA results and have 12 samples that I am trying to compare to find differences between them. I have noticed that my protein list changes depending on how many samples I choose to search or display in Skyline at a time. Is there a reason for this? I find it curious: if I search my serum sample on its own, I observe only ~70 proteins, but if I search it together with all of my other samples, I see 200 proteins, and I still see strong protein abundances for the other ~130 proteins that I did not observe when searching that sample individually. Is there something I am missing here? And is there a correct way to group these samples when they are not replicates or very similar to each other?

Best Regards,
Karsten Poulsen

Nick Shulman responded:  2021-10-08
Which peptide search engine did you use?

When you do a peptide search in Skyline, it uses a peptide search engine called "MSAmanda", and the search results are combined and passed to a program called "Percolator", which assigns a q-value (the lowest false discovery rate at which that match would still be accepted) to each of the peptide-spectrum matches (PSMs). You typically end up with a spectral library (.blib file) that contains all of the PSMs identified at a 1% false discovery rate (FDR).
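To make the q-value idea concrete, here is a minimal Python sketch of target-decoy q-value estimation. It is a simplification and not what Percolator actually does: Percolator rescores PSMs with a learned model before estimating q-values, whereas this sketch assumes a target-decoy setup with a single hypothetical score per PSM and simply counts decoy versus target matches at or above each score cutoff. The `PSM` class and the function names are illustrative only.

```python
from dataclasses import dataclass


@dataclass
class PSM:
    peptide: str
    score: float    # hypothetical single score; higher is better
    is_decoy: bool  # True if the match was to a decoy sequence


def assign_q_values(psms: list[PSM]) -> list[tuple[PSM, float]]:
    """Estimate q-values by simple target-decoy counting.

    The FDR at a cutoff is estimated as (#decoys at or above it) / (#targets at or above it);
    a PSM's q-value is the smallest estimated FDR at which it would still be accepted.
    """
    ranked = sorted(psms, key=lambda p: p.score, reverse=True)
    fdrs = []
    targets = decoys = 0
    for psm in ranked:
        if psm.is_decoy:
            decoys += 1
        else:
            targets += 1
        fdrs.append(decoys / max(targets, 1))
    # Walk from the worst score upward, keeping the running minimum FDR.
    q_values = [0.0] * len(fdrs)
    running_min = float("inf")
    for i in range(len(fdrs) - 1, -1, -1):
        running_min = min(running_min, fdrs[i])
        q_values[i] = running_min
    return list(zip(ranked, q_values))


def library_psms(psms: list[PSM], fdr: float = 0.01) -> list[PSM]:
    """Keep the target PSMs whose q-value is at or below the FDR cutoff."""
    return [p for p, q in assign_q_values(psms) if q <= fdr and not p.is_decoy]


# Invented example: 200 target PSMs scored 1..200 plus three decoys.
psms = [PSM(f"PEPTIDE{i}", float(i), False) for i in range(1, 201)]
psms += [PSM(f"DECOY{j}", s, True) for j, s in enumerate([50.5, 40.5, 30.5])]
kept = library_psms(psms)
```

With these invented numbers, the sketch keeps the 160 target PSMs scoring 41 or higher at a 1% cutoff; the targets below the second decoy end up with q-values above 0.01.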

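Note that even though only about 1% of those PSMs are expected to be incorrect, if you were to make a list of all of the distinct peptides in those PSMs, that peptide list could have a considerably higher false discovery rate than 1%. Correct peptides tend to be matched by several PSMs and so collapse into fewer list entries, whereas incorrect PSMs tend to land on different random peptides, each of which adds a new (wrong) entry. In other words, peptides with multiple PSMs are more likely to be correct than peptides with only one PSM.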
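The PSM-to-peptide effect can be shown with a small worked example in Python. The numbers are invented for illustration, and the sketch assumes we know which PSMs are truly correct, which is only possible in a simulation: 990 correct PSMs spread over 200 peptides, plus 10 incorrect PSMs that each hit a different random peptide.

```python
def psm_and_peptide_fdr(psms: list[tuple[str, bool]]) -> tuple[float, float]:
    """Return (PSM-level FDR, peptide-level FDR) for accepted PSMs.

    Each item is (peptide, is_correct); ground truth is only known in a simulation.
    """
    n_wrong_psms = sum(1 for _, ok in psms if not ok)
    # A peptide counts as correct if at least one of its PSMs is correct.
    peptide_ok: dict[str, bool] = {}
    for pep, ok in psms:
        peptide_ok[pep] = peptide_ok.get(pep, False) or ok
    n_wrong_peptides = sum(1 for ok in peptide_ok.values() if not ok)
    return n_wrong_psms / len(psms), n_wrong_peptides / len(peptide_ok)


# 990 correct PSMs over 200 peptides (about 5 PSMs each) ...
correct = [(f"GOOD{i % 200}", True) for i in range(990)]
# ... plus 10 incorrect PSMs, each on a different random peptide.
wrong = [(f"RANDOM{j}", False) for j in range(10)]
psm_fdr, peptide_fdr = psm_and_peptide_fdr(correct + wrong)
print(f"PSM FDR {psm_fdr:.1%}, peptide FDR {peptide_fdr:.1%}")
```

Here the PSM-level FDR is exactly 1% (10 of 1000), but the peptide-level FDR is 10 of 210, about 4.8%.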

There are tools such as "PeptideProphet" and "ProteinProphet" whose job is to take a list of PSMs from multiple samples and work out which peptides or proteins can be inferred from the evidence in those PSMs.
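As a rough illustration of protein inference, the Python sketch below uses greedy parsimony: repeatedly pick the protein that explains the most still-unexplained peptides. This is a deliberately simplified stand-in; ProteinProphet uses a probabilistic model rather than this rule, and the peptide and protein names are hypothetical.

```python
def parsimonious_proteins(peptide_to_proteins: dict[str, set[str]]) -> list[str]:
    """Greedy minimal set of proteins that explains every identified peptide.

    A simplified parsimony rule, not the ProteinProphet algorithm.
    """
    # Ignore peptides with no protein mapping so the loop always terminates.
    unexplained = {pep for pep, prots in peptide_to_proteins.items() if prots}
    protein_to_peptides: dict[str, set[str]] = {}
    for pep, prots in peptide_to_proteins.items():
        for prot in prots:
            protein_to_peptides.setdefault(prot, set()).add(pep)
    chosen = []
    while unexplained:
        # Protein covering the most unexplained peptides; ties go alphabetically.
        best = max(sorted(protein_to_peptides),
                   key=lambda p: len(protein_to_peptides[p] & unexplained))
        chosen.append(best)
        unexplained -= protein_to_peptides[best]
    return chosen


# Hypothetical peptide-to-protein mapping.
mapping = {
    "PEP_A": {"P1"},
    "PEP_B": {"P1", "P1_ISOFORM"},
    "PEP_C": {"P2"},
    "PEP_D": {"P2", "P3"},
}
proteins = parsimonious_proteins(mapping)
```

In this example, P1_ISOFORM and P3 are dropped because every peptide they contain is already explained by P1 or P2.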

The set of peptides that will be identified in a sample does depend on which other samples were searched at the same time. I believe that if you are using a PSM-level false discovery rate, the set of peptides with at least one confident PSM will typically keep growing as you add more and more samples.
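The "keeps growing" behavior can be sketched as a running union: a peptide enters the list as soon as any sample contributes at least one PSM at or below the q-value cutoff. This Python sketch assumes the q-values were already computed on the pooled search (in practice they can also change as samples are added), and the sample and peptide names are hypothetical.

```python
def cumulative_peptide_counts(
    psms: list[tuple[str, str, float]], samples: list[str], q_cutoff: float = 0.01
) -> list[int]:
    """Distinct peptides with at least one confident PSM, after adding each sample.

    Each PSM is (sample, peptide, q_value); q-values are assumed to be precomputed.
    """
    seen: set[str] = set()
    counts = []
    for sample in samples:
        seen |= {pep for s, pep, q in psms if s == sample and q <= q_cutoff}
        counts.append(len(seen))
    return counts


# Invented PSMs with precomputed q-values.
psms = [
    ("serum", "PEP1", 0.001), ("serum", "PEP2", 0.004), ("serum", "PEP3", 0.03),
    ("plasma", "PEP2", 0.002), ("plasma", "PEP3", 0.008), ("plasma", "PEP4", 0.02),
    ("csf", "PEP4", 0.005), ("csf", "PEP5", 0.009),
]
counts = cumulative_peptide_counts(psms, ["serum", "plasma", "csf"])
```

PEP3 is seen in the serum sample only at q = 0.03, so it is missing from a serum-only list, but it joins the combined list once the plasma sample supplies a confident PSM for it.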

-- Nick
kmp95 responded:  2021-10-08
Hi Nick,

Thanks for your quick response!

I use the MS Amanda search engine.

With this in mind, should I be searching (generating .blib files) and displaying (generating Skyline documents) all of my samples at the same time? Or would it be more accurate to search and display each sample individually? Most of the tutorials I saw used many replicates and did not discuss what to do when you have n = 1, or how to analyze different samples like mine.

I guess what does not make sense to me is this: I assumed my .blib file should not change, but when I import it into Skyline, the "proteome" I see changes depending on which other samples I import along with it.

Best Regards,
Karsten Poulsen