Adding library peptides to document giving impossibly large number of proteins
Liyan Chen  2022-05-24 02:49
 

Hi Skyline developers,

We run DIA on plasma and serum samples in our lab. We have an in-house library, generated from fractionated DDA data, that gets updated several times a year whenever we have a new cohort. This has been done in Skyline 19.1 for the past two years. The library searches were done in Proteome Discoverer 2.3 or 2.4.

We are now setting up Skyline 21.2.0.425 on a server, and we also have a new cohort to add to the library. We know that the PD library search for each of our cohorts gives proteins in the mid-1000s, and the concatenated library should have a little over 2000 proteins at most. However, one of the library files gives me an additional 5000 proteins when I add it to the document. This happens whether I append .blib files or select multiple libraries under Peptide Settings.

Could I send you the problematic library file to take a look?

 
 
Nick Shulman responded:  2022-05-24 06:29
Which FASTA file are you using?

Skyline uses the spectral library to decide which peptides to add to your document. Skyline uses the FASTA file to decide how to map the peptides to proteins. If a peptide can be found in multiple proteins, that peptide gets duplicated across all of the proteins in which it appears.
If you want to know the number of unique peptides in your Skyline document you can use the menu item:
Refine > Remove Repeated Peptides
This will leave you with only the first occurrence of each duplicated peptide.
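To make the mapping behaviour concrete, here is a minimal sketch (not Skyline's code) that assigns peptides to proteins by plain substring search and then keeps only the first occurrence of each repeated peptide. The protein names and sequences are made-up toy data, and Skyline's own matching may also take the enzyme and digestion settings into account, which this sketch ignores.

```python
# Toy sketch: peptide-to-protein mapping and "keep first occurrence".
# Protein names/sequences are hypothetical; matching is plain substring search.

def map_peptides(peptides, proteins):
    """Return {protein_name: [peptides found in it]} in FASTA order.

    A peptide found in several proteins is listed under every one of them,
    which is how a shared peptide gets duplicated across proteins.
    """
    mapping = {}
    for name, sequence in proteins.items():
        hits = [p for p in peptides if p in sequence]
        if hits:
            mapping[name] = hits
    return mapping


def remove_repeated(mapping):
    """Keep each peptide only under the first protein it appears in.

    Proteins left with no peptides are kept with an empty list here; whether
    Skyline keeps or drops such proteins is not assumed by this sketch.
    """
    seen = set()
    result = {}
    for name, hits in mapping.items():
        unique = [p for p in hits if p not in seen]
        seen.update(unique)
        result[name] = unique
    return result


proteins = {
    "PROT_A": "MKTAYIAKQRLSSGEK",
    "PROT_B": "GGLSSGEKWW",
    "PROT_C": "PLNNMDDEK",
}
peptides = ["TAYIAK", "LSSGEK", "NNMDDEK"]

mapping = map_peptides(peptides, proteins)
total_entries = sum(len(v) for v in mapping.values())  # LSSGEK is counted twice
deduplicated = remove_repeated(mapping)
```

Three distinct peptides produce four peptide entries because LSSGEK occurs in two proteins; removing repeats brings the count back to three.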

Yes, you can send us your files and we'd be happy to take a look at what is going on. Files which are less than 50MB can be attached to this support request. You can upload larger files here:
https://skyline.ms/files.url

It would probably also be helpful if you could send us your Skyline document. In Skyline, you can use the menu item:
File > Share
to create a .zip file containing your Skyline document and supporting files.
-- Nick
 
Liyan Chen responded:  2022-05-25 02:20
Hi Nick,

My background FASTA contains human canonical sequences. I have checked that there are no duplicated peptides and that the settings in v19 (older computer) and v21 are identical. I used the same Proteome Discoverer source file for a single library build in each of Skyline v19 and v21. The v21 .blib gave 3X more peptide precursors, and it seems that a lot of low-confidence spectra (for example, with only one fragment ion) are being added to the library.

I've attached some screenshots; the large files are still uploading.
The blib file created on v21 is "NAFLD_troubleshoot.blib"
The skyline document with excessive proteins and peptides is "blib troubleshoot excessive plasma protein.sky.zip"
The PD result file used to create the library is "20210222_NAFLD_combined_medium_consensus.pdResult"
 
Nick Shulman responded:  2022-05-25 07:18
Liyan,

The files that you uploaded "NAFLD_blib_troubleshoot.sky.zip" and "Blib troubleshoot excessive plasma protein ID.sky.zip" appear to be exactly the same as each other. Can you try uploading one or both of them again?
-- Nick
 
Liyan Chen responded:  2022-05-30 22:48
Hi Nick,

Please ignore the duplicate, sorry. I have uploaded an "NAFLD on Skyline v19.sky.zip", this is the library and document on v19 that gives the correct number of peptides.

The PD result file keeps running into an error during upload. It is 3 GB in size; is there a limit on the sharing folder? I hope the two Skyline documents built with different versions are enough for your troubleshooting.
 
Nick Shulman responded:  2022-05-31 06:31
I think I would need to see the .pdresult files in order to come up with a theory about why your new Skyline document has more peptides in it than your old document.
There is no real limit to how large of a file you can upload to the Skyline website. We have received files there which were much more than 3GB in size.
-- Nick
 
Liyan Chen responded:  2022-06-01 02:03
Hi Nick,

I have managed to upload the PD result file by zipping it. Please have a look, thanks!
 
Nick Shulman responded:  2022-06-01 07:18
I was hoping that you would send two different .pdresult files.
I was planning on looking at the two .pdresult files and then figuring out what is different.
-- Nick
 
Liyan Chen responded:  2022-06-01 19:53
Oh I get what you mean now. I have just uploaded this other PDresult file that gives the same expected number of peptides on old and new versions of Skyline: 20211216_Liyan_DynamoPlasmaLib_apex_F.pdResult
 
Nick Shulman responded:  2022-06-02 12:02
I don't think Skyline or BiblioSpec is doing anything wrong.
The file "20211216_Liyan_DynamoPlasmaLib_apex_F.pdresult" really does say that about 9000 peptides were confidently identified, and "20210222_NAFLD_combined_medium_consensus.pdresult" has about 39000 peptides.
In Skyline, if you go to "View > Spectral Libraries" you can see all the peptides that were put into the spectral library.
For most of the peptides, you can see Skyline highlighting lines in the spectrum which match the predicted fragment ions. But, for other peptides, such as "AAAIPPIQVTKVHEPPREDAAPTK", Skyline does not think anything in the spectrum matched a predicted fragment ion, so it is not clear what your search engine might have been looking at to decide that this peptide was in that spectrum.
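As a rough illustration of what "matching lines in the spectrum to predicted fragment ions" means, the sketch below computes singly charged b and y ion m/z values for an unmodified peptide from standard monoisotopic residue masses and checks which ones have an observed peak within a fixed tolerance. This is an assumption-laden simplification, not Skyline's algorithm: Skyline's ion types, charge states, modifications and match tolerance all come from the document settings, and the 0.02 Da default here is arbitrary.

```python
# Monoisotopic residue masses (Da) for the 20 standard amino acids.
# Cysteine is unmodified here; a carbamidomethylated C would be 160.03065.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
PROTON = 1.007276
WATER = 18.010565


def fragment_mzs(peptide):
    """Singly charged b and y ion m/z values for an unmodified peptide."""
    masses = [RESIDUE_MASS[aa] for aa in peptide]
    ions = {}
    for i in range(1, len(masses)):
        ions[f"b{i}"] = sum(masses[:i]) + PROTON
        ions[f"y{i}"] = sum(masses[-i:]) + WATER + PROTON
    return ions


def matched_ions(peptide, peaks, tolerance=0.02):
    """Names of predicted ions with an observed peak within `tolerance` Da."""
    return sorted(
        name
        for name, mz in fragment_mzs(peptide).items()
        if any(abs(mz - peak) <= tolerance for peak in peaks)
    )


# A spectrum where nothing matches returns an empty list, the situation
# described for AAAIPPIQVTKVHEPPREDAAPTK.
example = matched_ions("GAK", [58.03, 147.11, 300.0])
```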

Do you have other tools for looking at your peptide search results? Let us know if there are any peptides which we are including in the spectral library which your other peptide search result browsing software thinks should not be there.
-- Nick
 
Liyan Chen responded:  2022-06-03 02:46

I looked up the peptide "AAAIPPIQVTKVHEPPREDAAPTK" that you mentioned as having no predicted fragment ions: it does not exist in the "20210222_NAFLD_combined_medium_consensus.pdresult" file or in the BiblioSpec library built in v19.

I exported all the transitions from the problematic v21 document to CSV and found more peptides with no fragment ions in just the first 50 or so rows. Another example, "CNIQMTQSPSAMSASVGDR" from KVD17_HUMAN, also does not exist in the "...NAFLD..." PD result file and is not present in the library built in v19. I think we have isolated one issue: mystery peptides with no fragments in the library.
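A check like the CSV scan described above could be automated. The sketch below counts product-ion rows per peptide in an exported transition list and reports peptides that have none. The column names "Peptide Modified Sequence" and "Fragment Ion", and the convention that precursor rows start with "precursor", are assumptions chosen to resemble a Skyline transition report; adjust them to match your own export.

```python
import csv
import io

# Hypothetical export content; column names and values are assumptions.
EXPORT = """Peptide Modified Sequence,Fragment Ion
PEPTIDEK,y4
PEPTIDEK,y5
PEPTIDEK,b3
EMPTYPEPK,precursor
"""


def peptides_without_fragments(csv_text):
    """Return peptides whose rows contain no product (non-precursor) ions."""
    counts = {}
    for row in csv.DictReader(io.StringIO(csv_text)):
        peptide = row["Peptide Modified Sequence"]
        is_fragment = not row["Fragment Ion"].startswith("precursor")
        # Register every peptide, adding 1 only for product-ion rows.
        counts[peptide] = counts.get(peptide, 0) + int(is_fragment)
    return sorted(p for p, n in counts.items() if n == 0)


empty = peptides_without_fragments(EXPORT)
```

For a real file, `csv_text` could be read with `open(path, newline="").read()`.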

I think there is another issue: filter settings not working as expected. I have "Pick peptides matching library" under Peptide Settings > Library and a minimum of 3 product ions from "filtered product ions" under Transition Settings > Library for my documents. These settings should have prevented the mystery empty peptides from being added to the document, right?
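The filtering expected in the question can be written down as a small predicate. This sketch encodes only the poster's reading of the two settings (keep a peptide if it is in the library and has at least 3 matched product ions); it is not Skyline's implementation, and the peptide names and ion counts are hypothetical.

```python
def expected_document_peptides(candidates, library_ion_counts, min_product_ions=3):
    """Keep candidates present in the library with enough matched product ions.

    `library_ion_counts` maps a peptide sequence to the number of predicted
    product ions found in its library spectrum (a hypothetical summary).
    """
    kept = []
    for seq in candidates:
        count = library_ion_counts.get(seq)
        if count is not None and count >= min_product_ions:
            kept.append(seq)
    return kept


# Hypothetical data: one well-matched peptide, one "empty" library entry,
# and one peptide that is not in the library at all.
library_ion_counts = {"PEPTIDEA": 5, "PEPTIDEB": 0}
kept = expected_document_peptides(["PEPTIDEA", "PEPTIDEB", "PEPTIDEC"], library_ion_counts)
```

Under this reading, a library entry with zero matched fragments would never reach the document, which is why its appearance was surprising.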

For ease of comparison, I'm listing the number of proteins and peptides from the "...DynamoPlasma..." PD result file, which behaves normally, and from the "...NAFLD..." PD result file, which behaves oddly in Skyline 21 (the unusual numbers are the Skyline v21 library and document counts for the NAFLD file).

On the "20211216_Liyan_DynamoPlasmaLib_apex_F.pdresult", I have:
1883 protein groups, 14342 peptide groups on PD result
17321 peptides in the Skyline v19 BiblioSpec library
16743 peptides in the Skyline v21 BiblioSpec library
1272 proteins, 10858 peptides on Skyline v19 document
1152 proteins, 10347 peptides on Skyline v21 document
This follows the general trend observed in other cohort samples: a small increase in peptides from PD result -> .blib (since modified forms are counted as separate peptides), and a decrease from .blib -> document due to unique-peptide enforcement.

On the "20210222_NAFLD_combined_medium_consensus.pdresult", I have:
1735 protein groups, 15561 peptide groups on PD result
17208 peptides in the Skyline v19 BiblioSpec library
45343 peptides in the Skyline v21 BiblioSpec library
1401 proteins, 13277 peptides on Skyline v19 document
6450 proteins, 20263 peptides on Skyline v21 document
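The trend described above can be made explicit by dividing each stage by the previous one. The counts below are copied from the two lists; the ratios are simple arithmetic on them. The only anomalous step is the NAFLD v21 library, at roughly 2.9 library peptides per PD peptide group, compared with about 1.1 to 1.2 everywhere else.

```python
# Counts copied from the lists above (PD result -> BiblioSpec library -> document).
COUNTS = {
    "DynamoPlasma": {"pd_peptide_groups": 14342, "v19_blib": 17321, "v21_blib": 16743,
                     "v19_doc": 10858, "v21_doc": 10347},
    "NAFLD": {"pd_peptide_groups": 15561, "v19_blib": 17208, "v21_blib": 45343,
              "v19_doc": 13277, "v21_doc": 20263},
}


def ratios(c):
    """Fold change at each stage: library vs PD result, document vs library."""
    return {
        "v19_blib/pd": c["v19_blib"] / c["pd_peptide_groups"],
        "v21_blib/pd": c["v21_blib"] / c["pd_peptide_groups"],
        "v19_doc/blib": c["v19_doc"] / c["v19_blib"],
        "v21_doc/blib": c["v21_doc"] / c["v21_blib"],
    }


for name, c in COUNTS.items():
    print(name, {k: round(v, 2) for k, v in ratios(c).items()})
```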

 
Nick Shulman responded:  2022-06-03 05:06
BiblioSpec is definitely getting confused by your file "20210222_NAFLD_combined_medium_consensus.pdResult".
Specifically, there is a table called "TargetPsms" and another table called "TargetPeptideGroups", and we expect the "Sequence" column in both of those tables to match, but in that particular .pdresult file, they do not.
So, for that peptide "CNIQMTQSPSAMSASVGDR" in your screenshot, it might be that the peptide was supposed to be "IACVLPVLMDGIQSHPQK".
Or, for the peptide "AAAIPPIQVTKVHEPPREDAAPTK" in the screenshot that I sent you, the peptide was maybe supposed to be "TILDDLRAEDHFSVIDFNQNIR".
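As far as I know, .pdResult files are SQLite databases, so the consistency Nick describes can in principle be checked with a query. In the sketch below, the table names TargetPsms and TargetPeptideGroups and their Sequence columns come from the reply above, but the linking column PeptideGroupID, the PsmID column and the toy rows are hypothetical; the real schema may link PSMs to peptide groups differently, so treat this as an outline rather than a working check against Proteome Discoverer files.

```python
import sqlite3


def find_sequence_mismatches(conn):
    """Return (psm_id, psm_sequence, group_sequence) rows whose Sequence values differ."""
    query = """
        SELECT p.PsmID, p.Sequence, g.Sequence
        FROM TargetPsms AS p
        JOIN TargetPeptideGroups AS g ON p.PeptideGroupID = g.PeptideGroupID
        WHERE p.Sequence <> g.Sequence
        ORDER BY p.PsmID
    """
    return conn.execute(query).fetchall()


def build_toy_database():
    """In-memory stand-in for a .pdResult file; schema and rows are hypothetical."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE TargetPeptideGroups (PeptideGroupID INTEGER PRIMARY KEY, Sequence TEXT);
        CREATE TABLE TargetPsms (PsmID INTEGER PRIMARY KEY, PeptideGroupID INTEGER, Sequence TEXT);
        INSERT INTO TargetPeptideGroups VALUES (1, 'PEPTIDEK'), (2, 'ELVISLIVESK');
        INSERT INTO TargetPsms VALUES (10, 1, 'PEPTIDEK'), (11, 2, 'SAMPLER');
    """)
    return conn


mismatches = find_sequence_mismatches(build_toy_database())
```

Against a real file, it would be safer to open a copy with `sqlite3.connect(path)` and first inspect the actual table and column names before adapting the join.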

I will ask my coworkers and see if anyone else can figure out how to fix this.
-- Nick
 
Nick Shulman responded:  2022-06-03 13:36
Liyan,

Do you have any other .pdresult files where Skyline gives you too many peptides?
It might be that something went wrong when you were doing your peptide search, and invalid data was put into the .pdresult file.
Can you try doing your peptide search again and see whether that produces a .pdresult file which Skyline does not have a problem with?
-- Nick
 
Liyan Chen responded:  2022-06-28 22:52
Hi Nick,

I was away on vacation, sorry for the long wait.

This "20210222_NAFLD_combined_medium_consensus.pdResult" is the only PD result file that gives too many peptides, all others perform normally in Skyline library builds. Re-doing the peptide search fixed the problem, we now have a normal number of peptides in this library.

Thanks a lot for your help!