Dealing with huge sample set in Skyline

Dealing with huge sample set in Skyline carmen.gonzaleztejedo  2021-10-17

Dear Skyline Support Team,
We have just finished a PRM experiment comparing different treatments and time points, in total, 396 samples.
I have already uploaded all the files in Skyline and now, as I usually do, I would like to confirm the peak picking by manually checking all the peaks (correct peaks at expected RT and peak boundaries).
However, with 90 target peptides and 396 samples, that could take forever. Do you have any tips or suggestions for dealing with such a huge set of targets and samples?
Many thanks in advance.
Kind regards,

Nick Shulman responded:  2021-10-18

One way to improve peak picking would be to train an mProphet peak scoring model, which causes Skyline to look at more features when picking peaks and also gives you a false discovery rate.
Here is the tutorial for advanced peak picking models:

The problem with doing this with PRM data is that the mass spectrometer really needs to collect data for decoy peptides. There is no way to train a model if all of the data was collected for peptides that exist; the mass spectrometer needs to have also collected data for some other random precursors.

You could collect a few more runs' worth of data with targets and decoys, and then train a peak picking model, and then use that model on the data that you already have.

Other than that, I do not know of a way to make your life easier, but someone else on this support board might have some good ideas.

If you would like, you could send us your dataset. We are always looking for examples of cases where Skyline does a less than perfect job as we look for ways to improve our peak picking in the future.

In Skyline you can use the menu item:
File > Share
to create a .zip file containing your Skyline document and supporting files including extracted chromatograms.

If that .zip file is less than 50MB you can attach it to this support request.
Otherwise, you can upload it here:

-- Nick
Juan C. Rojas E. responded:  2021-10-19
Hi Carmen,

It sounds like a lot of fun is coming your way! Besides the peak picking models Nick suggested:

1) Retention time predictor. Do you have spiked-in iRT peptides, or could you build a decent retention time predictor from your target peptides? Comparing the differences between expected and actual retention times (both accessible from reports) can help you quickly identify incorrect peak picking.
- Skyline team, is there a way of visualizing this time difference in the reports for each peptide in each sample? The only way I have found to visualize this on Skyline is the "Score to Run" regression, but this is more of a summarized difference, right?

2) Using the report windows to visualize idotp and dotp scores and sorting them from min to max can help you detect score deviations caused by bad peak picking.

(If your chromatographic stability is good, this can also be helpful.)

3) Import and validate only a representative set of samples (e.g. pooled samples). Once those have been curated, import all the others with restrictive RT windows, e.g. "Use scans within 2 minutes of predicted RT", so that the curated set of samples serves as the reference for selecting the RT extraction windows.

4) Use the results of a curated sample to "copy-paste" its peak boundaries to all other samples.
- From a file with all imported results, export peak boundaries
- Copy peak boundaries for each peptide from the "reference" sample and paste them to all other samples.
    - Either slowly and painfully with Excel or with a script from your programming language of preference (attached is a script for how I do this in R).
- Import the edited peak boundaries to force the same integration windows in all samples.
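For anyone who prefers Python over the attached R script, here is a minimal sketch of the "copy boundaries from a reference sample" step. The column names (FileName, PeptideModifiedSequence, MinStartTime, MaxEndTime) are assumptions; check them against the header of your own exported peak boundaries file before using this.

```python
def propagate_boundaries(rows, reference_file):
    """Copy each peptide's peak boundaries from the reference sample's
    rows to every other sample in an exported peak-boundaries table.
    `rows` is a list of dicts, one per (file, peptide) row."""
    # Boundaries from the curated "reference" sample, keyed by peptide
    ref = {r["PeptideModifiedSequence"]: (r["MinStartTime"], r["MaxEndTime"])
           for r in rows if r["FileName"] == reference_file}
    out = []
    for r in rows:
        # Fall back to the row's own boundaries if the peptide is
        # missing from the reference sample
        start, end = ref.get(r["PeptideModifiedSequence"],
                             (r["MinStartTime"], r["MaxEndTime"]))
        out.append({**r, "MinStartTime": start, "MaxEndTime": end})
    return out

# Tiny in-memory demo; in practice, read/write the CSV with csv.DictReader
# and csv.DictWriter, then re-import the file into Skyline.
rows = [
    {"FileName": "ref.raw",   "PeptideModifiedSequence": "PEPTIDER",
     "MinStartTime": "10.1", "MaxEndTime": "10.6"},
    {"FileName": "run02.raw", "PeptideModifiedSequence": "PEPTIDER",
     "MinStartTime": "9.7",  "MaxEndTime": "10.2"},
]
fixed = propagate_boundaries(rows, "ref.raw")
```

After writing `fixed` back out as a CSV with the same columns, the edited boundaries can be imported into the Skyline document to force the reference sample's integration windows everywhere.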

However, I think Kaipo recently mentioned the wonderful news that this sort of approach will be implemented more elegantly in an upcoming Skyline-daily release.

You may have already tested many of these ideas, but I hope at least one of them is useful, and I wish you the best with the data analysis!
carmen.gonzaleztejedo responded:  2021-10-19
Dear Nick and Juan,

Many thanks for all your suggestions, I really appreciate your help.

I don't think acquiring more data with decoy peptides is possible at the moment, but I will try some of the suggestions that Juan mentioned as, for instance, I do have iRT peptides in all my runs.

I will let you know how it goes.

Many thanks and regards,

roman sakson responded:  2021-10-23
Hi Carmen,

I hope that you have a lot of monitor space at your disposal...

I agree with Juan: showing Skyline just a few files first can improve peak picking before you load in the rest. "Range Best Retention Time" for a given peptide in the reports gives you the maximum difference in RT across all of your 396 files. Normally I decide what is acceptable to me in terms of RT deviation for a given LC system and trust the peaks with smaller deviations, if I cannot check every window manually. I am attaching a skyr report template for the document grid that you might find useful.
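The screening step described above can be sketched in a few lines of Python: group an exported report by peptide, compute the max-min spread of best retention times, and flag only the peptides that exceed your tolerance for manual review. The column names ("Peptide", "BestRetentionTime") and the 0.5-minute tolerance are assumptions; adapt them to your own report template and LC system.

```python
from collections import defaultdict

def flag_rt_outliers(report_rows, tolerance_min=0.5):
    """Return {peptide: RT range} for peptides whose best retention time
    varies across files by more than `tolerance_min` minutes."""
    rts = defaultdict(list)
    for row in report_rows:
        rts[row["Peptide"]].append(float(row["BestRetentionTime"]))
    return {pep: max(v) - min(v) for pep, v in rts.items()
            if max(v) - min(v) > tolerance_min}

# Tiny in-memory demo; in practice, load the rows from a report
# exported via File > Export > Report with csv.DictReader.
report_rows = [
    {"Peptide": "PEPTIDER", "BestRetentionTime": "10.1"},
    {"Peptide": "PEPTIDER", "BestRetentionTime": "11.0"},
    {"Peptide": "ELVISK",   "BestRetentionTime": "20.0"},
    {"Peptide": "ELVISK",   "BestRetentionTime": "20.2"},
]
suspects = flag_rt_outliers(report_rows)
```

Only the flagged peptides then need their chromatograms inspected manually, which is much more tractable than reviewing all 90 peptides in 396 files.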

Do you have isotopically labeled peptides in your experiment?

Good luck,