Peak boundary imputation on large datasets

Tomas Vaisar  2025-12-11 09:36

Thank you guys for a great webinar and a great feature. This addresses a lot of issues I've seen with peak detection/integration over the years.
I am now trying it on my dataset - over 700 Astral runs searched and processed through DIANN, which I am visualizing in Skyline (for now a subset of ~120 runs: 2,000 proteins, 15,000 precursors, 137,000 transitions). I see that the imputation runs really slowly (16-core/24-processor, 64 GB RAM Dell desktop). I wonder what would speed things up - more RAM, more CPU, running it in the background with SkylineRunner (and if so, what would the command be?). Or should I just be patient?

Thanks a lot for any advice.

Tomas

 
 
Nick Shulman responded:  2025-12-11 09:46
Can you send us your Skyline document?
In Skyline you can use the menu item:
File > Share
to create a .zip file containing your Skyline document and supporting files including extracted chromatograms.

You can upload that .zip file here:
https://skyline.ms/files.url

When you click "OK" on the Peptide Settings dialog after telling Skyline to impute peak boundaries, most of the time is supposed to be spent reading chromatograms from the .skyd file for the peaks whose boundaries have changed so that Skyline can calculate the new peak area.

It is always really helpful when people send us large Skyline documents where Skyline seems to be going slower than would be expected.
After we see your document we might be able to fix Skyline to improve the performance.
-- Nick
 
Mike MacCoss responded:  2025-12-11 11:04
Hi Tomas,
One thing we do is import only a single batch into each Skyline experiment at a time. We have a fair amount of experience with projects of >1,000 samples, and these are always prepared in 96-well plates (80 samples with 16 controls). Each Skyline document imports the DIANN output and a batch of ~80-96 samples. Ultimately you will need to perform normalization and batch correction across batches, but we do that on the Skyline reports. It is pretty common for us to have 5-15 Skyline documents in a Panorama folder for a relatively large DIA project.
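To make that concrete, here is a minimal sketch (just an illustration, not our actual pipeline) of stacking one exported Skyline report per document into a single long table. The file names and the Replicate/Protein/Peptide/TotalArea column names are assumptions - they depend on the report template you export.

# Sketch: combine one exported Skyline report per batch/document into one table.
# File names and column names (Replicate, Protein, Peptide, TotalArea) are
# assumptions - match them to whatever report template you actually export.
import pandas as pd

report_files = {
    "batch01": "batch01_report.csv",
    "batch02": "batch02_report.csv",
    # ... one exported report per Skyline document
}

frames = []
for batch, path in report_files.items():
    df = pd.read_csv(path)
    df["Batch"] = batch  # remember which document/plate each row came from
    frames.append(df)

combined = pd.concat(frames, ignore_index=True)
print(combined[["Batch", "Replicate", "Protein", "Peptide", "TotalArea"]].head())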

In general, if you can process the entire dataset with DIANN and create the Skyline .speclib file on your computer, there shouldn't be a problem importing and analyzing a batch's worth of that data in Skyline.

Happy to discuss in more detail.
Mike
 
Tomas Vaisar responded:  2025-12-11 11:29
Thanks Mike,
I run DIANN on Hyak, so I can run everything in one go and create the speclib. We work in plates as well, but the controls are interspersed with the samples. I then move it to a Windows box and do Skyline there. It would be great if you could advise me on how you normally process such data. This is my first set like that from the Astral...

Tomas
 
Mike MacCoss responded:  2025-12-11 11:47
Hi Tomas,
We frequently process our data using a Nextflow workflow.
https://nf-skyline-dia-ms.readthedocs.io/en/latest/
It handles searching the data with DIA-NN, importing into Skyline, etc. Not surprisingly, we start with this data in PanoramaWeb and end up with the Skyline documents back in Panorama.

We do have another workflow for taking the results from multiple Skyline documents and performing normalization and batch correction across documents: https://github.com/uw-maccosslab/nf-dia-batch-correction

These are very much tools we have been using internally within the lab, so they have not been tested as extensively by outside groups. They also have a slightly steeper learning curve than the Skyline Windows GUI.
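As a rough illustration of what that normalization and batch-correction step does (a sketch of the general idea only, not the nf-dia-batch-correction implementation), one simple approach is per-run median normalization followed by centering each peptide on its per-batch median:

# Sketch of per-run median normalization followed by per-peptide batch centering.
# Assumes the long-format `combined` table from the earlier sketch (columns
# Batch, Replicate, Peptide, TotalArea); not the nf-dia-batch-correction code.
import numpy as np

combined["log2Area"] = np.log2(combined["TotalArea"].clip(lower=1))

# 1) Median-normalize each replicate (run) to remove total-signal differences.
run_median = combined.groupby("Replicate")["log2Area"].transform("median")
combined["log2Norm"] = combined["log2Area"] - run_median + run_median.mean()

# 2) Center each peptide on its per-batch median, then restore the overall
#    peptide median, which removes additive per-batch offsets.
batch_median = combined.groupby(["Peptide", "Batch"])["log2Norm"].transform("median")
overall_median = combined.groupby("Peptide")["log2Norm"].transform("median")
combined["log2Corrected"] = combined["log2Norm"] - batch_median + overall_median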

Cheers,
Mike
 
Mike MacCoss responded:  2025-12-11 11:58
I'll also point out that we have controls on each row of each plate. The final analysis looks at the effect of normalization and batch correction on the performance of the controls. We spend a lot of time looking at the intra- and inter-batch variance under different types of normalization and batch correction.
https://pmc.ncbi.nlm.nih.gov/articles/PMC11973981/figure/F7/
In general, it is good to have multiple different controls per batch so that you can demonstrate that the appropriate differences between controls are maintained, but the variance within a control is minimized.
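For a concrete (purely illustrative) version of that check, you can summarize intra-batch variance as the CV of each control type within a batch and inter-batch variance as the CV of the per-batch means. The ControlType column below is a hypothetical annotation of the control wells; everything else reuses the sketches above.

# Sketch: intra- vs. inter-batch CV for control samples, per control type and peptide.
# `ControlType` is a hypothetical column marking control wells; the rest reuses
# the `combined` table built in the sketches above.
controls = combined[combined["ControlType"].notna()]

def cv(log2_values):
    """Coefficient of variation (percent) on the linear scale."""
    linear = 2 ** log2_values
    return 100 * linear.std() / linear.mean()

# Intra-batch: spread of each control within a batch, averaged over batches.
intra = (controls.groupby(["ControlType", "Peptide", "Batch"])["log2Corrected"]
         .apply(cv)
         .groupby(["ControlType", "Peptide"]).mean())

# Inter-batch: spread of the per-batch means of the same control.
inter = (controls.groupby(["ControlType", "Peptide", "Batch"])["log2Corrected"]
         .mean()
         .groupby(["ControlType", "Peptide"]).apply(cv))

print(intra.describe())
print(inter.describe())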

We do have a post-Skyline dashboard we have been working on that will be created automatically as part of our workflows, but it isn't ready for broader use yet.

-Mike
 
Tomas Vaisar responded:  2025-12-11 13:15
Thanks Mike,
Will definitely look at your workflows.

Nick,
I just uploaded the zip file.

Best,
Tomas