Analyze Pan-Human Library using the human reference proteome fasta file

dtn074

2021-12-03 11:56

Hi,

I am Duong Nguyen. I am using Skyline for analyzing the Pan-Human dataset (link: https://massive.ucsd.edu/ProteoSAFe/dataset.jsp?task=1a91e137d235498aa865997e8d923348).
I imported the human reference proteome fasta file (human_proteome_UP000005640 from UniProt). After that I tried "import -> results" for some mzXML files but it takes forever to complete. The memory usage of my computer was >90% and CPU < 5% at that time (please see the file attached). Do you know what happened?

And a bigger question: have you ever succeeded in using Skyline to analyze ALL Pan-Human files at the same time with a human reference proteome fasta file? If so, please give me some suggestions. That would be my final goal.

Thank you!
Duong

Screen Shot 2021-12-03 at 9.47.19 AM.png

Nick Shulman responded:	2021-12-03 12:23
Duong,
When you do "File > Import > Results" in Skyline, there is a dropdown at the bottom of the "Import Results" dialog which asks you how many files to import simultaneously. You will probably have better performance if you choose "One at a time". If Skyline ever tries to use more RAM than you have in your computer, things really will become about a million times slower. The reason for this has to do with the way that garbage collection in Microsoft .NET behaves when it is using the memory swap file. If you tell Skyline not to extract chromatograms from too many files at once, then Skyline will probably be able to avoid using more memory than is in the computer.
We generally recommend that you extract chromatograms from .mzML files instead of .mzXML, although that probably would not have an impact on the amount of memory that Skyline is consuming. Skyline can also extract chromatograms from the original raw files, which often works better than mzML.
If you send us your Skyline document and one or more of your raw files we can give you more advice about what you can do to speed things up. In Skyline you can use the menu item "File > Share" to create a .zip file containing your Skyline document and supporting files, including extracted chromatograms and spectral libraries. Files which are less than 50MB can be attached to this support request. You can upload larger files here: https://skyline.ms/files.url
I cannot tell from your screenshot how many peptides are in your document. If you added all of the theoretical tryptic peptides from the human protein fasta file, that might be too many to reasonably handle at once. You can reduce the number of peptides in your document by changing the settings on the "Filter" and "Library" tabs at "Settings > Peptide Settings". You also might be able to make your Skyline document smaller by changing the settings on the "Filter" and "Library" tabs at "Settings > Transition Settings".
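To get a rough idea of how many theoretical tryptic peptides a fasta file would add to a document, a small script can do an in-silico digest outside of Skyline. The following is only a sketch under stated assumptions: it uses the simple trypsin rule (cleave after K or R unless followed by P), allows no missed cleavages, and keeps peptides of length 8 to 25. Those limits and the function names are illustrative choices, not values read from Skyline's settings, so the count will not match what the "Filter" tab produces exactly.

```python
import re
from typing import Iterator, List, Tuple


def read_fasta(text: str) -> Iterator[Tuple[str, str]]:
    """Yield (header, sequence) pairs from FASTA-formatted text."""
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def tryptic_peptides(sequence: str) -> List[str]:
    """Cleave after K or R unless the next residue is P (no missed cleavages)."""
    return [p for p in re.split(r"(?<=[KR])(?!P)", sequence) if p]


def count_unique_peptides(fasta_text: str, min_len: int = 8, max_len: int = 25) -> int:
    """Count distinct peptides within the (assumed) length limits."""
    unique = set()
    for _, sequence in read_fasta(fasta_text):
        for peptide in tryptic_peptides(sequence):
            if min_len <= len(peptide) <= max_len:
                unique.add(peptide)
    return len(unique)
```

Reading the fasta file into a string and passing it to count_unique_peptides gives a number that can be compared before and after tightening the filter settings.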
By the way, we have made some memory improvements in Skyline-Daily. You might find things work faster if you try this in Skyline-Daily instead of Skyline 21.1. We will release the new version of Skyline (21.2) in a few weeks, which will have the memory performance improvements that are currently only available in Skyline-Daily. We have a page with some helpful tips about how to extract chromatograms faster when using Skyline from the command line: https://skyline.ms/wiki/home/software/Skyline/page.view?name=perf_scaling
-- Nick
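For the command-line route, one possible way to keep memory use down is to import the result files strictly one after another, with one SkylineCmd invocation per file. The sketch below only builds and runs such commands. The executable path, document name, and file names are hypothetical, and the --in, --import-file and --save arguments should be checked against the command-line documentation for the installed Skyline version before relying on them.

```python
import subprocess
from typing import List, Sequence


def build_import_commands(skyline_cmd: str, document: str,
                          result_files: Sequence[str]) -> List[List[str]]:
    """Build one command per result file so that files are imported one at a time."""
    return [
        [skyline_cmd, f"--in={document}", f"--import-file={path}", "--save"]
        for path in result_files
    ]


def run_sequentially(commands: Sequence[Sequence[str]]) -> None:
    """Run each command in turn, stopping at the first failure."""
    for command in commands:
        subprocess.run(list(command), check=True)


if __name__ == "__main__":
    # Hypothetical paths, for illustration only.
    commands = build_import_commands(
        "SkylineCmd.exe",
        "pan_human_test.sky",
        ["HEBHARDT_J131009_001.mzXML", "HEBHARDT_J131009_002.mzXML"],
    )
    for command in commands:
        print(" ".join(command))
    # run_sequentially(commands)  # uncomment on a machine with Skyline installed
```

Reopening the document for every file costs some time on large documents, so this trades speed for a predictable memory footprint.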

dtn074 responded:	2021-12-03 15:52
Hi Nick,
Thank you so much for answering my question. I overcame the memory problem by choosing "One at a time" in "Import Results". It works now, but loading one file at a time is very slow, even with a small part of the human fasta file (for now I am testing with only the first 250 protein sequences; file attached).
I really want to send you my Skyline document but it is very large (please see the attached photo, which shows the component files in the zip). Do you think I did something wrong in the document setup? I did not expect the extracted chromatograms file to be 70GB.
For the test, I am using 6 mzXML files uploaded to ftp://massive.ucsd.edu/MSV000079593/ including:
ftp://massive.ucsd.edu/MSV000079593//ccms_peak/PEAK/HEBHARDT_J131009_001.mzXML
ftp://massive.ucsd.edu/MSV000079593//ccms_peak/PEAK/HEBHARDT_J131009_002.mzXML
ftp://massive.ucsd.edu/MSV000079593//ccms_peak/PEAK/HEBHARDT_J131009_003A.mzXML
ftp://massive.ucsd.edu/MSV000079593//ccms_peak/PEAK/MFAINI_J131004_001.mzXML
ftp://massive.ucsd.edu/MSV000079593//ccms_peak/PEAK/MFAINI_J131004_002.mzXML
ftp://massive.ucsd.edu/MSV000079593//ccms_peak/PEAK/MFAINI_J131004_004.mzXML
And it's only a small test. For our final goal, I am going to load all Pan-Human files with the full human fasta file. Please let me know your thoughts.
Best,
Duong
Screen Shot 2021-12-03 at 2.58.39 PM.png uniprot-250-sequences.fasta

dtn074 responded:	2021-12-03 16:25
Nick - I uploaded my Skyline document to https://skyline.ms/project/home/support/file%20sharing/begin.view? with the name: iRT-C18 Standard and 250 human.sky
Please take a look if you need it.
Thanks!
Duong

Nick Shulman responded:	2021-12-03 16:28
One thing that you can do to reduce the size of your .skyd ("Skyline chromatogram data") file is to tell Skyline not to extract full-length chromatograms. At the bottom of the form "Settings > Transition Settings > Full Scan" there are some "Retention time filtering" options which tell Skyline to make the chromatograms shorter.
Another thing that will reduce the size of the document and the chromatograms would be to have a spectral library so that you are only extracting the transitions that are expected to be there. If you do not have a spectral library you can create one from predicted spectra using the "Prosit" algorithm by going to "Settings > Peptide Settings > Library > Build" and choosing "Prosit" for the data source. If you go through the steps of the Build Library wizard you will end up with a .blib file containing the predicted spectra of all of the peptides in your document. Then, you can go to "Settings > Transition Settings > Library" and tell Skyline to pick the ## most intense product ions.
The size of a Skyline document is approximately proportional to the number of transitions times the number of Result Files. If you are going to be analyzing a lot of Result Files, you might want to split those Result Files across multiple Skyline documents. If you want to look at results across multiple Skyline documents you should upload your Skyline documents to Panorama (panoramaweb.org).
If you upload your .sky.zip file to https://skyline.ms/files.url I could probably give you more advice. We usually learn a lot whenever we look at someone's Skyline document which is causing Skyline to be slow, and often we discover things that we can fix in Skyline which will make it faster in these scenarios.
Also, I do recommend that you install Skyline-Daily from here: https://skyline.ms/project/home/software/Skyline/daily/begin.view? We have made improvements to the amount of RAM that Skyline uses for large documents like yours.
-- Nick
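The size rule of thumb above (transitions times Result Files) can be turned into a quick planning calculation for splitting Result Files across several documents. This is a minimal sketch: the function names and the per-document budget are hypothetical, and the budget has to be chosen by experiment on your own machine, since Skyline does not publish such a limit.

```python
from typing import List, Sequence


def estimated_chromatograms(n_transitions: int, n_result_files: int) -> int:
    """Rough size measure: one chromatogram per transition per result file."""
    return n_transitions * n_result_files


def split_result_files(files: Sequence[str], n_transitions: int,
                       max_chromatograms: int) -> List[List[str]]:
    """Group result files so each document stays under the chromatogram budget."""
    # Always allow at least one file per document, even if it exceeds the budget.
    per_document = max(1, max_chromatograms // n_transitions)
    return [list(files[i:i + per_document])
            for i in range(0, len(files), per_document)]


if __name__ == "__main__":
    files = [f"run_{i:03d}.mzXML" for i in range(1, 7)]  # hypothetical names
    print(estimated_chromatograms(1000, len(files)))
    for batch in split_result_files(files, n_transitions=1000, max_chromatograms=2500):
        print(batch)
```

Each batch would then be imported into its own copy of the document, and the copies compared on Panorama as described above.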

Nick Shulman responded:	2021-12-03 17:03
Thanks for uploading your .sky file. Usually it's more helpful if you upload the .sky.zip file instead, since that will include the chromatograms (.skyd file) and any libraries (.blib) or iRT database files (.irtdb).
Those .mzXML files are DDA, so you should really be starting from peptide search results. Skyline has a built-in peptide search engine called "MSAmanda". You can get to it by doing "File > Import > Peptide Search" and then choosing "DDA raw" in the "Start From" dropdown box. The Import Peptide Search wizard will guide you through selecting your mzXML files and specifying a FASTA file. Alternatively, if you have already searched the mzXML files with a different peptide search engine, you could use those peptide search result files on the first page of the Peptide Search wizard.
You might find the DDA Search for MS1 filtering tutorial helpful, which shows you how to use the MSAmanda search engine in Skyline: https://skyline.ms/wiki/home/software/Skyline/page.view?name=tutorial_dda_search
or the MS1 full scan filtering tutorial, which shows how to use other peptide search results: https://skyline.ms/wiki/home/software/Skyline/page.view?name=tutorial_ms1_filtering
-- Nick
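If it is unclear whether a set of mzXML files is DDA, a quick look at the scan headers can help before deciding on a workflow. The sketch below counts scans per MS level and the number of distinct MS2 precursor m/z values (rounded to one decimal). It is only a heuristic and not how Skyline decides anything: DDA runs tend to show a very large number of distinct precursor values, while DIA runs tend to cycle through a small fixed set. The function names are hypothetical.

```python
import xml.etree.ElementTree as ET
from collections import Counter
from typing import Tuple


def local_name(tag: str) -> str:
    """Strip an XML namespace prefix such as '{http://...}scan'."""
    return tag.rsplit("}", 1)[-1]


def summarize_mzxml(source) -> Tuple[Counter, int]:
    """Return (scan count per MS level, number of distinct precursor m/z values).

    `source` may be a file path or a binary file-like object.
    """
    ms_levels: Counter = Counter()
    precursors = set()
    # Stream the file so that large mzXML files are never held fully in memory.
    for _, elem in ET.iterparse(source, events=("end",)):
        name = local_name(elem.tag)
        if name == "precursorMz" and elem.text:
            precursors.add(round(float(elem.text), 1))
        elif name == "scan":
            ms_levels[int(elem.get("msLevel", "0"))] += 1
            elem.clear()  # release peak data that has already been processed
    return ms_levels, len(precursors)
```

Calling summarize_mzxml on one of the files and comparing the number of MS2 scans with the number of distinct precursors gives a first impression; the acquisition method recorded with the dataset remains the authoritative answer.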
