Questions and Answers
Q: For the stoichiometry experiment, do you express the two full-length proteins in fusion as the heavy standards? How big are they?
Ans: No, our users fused the N-terminal portions of both proteins (around 250 amino acids from each) rather than the full-length proteins. This resulted in a fusion construct of around 50 kDa (the full FASTA sequence is provided in the PDF). The full-length proteins were 50 and 75 kDa, respectively.
Q: Does SProCop also consider the number of points across the peak for each fragment when testing the symmetry of co-alignment?
Ans: To my knowledge, SProCop does not consider the number of points across the peak; however, this metric can be displayed in Skyline in a custom report, which we routinely do. SProCop also does not check the symmetry of fragment co-alignment; we do this outside of Skyline using Excel or R. Please feel free to reach out to me if you would like to give it a try.
Q: How much state regulation is there?
Ans: Regulation in the cannabis industry varies from state to state and can be a challenge to keep up with. What is consistent, however, is sample tracking. From the time the lab picks up a sample to the time it is consumed by the instruments and the remainder destroyed, you and the state must know exactly where that sample is. States can also inspect any lab approved to operate at random, and they may, with little notice, add new pesticides, metals, or residual solvents that the lab is required to monitor.
Q: Do you need Part 11 compliance (audit trail)?
Ans: Cannabis laws are evolving rapidly and it's tough to keep up. It's safe to assume that they will, at some point, require it. At this point, I don't know of a state that does, but I've only built labs in Maryland, California, and Pennsylvania.
Q: Is the PNNL PreProcessor tool freely available?
Ans: Yes, you can download the free software from https://omics.pnl.gov/software/pnnl-preprocessor. Also, if you google "pnnl preprocessor" it will be the first hit in the list.
Q: Are there plans to incorporate the PNNL PreProcessor into Skyline?
Ans: We are currently running the PNNL PreProcessor in an automated workflow, but as a separate, initial step (it is very easy to use and has both GUI and command-line access). The pre-processed IMS raw files it outputs are used as input for Skyline, and this is working well. At the moment there is no plan for closer integration with Skyline, but it could be possible.
Q: Does Skyline have IMS CCS values?
Ans: Skyline has extensive support for using CCS values as well as raw arrival times. Agilent raw IMS files, however, must first be CCS-calibrated with the single-field method, a quick procedure in the Agilent IM-MS Browser.
Q: How many peptides per protein were used in your study?
Ans: For the targeted PRM experiments, 2-3 peptides per protein were selected.
Q: In two of the volcano plots, it seems a majority of the proteins showed increased expression although many of them are not statistically significant. Can you comment on it?
Ans: To quickly clarify, the volcano plots show changes in interaction abundance. And yes, the expansion of polyQ mostly causes increases in interaction abundance. We usually observe that measurements of interaction abundance have greater variance than proteome abundance measurements. The scale of our experiments is an important consideration, as the IP study requires more mouse tissue than proteome analysis. Therefore, we use t-tests to find the most reproducible differential interactions for downstream functional analysis.
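The per-protein statistics behind a volcano plot can be sketched as follows. This is a minimal illustration, not the authors' pipeline: it assumes log2-transformed abundances and Welch's unequal-variance t-test (the answer above only says "t-tests"), and it stops at the t statistic and degrees of freedom. A p-value would then come from the t distribution, for example via `scipy.stats.ttest_ind(..., equal_var=False)`.

```python
import math
from statistics import mean, variance


def volcano_point(control, treated):
    """Log2 fold change and Welch's t statistic for one protein or interaction.

    control, treated: raw (positive) abundances from replicate measurements.
    Returns (log2_fold_change, t_statistic, welch_df).
    """
    # Work on the log2 scale so the x-axis of the volcano plot is a log2 fold change.
    a = [math.log2(x) for x in control]
    b = [math.log2(x) for x in treated]
    log2_fc = mean(b) - mean(a)

    # Squared standard errors of each group mean (sample variance uses n - 1).
    se2_a = variance(a) / len(a)
    se2_b = variance(b) / len(b)
    se2 = se2_a + se2_b
    if se2 == 0:
        raise ValueError("zero variance in both groups; t statistic undefined")

    t_stat = log2_fc / math.sqrt(se2)
    # Welch-Satterthwaite approximation of the degrees of freedom.
    df = se2 ** 2 / (se2_a ** 2 / (len(a) - 1) + se2_b ** 2 / (len(b) - 1))
    return log2_fc, t_stat, df


if __name__ == "__main__":
    # Hypothetical replicate abundances for a single interaction.
    print(volcano_point([1.0, 2.0, 4.0], [4.0, 8.0, 16.0]))
```

Each protein contributes one (log2 fold change, significance) point; the variance terms are where the noisier interaction measurements described above show up.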
Q: Are spectral libraries transferrable between labs or are they too system dependent?
Ans: They are often transferred between labs, and sources like NIST, theGPM.org, and PeptideAtlas make them publicly available for peptides. There are other sources for small molecules.
Q: Are retention times from the RT calculator predictive, or only useful for known lipids?
Ans: They require an initial empirical measurement. Those measurements get stored as normalized retention times which can later be calibrated to any system with similar chromatography.
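Calibrating stored normalized retention times to a new system amounts to fitting a line through a few landmark compounds measured on that system. The sketch below assumes a simple least-squares linear fit; the function names and example values are hypothetical, not Skyline's internals.

```python
def fit_rt_calibration(normalized_rts, observed_rts):
    """Least-squares line mapping stored normalized RTs to observed RTs on a new system."""
    if len(normalized_rts) != len(observed_rts) or len(normalized_rts) < 2:
        raise ValueError("need at least two paired landmark measurements")
    n = len(normalized_rts)
    mean_x = sum(normalized_rts) / n
    mean_y = sum(observed_rts) / n
    sxx = sum((x - mean_x) ** 2 for x in normalized_rts)
    if sxx == 0:
        raise ValueError("landmarks must span more than one normalized RT")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(normalized_rts, observed_rts))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def predict_rt(normalized_rt, slope, intercept):
    """Predicted retention time, in the same units as the observed landmark RTs."""
    return slope * normalized_rt + intercept


if __name__ == "__main__":
    # Hypothetical landmarks: stored normalized RTs and the minutes they eluted at on this system.
    slope, intercept = fit_rt_calibration([0, 25, 50, 75, 100], [10.0, 15.0, 20.0, 25.0, 30.0])
    print(slope, intercept, predict_rt(60, slope, intercept))
```

Once the line is fitted from the landmarks, any lipid with a stored normalized RT gets a predicted elution time on the new system, which is why the initial empirical measurement is required.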
Q: Does Skyline use METLIN and other standard databases, and how many MS/MS spectra are empirical vs. in silico? If METLIN is not currently included, could it be added by the user, and how?
Ans: Skyline small molecule library support currently runs primarily through the standard .msp format. If your library can be expressed in that format then chances are Skyline supports it or can easily be made to support it. We are finding small variants within the standard format, which we are fixing as they are reported. If you find that Skyline does not support what you need, please post a request to the Skyline support board and we will work with you to make sure it is supported.
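For readers converting their own library, a minimal reader for the NIST-style .msp text format might look like the sketch below. It assumes the common layout (`Key: value` header lines, a `Num Peaks:` line, then `m/z intensity` pairs, with records separated by blank lines). Real files vary, which is exactly the kind of small variant mentioned above, so this is an illustration rather than Skyline's parser.

```python
def parse_msp(text):
    """Parse NIST-style MSP text into [{'meta': {...}, 'peaks': [(mz, intensity), ...]}, ...]."""
    spectra = []
    current = None
    in_peaks = False
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line:
            # A blank line ends the current record.
            if current is not None:
                spectra.append(current)
            current, in_peaks = None, False
            continue
        if current is None:
            current = {"meta": {}, "peaks": []}
        if in_peaks:
            # Peak lines: m/z and intensity, optionally followed by an annotation.
            # Some files put several "mz intensity;" pairs on one line.
            for pair in line.split(";"):
                fields = pair.split()
                if len(fields) >= 2:
                    current["peaks"].append((float(fields[0]), float(fields[1])))
            continue
        # Header line: split on the first colon only, so names like "PC 34:1" survive.
        key, _, value = line.partition(":")
        current["meta"][key.strip()] = value.strip()
        if key.strip().lower() == "num peaks":
            in_peaks = True
    if current is not None:
        spectra.append(current)
    return spectra
```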
Q: Will the lipid analysis and LipidCreator also work without ion mobility data, for example with data acquired on a Thermo Q Exactive?
Ans: Yes. The Baker lab is adding empirically measured CCS values for lipids, but LipidCreator is a good starting point for a wide variety of systems without IMS.
Q: Are the SN1 and SN2 lipids separated using LC or IMS? If LC, how does that work? You showed retention time so I just wanted to clarify you were not referring to IMS.
Ans: The example was an LC separation, but this is specific to lysophospholipids. In some cases, they also separate in the IMS dimension. Lysophospholipid standards can be used to determine the order of the peaks. A good paper to reference is Kyle, J.E. et al. Analyst 2016, 141.
Q: How much time do you need to analyze, in depth, a lipidomic data set you have never seen before?
Ans: This depends on many factors such as the complexity of the matrix and sample type, the instrument used, data analysis method, your familiarity with lipid data, etc. For me, building libraries initially took a long time as I was learning about lipids and getting to know the software. It takes a while to build a library from scratch with a new sample type, at least a month, but the time goes way down after the library has been built.
Q: You mentioned using standards and you also mentioned using LipidCreator for library generation. Do you use both? If so, how well do you find they agree?
Ans: We haven't directly compared LipidCreator libraries to standards, but they generally do agree with endogenous lipids. We do use LipidCreator libraries initially, but the libraries that we plan to publish are generated from our data within Skyline.
Q: Can you share where you purchased your lipids for the iRT?
Ans: Our current iRT calculators use endogenous lipid landmarks, but we are planning to validate and utilize the UltimateSPLASH ONE Lipidomix Mixture from Avanti Polar Lipids in the future.
Q: Is the library from Erin Baker's lab available for people to use?
Ans: They are not currently available, but they will be made publicly available on Panorama as soon as we publish.
Q: Do you think FAIMS could be used to enhance specificity?
Ans: Yes, I think FAIMS would be a great way to separate isobaric species and provide higher selectivity.
Q: How many points do you typically collect over the chromatographic peaks using your staggered windows?
Ans: I am using a Dionex HPLC at 300 nL/min with an in-house packed ~25 cm C18 column (2.4 µm ReproSil-Pur). Data were acquired on a Q Exactive HF-X. With this system, I typically see 9-12 data points across a peak.
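As a rule of thumb, the number of points across a peak is the chromatographic peak width divided by the cycle time, i.e. how long the instrument takes to revisit the same isolation window. The sketch below uses hypothetical numbers (40 windows at 50 ms each, a 200 ms MS1 scan, and 22 s peaks), not the settings from this study.

```python
def cycle_time_s(n_msms_scans, msms_scan_time_s, ms1_scan_time_s=0.0):
    """Duration of one full acquisition cycle: all MS/MS windows plus any MS1 scan."""
    return n_msms_scans * msms_scan_time_s + ms1_scan_time_s


def points_across_peak(peak_width_s, cycle_s):
    """Approximate number of times each window is sampled across one chromatographic peak."""
    return peak_width_s / cycle_s


if __name__ == "__main__":
    cycle = cycle_time_s(40, 0.05, 0.2)
    print(round(cycle, 3), round(points_across_peak(22.0, cycle), 1))
```

Adding windows or lengthening injection times raises the cycle time and lowers the point count, so both have to be balanced against the peak widths the LC actually delivers.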
Q: Given the (limited number of) histone peptide sequences, do you need to cover the full 400-1200 m/z in DIA, or could you reduce the acquired mass range to a much narrower range and thereby further reduce the DIA isolation window size?
Ans: The smallest histone peptide we are interested in analyzing is 300.2156 m/z and the largest 1080.1068 m/z. If we were to use a narrower m/z range, then we would not have any information on these peptides. If an assay required a smaller subset of histone peptides, then we would definitely adjust the m/z range to only include the target peptides.
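For a fixed number of DIA windows, the isolation width scales with the m/z span being covered, which is why restricting the range to a smaller subset of target peptides would narrow the windows. A minimal sketch with evenly spaced windows; the 40-window count and the example subset are hypothetical, and real schemes often use variable widths and overlaps.

```python
def even_window_scheme(target_mzs, n_windows, margin=0.0):
    """Evenly spaced isolation windows covering all target precursor m/z values.

    Returns (window_width, [(low, high), ...]).
    """
    low = min(target_mzs) - margin
    high = max(target_mzs) + margin
    width = (high - low) / n_windows
    windows = [(low + i * width, low + (i + 1) * width) for i in range(n_windows)]
    return width, windows


if __name__ == "__main__":
    full_width, _ = even_window_scheme([400.0, 1200.0], 40)
    subset_width, _ = even_window_scheme([500.0, 620.0, 700.0], 40)
    print(full_width, subset_width)
```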
Q: Are there problems distinguishing trimethyl Lys and acetyl Lys?
Ans: It can be difficult to discriminate between acetylated and trimethylated peptides without any prior information, but we have not had much difficulty since we use a spectral library that contains MS2 spectra for these peptides.
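The difficulty comes from the near-isobaric mass shifts: trimethylation adds C3H6 and acetylation adds C2H2O, which differ by roughly 0.036 Da. The sketch below computes that difference from standard monoisotopic element masses, plus a rough m/Δm estimate of the resolving power needed at a given m/z and charge. The example m/z is hypothetical, and in practice clean separation needs more than this simple ratio.

```python
# Monoisotopic element masses (Da).
MONO = {"C": 12.0, "H": 1.00782503207, "O": 15.99491461956}


def formula_mass(composition):
    """Monoisotopic mass of an elemental composition such as {"C": 3, "H": 6}."""
    return sum(MONO[element] * count for element, count in composition.items())


TRIMETHYL_DELTA = formula_mass({"C": 3, "H": 6})       # three added CH2 groups
ACETYL_DELTA = formula_mass({"C": 2, "H": 2, "O": 1})  # added C2H2O
MASS_DIFF = TRIMETHYL_DELTA - ACETYL_DELTA              # about 0.0364 Da


def min_resolving_power(mz, charge):
    """Rough m/dm needed to tell the two forms apart at a given m/z and charge."""
    return mz / (MASS_DIFF / charge)


if __name__ == "__main__":
    print(round(MASS_DIFF, 4), round(min_resolving_power(500.0, 2)))
```

Because the difference shrinks on the m/z scale as charge increases, higher charge states demand proportionally higher resolving power, which is one reason a library of known MS2 spectra is so helpful here.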
Q: How much manual integration is necessary for a typical run?
Ans: We manually validated all the peptides using heavy-labeled synthetic standards. But with defined iRT values, manual integration should be minimal.
Q: Will SureQuant be available on the older QE models (Plus, HF, HF-X) at some point? If not, why?
Ans: SureQuant will not be available on QE systems. The method has only been natively implemented on Orbitrap-based instruments running the TNG software (Tribrid and Exploris families), because it is built from the scan events and filters embedded in the Method Editor.
Q: In the MS method do you suggest using the multiplexing mode (MSX) to capture both the light and heavy peptide for more accurate quantitation or do you capture/analyze them separately?
Ans: In the current implementation of the SureQuant method, the heavy and light peptides are measured separately, in distinct MS/MS spectra. A variant of this would be the multiplexing mode, which would require some adjustments to the MS/MS acquisition functionality for optimal performance. This variant might be explored in the future.
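When the light and heavy forms are measured in separate MS/MS scans, the light target m/z follows directly from the heavy standard's m/z. A minimal sketch, assuming the standard carries a single C-terminal heavy Lys (13C6 15N2, +8.014199 Da) or Arg (13C6 15N4, +10.008269 Da), as is common for tryptic heavy standards. The function name is hypothetical, and this is not the SureQuant implementation itself.

```python
# Mass shifts of common heavy-labeled C-terminal residues (Da).
HEAVY_SHIFT = {
    "K": 8.014199,   # Lys 13C6 15N2
    "R": 10.008269,  # Arg 13C6 15N4
}


def light_mz_from_heavy(heavy_mz, charge, c_terminal_residue):
    """m/z of the endogenous (light) peptide given its heavy standard's m/z.

    Assumes exactly one labeled residue, at the C-terminus.
    """
    if charge < 1:
        raise ValueError("charge must be a positive integer")
    return heavy_mz - HEAVY_SHIFT[c_terminal_residue] / charge


if __name__ == "__main__":
    # Hypothetical heavy standard observed at m/z 500.0000 (2+), ending in K.
    print(round(light_mz_from_heavy(500.0, 2, "K"), 4))
```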
Q: Can one use custom/user-defined peptides for SureQuant?
Ans: Yes. While deploying SureQuant assays for commercial kits is facilitated by preset methods embedded in the instrument control software (e.g., the PQ500 kit from Biognosys and the SureQuant AKT/mTOR kit from Thermo Fisher Scientific), users can also develop their own assays for a custom panel of peptides. Developing such user-defined SureQuant assays is facilitated by the method templates provided.
Q: Have you thought about using the ion trap (IT) for the watch section, to get through more precursors and trigger the higher-resolution SureQuant scans?
Ans: Yes. This is a variant that we are planning to explore in the future for Tribrid instruments.
Q: Can SureQuant be modified with iAPI 2.0?
Ans: I don't think anything prevents a user from building their own SureQuant-like method with iAPI on mass spectrometers that support native SureQuant acquisition.
Q: Could any Skyline experts with experience using a QE HF instrument provide us with a model scheduled PRM method with detailed parameter settings?
Ans: Please post this request to the Skyline support board. https://skyline.ms/support.url (or Help > Support in Skyline)
Q: When will Skyline 20.2 be released?
Ans: Later this summer, but you can get new features early with Skyline-daily. Help us beta test! (https://skyline.ms/daily.url)
Q: It is nice to have a spectral library prediction tool. I am wondering whether Prosit has any instrument platform requirements?
Ans: We generally do not see a strong dependency on the instrument vendor. The spectrum prediction model was trained on data acquired on an Orbitrap Fusion Lumos but has proven effective on the SCIEX TripleTOF, as shown in our 2019 publication and in new data that will be presented in the oral session WOD am 09:30 (by Brendan). Obviously, if fragmentation settings differ (e.g., rolling CE) or the precursor intensity is too low, prediction accuracy will be lower. For example, because of their high sensitivity, TOFs tend to generate many more low signal-to-noise spectra, where the observed intensities do not necessarily reflect the predicted ones due to ion statistics. In the MacCoss lab, we have found the Pan Human spectral library produced in the Aebersold lab using TripleTOF data works well for DIA on Thermo Q Exactive data collected in HCD mode.
Q: This is quite the ask: but will prediction software be extended from peptides to small molecules?
Ans: We are unaware of anyone doing that work. It is a little hard to imagine a small-molecule equivalent of the input used for peptides (an amino acid sequence). Certainly, the bare chemical formula is not enough, since molecules with the same atomic counts can have very different physicochemical properties, from fragmentation to CCS. Perhaps a full chemical structure specified in SMILES or similar could help, but this still seems like it may be a long way off. The reason is that predicting spectra for small molecules would likely require at least an order of magnitude more training data. Such training data are simply not available and, from our point of view, won’t be in the foreseeable future (unfortunately). Happily, we can continue to rely on prior empirical measurements and building up libraries of these.
Q: How does this compare to experimental libraries? Is it better?
Ans: Highly similar for spectrum prediction, although the current implementation lacks the detection information of a true spectral library, which guides the choice of peptides and precursor charge states. It is better in the sense that it allows prediction of a highly likely spectrum for any peptide sequence, whereas an empirical library may require ordering a synthetic peptide to obtain one.
Q: What about non-tryptic peptides? Other proteases or even endogenous peptides?
Ans: The original Prosit paper https://doi.org/10.1038/s41592-019-0426-7 showed that prediction quality for non-tryptic peptides is very close to that for tryptic peptides. We are in the process of releasing a new model trained on data that include a large set of non-tryptic peptides (e.g., HLA class I and II peptides), and we already see performance similar to the original model on all classes of peptides.
Q: How good are the fits of the fragmentation predictions to different mass analyzer results?
Ans: The spectrum prediction model was trained on data acquired on an Orbitrap Fusion Lumos but has proven effective on the SCIEX TripleTOF, as shown in our 2019 publication and in new data that will be presented in the oral session WOD am 09:30 (by Brendan). Similarly, in the MacCoss lab, we have found the Pan Human spectral library produced in the Aebersold lab using TripleTOF data works well for DIA on Thermo Q Exactive data collected in HCD mode.
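Agreement between predicted and measured fragment intensities is commonly summarized with the normalized spectral contrast angle, where 1 means identical relative intensities and 0 means orthogonal spectra. A minimal sketch, assuming both intensity vectors have already been matched to the same list of fragment ions; peak matching itself is left out.

```python
import math


def normalized_spectral_angle(predicted, observed):
    """Normalized spectral contrast angle between two aligned intensity vectors.

    For non-negative intensities the result lies in [0, 1]; 1 = same relative intensities.
    """
    if len(predicted) != len(observed):
        raise ValueError("intensity vectors must be aligned on the same fragment ions")
    norm_p = math.sqrt(sum(x * x for x in predicted))
    norm_o = math.sqrt(sum(y * y for y in observed))
    if norm_p == 0 or norm_o == 0:
        return 0.0
    cosine = sum(x * y for x, y in zip(predicted, observed)) / (norm_p * norm_o)
    cosine = max(-1.0, min(1.0, cosine))  # guard against rounding just outside [-1, 1]
    return 1.0 - 2.0 * math.acos(cosine) / math.pi


if __name__ == "__main__":
    # Hypothetical b/y ion intensities: predicted vs. measured on some other instrument.
    print(normalized_spectral_angle([0.2, 1.0, 0.5, 0.1], [0.25, 0.9, 0.55, 0.05]))
```

Because the angle depends only on relative intensities, it is insensitive to overall signal scaling, which makes it a reasonable way to compare fits across mass analyzers.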
Q: The Kuster lab is known for using DMSO in mobile phases, which drastically affects precursor ion (charge state) distribution. In your last slide, you showed optimal precursor prediction. Does this account for DMSO?
Ans: The precursor ion (charge state) distribution data aren’t solely from the Kuster lab but are based on all the data available in ProteomicsDB. As ProteomicsDB hosts experiments from many different laboratories, we do not expect Prosit to be overfitting to DMSO. However, this is certainly an important question to keep in mind, and one that requires a more detailed analysis.
Q: For the "consistency" of identification/scoring of peptides, would you generate 2 libraries: one experimental and one from Prosit separately?
Ans: There are certain advantages in mixing libraries, but the answer, unfortunately, depends on the software used. Generally speaking, one has to be careful when mixing libraries especially for DIA data analysis as correct FDR estimation becomes more complex. You can check Brian Searle’s (2020) work on so-called hybrid workflows.
Q: Can Prosit get feedback from Skyline users with regards to the quality of match to further train the prediction algorithm?
Ans: We haven’t implemented any kind of feedback between Skyline and Prosit. As of right now, we can’t easily judge how much “feedback” we would need to train the algorithm for individual users, and it would increase the burden on our GPUs to host specific models for every user.
Q: How good is Prosit at predicting the spectra of peptides containing PTMs such as phosphorylation and acetylation?
Ans: Please watch Mathias Wilhelm’s talk MOD am 10:10 showing the benefits of predicted spectra for the localization of phosphorylation. This can help especially in the case of adjacent phosphorylation events which are often a problem for classical database approaches.
Q: What justifies the claim that Amanda is "better" than any other search engine? Why should we abandon our previous favorite?
Ans: There is not going to be a perfect search engine that is better than all the others on all data sets. There will always be search engines that do better on one type of data and others that do better on different kinds of data. In our experiments, we saw that MS Amanda works very well on HCD and EThcD data sets, and we have also received feedback from users that they very much like it for analyzing phosphorylated data. Just give it a try.
Q: How does MS Amanda handle false discovery rate estimation?
Ans: The implementation integrated into Skyline will use Percolator for FDR estimation and q-value assignment.
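Percolator rescores PSMs with a semi-supervised classifier and then reports q-values from target-decoy competition. The sketch below shows only that last step, the standard target-decoy q-value calculation on an already-scored list. It does not reproduce Percolator's rescoring, and it ignores tied scores for simplicity.

```python
def target_decoy_qvalues(psms):
    """q-values from target-decoy competition.

    psms: list of (score, is_decoy) tuples; higher scores are better.
    Returns a list of q-values in the same order as the input.
    """
    order = sorted(range(len(psms)), key=lambda i: psms[i][0], reverse=True)

    # FDR estimate at each score threshold: decoys / targets accepted so far (capped at 1).
    fdr_at_rank = []
    n_targets = n_decoys = 0
    for i in order:
        if psms[i][1]:
            n_decoys += 1
        else:
            n_targets += 1
        fdr_at_rank.append(min(1.0, n_decoys / max(n_targets, 1)))

    # q-value: the lowest FDR at which this PSM would still be accepted,
    # i.e. a running minimum taken from the worst score upwards.
    qvalues = [0.0] * len(psms)
    running_min = float("inf")
    for rank in range(len(order) - 1, -1, -1):
        running_min = min(running_min, fdr_at_rank[rank])
        qvalues[order[rank]] = running_min
    return qvalues
```

Filtering at q ≤ 0.01 then keeps the PSMs accepted at an estimated 1% FDR.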
Q: Does MS Amanda work with raw files from all vendors? Does it work with MSE data from Waters instruments?
Ans: The integration with Skyline is using ProteoWizard to access raw data files of all types. That said, it is unlikely to work directly for Waters MSE raw data files. For this, you would need to export a deconvoluted spectrum file like an MGF, which we understand is possible with Waters software. Skyline can still extract chromatograms from the raw MSE data, but the spectra searched by MS Amanda must be deconvoluted to look like DDA spectra. This is the same principle described in the presentation for DIA with DIA-Umpire for deconvolution.
Q: Is there a maximum number of modifications that MS Amanda can consider? Can it consider neutral losses from precursor and fragment ions (that are not phosphorylation or sulfation)?
Ans: MS Amanda does not limit the number of modifications that can be considered; however, every additional modification increases the search space and therefore the search time. That said, MS Amanda is multithreaded and takes advantage of all available cores. It definitely considers neutral losses on both precursor and fragment ions, as long as they are specified in the Unimod database. You can even define your own.
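To see why each extra variable modification enlarges the search space, the brute-force sketch below counts the modified forms of a single peptide when every residue may carry any of its allowed modifications, up to a maximum number per peptide. The input format is hypothetical, and real search engines enumerate (or avoid enumerating) these forms far more efficiently.

```python
from itertools import product


def count_modified_forms(peptide, variable_mods, max_mods):
    """Count distinct modified forms of one peptide sequence.

    variable_mods: {residue: [modification names]}, e.g. {"S": ["Phospho"]}.
    max_mods: maximum number of variable modifications allowed per peptide.
    The unmodified peptide is included in the count.
    """
    # Each position can stay unmodified (None) or carry one of its allowed modifications.
    options = [[None] + list(variable_mods.get(residue, [])) for residue in peptide]
    count = 0
    for combination in product(*options):
        if sum(mod is not None for mod in combination) <= max_mods:
            count += 1
    return count


if __name__ == "__main__":
    phospho = {"S": ["Phospho"], "T": ["Phospho"], "Y": ["Phospho"]}
    with_ox = dict(phospho, M=["Oxidation"])
    for mods in (phospho, with_ox):
        print(count_modified_forms("MSTY", mods, max_mods=3))
```

Adding oxidation on M nearly doubles the candidate forms for this one peptide, and the effect compounds across a whole database.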
Q: When will the ‘import DDA search’ be available in Skyline-daily?
Ans: We hope by the end of June, but feel more confident in saying by the end of this summer. The work started in September 2019 and remains somewhat exploratory, but we are determined to see it in use by Skyline users soon.
Q: How does the search time compare to Mascot and Sequest in Proteome Discoverer? Is there a maximum number of MS/MS spectra that can be submitted?
Ans: No, there is no maximum number of spectra that can be processed. In the settings, the number of spectra processed in parallel can be specified, and MS Amanda will further distribute these spectra to the available cores to speed up the analysis. Users can therefore easily influence the speed if enough computing power is available. As for comparison with the other search engines in PD, I have not checked systematically, but since MS Amanda 2.0 I would say it is comparable, although I have to admit that the Mascot search in PD is really fast. The standalone version is definitely faster than the PD version, as the overhead of communicating with PD is avoided.
Q: Have you tested MS Amanda + Percolator in Skyline on samples other than cell isolates, e.g., bacterial mixtures?
Ans: Testing is still very much in "proof of concept" mode and we expect to rely heavily on user testing once we can release it in Skyline-daily.
Q: For DIA, would you consider the integration of Prosit?
Ans: Yes. This is happening more and more in the field. You should review Brendan's ASMS talk WOD am 09:30 which contains other references.
Q: Is there any way to evaluate TOMAHAQ data in Skyline?
Ans: We added support for extracting ions from MS3 spectra in the past year. Provided that you are willing to acquire your MS3 in targeted mode (PRM), you could do this in Skyline by using Transition Settings > Filtering > Special ions to target isobaric fragments. If you acquire MS3 only through DDA, however, you will not collect enough scans of any given molecule to form a chromatographic peak, which Skyline requires.
Q: Do you have data from QTOF using UNIFI software?
Ans: We have been working on that support with Waters.
Q: Which webinar/tutorials would you recommend we suggest to coworkers who are brand new to Skyline?
Ans: I would start on the Tutorials page (https://skyline.ms/tutorials.url) in the Introductory section. These tutorials use SRM data, which is small and fast for instruction, but each introduces concepts that will be useful for any Skyline user. Then move on to the other sections and the tutorials of greatest interest. The tutorials and webinars cross-link to each other to help you find written tutorials and webinar presentations that cover similar material. More broadly, there are even complete recorded courses, which you can find through the "Join Us" section on the Skyline main page.
Q: Does Skyline support Waters Synapt G2 IMS data?
Q: Is the MAM workflow used to monitor relative abundances of peptide modifications?
Ans: Yes, but also other quality metrics, such as glycosylation, which is monitored by grouping a set of glycopeptides and tracking their abundance relative to each other from sample to sample over time. A paper by Richard Rogers is a good place to start reading (https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4623056/). We are in discussions with Richard Rogers on getting funding and guidance from his MAM Consortium for continued development in this area.
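Tracking a group of glycopeptides relative to one another reduces to normalizing each form's peak area by the group total in every sample. A minimal sketch; the glycoform labels and peak areas are hypothetical placeholders, not MAM data.

```python
def relative_abundances(areas_by_sample):
    """Normalize each glycoform's peak area to the group total, sample by sample.

    areas_by_sample: {sample: {glycoform: peak_area}}
    Returns {sample: {glycoform: fraction_of_group}}.
    """
    result = {}
    for sample, areas in areas_by_sample.items():
        total = sum(areas.values())
        if total <= 0:
            raise ValueError(f"no signal for sample {sample!r}")
        result[sample] = {form: area / total for form, area in areas.items()}
    return result


if __name__ == "__main__":
    # Hypothetical peak areas for one glycosylation site in two lots.
    data = {
        "lot_A": {"G0F": 60.0, "G1F": 30.0, "G2F": 10.0},
        "lot_B": {"G0F": 50.0, "G1F": 35.0, "G2F": 15.0},
    }
    for sample, fractions in relative_abundances(data).items():
        print(sample, {k: round(v, 3) for k, v in fractions.items()})
```

Because each sample is normalized to its own total, drifts in overall signal between runs cancel out, and what remains is the shift in glycoform distribution over time.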
Q: Are there any plans to have a small molecule library similar to Prosit, or a connection to the Fiehn library or the MoNA database?
Ans: Prosit-like fragmentation prediction for small molecules is an extremely challenging problem, as discussed above. However, we are extremely interested in supporting as many existing small molecule library efforts as we possibly can, just as we have for proteomics over the past decade, to the point where we now support over 20 sources of peptide library data. Currently, our support runs through the .msp format, which many small molecule sources seem to provide. If you find we do not support what you need, you should post a request to the Skyline support board and let us work with you to support it. Your feedback and example data may enable future research for you and others.
Q: For TMT, you can define the reporter ions as special ions.
Ans: Yes. Agreed. That will work for PRM, but it won't help for MS1 filtering from DDA data, which is how most people acquire data on TMT labeled samples.
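In PRM data, working with TMT ultimately means reading reporter-ion intensities at fixed m/z values. The sketch below picks the most intense peak within a ppm tolerance of each TMT10plex reporter in one centroided spectrum. The reporter m/z values are the commonly cited ones (check them against your reagent documentation), and the 10 ppm tolerance is an assumption; at that tolerance the N and C channels, which differ by about 0.0063 m/z, stay separated.

```python
# Commonly cited TMT10plex reporter-ion m/z values (verify against reagent documentation).
TMT10_REPORTERS = {
    "126": 126.127726,
    "127N": 127.124761,
    "127C": 127.131081,
    "128N": 128.128116,
    "128C": 128.134436,
    "129N": 129.131471,
    "129C": 129.137790,
    "130N": 130.134825,
    "130C": 130.141145,
    "131": 131.138180,
}


def reporter_intensities(peaks, tol_ppm=10.0):
    """Most intense peak within tol_ppm of each reporter m/z (0.0 if none).

    peaks: iterable of (mz, intensity) pairs from one centroided spectrum.
    """
    peaks = list(peaks)
    result = {}
    for channel, ref_mz in TMT10_REPORTERS.items():
        tol = ref_mz * tol_ppm * 1e-6
        matches = [intensity for mz, intensity in peaks if abs(mz - ref_mz) <= tol]
        result[channel] = max(matches, default=0.0)
    return result
```

Repeating this over consecutive targeted scans yields per-channel traces, which is the chromatographic behavior Skyline needs; sparse DDA sampling does not provide it.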