Questions and Answers

Webinar 18

Q: We are doing a comparative grade-wise study for biomarker identification. Should we be using DDA or DIA? Also, will different instrument types make a difference in analysis?

Ans: We would recommend DIA as it can produce higher quality data in the end for quantitative experiments. Though, it can also be more work and require learning new technology. So, I don't think we can make a definitive recommendation. And you will need to make your decision based on the resources and expertise in your lab. The Skyline Team will be happy to help you make the most of either approach.

Q: Can DIA peptide detection and related FDR be applied in a global context instead of local? Meaning each target and decoy get to keep only their maximum z-score from all replicates for q-value calculation? (Fig 2 from https://www.nature.com/articles/nmeth.4398/)

Ans: We have this approach implemented on a branch and we have felt very close to releasing it in a Skyline-daily release for a while now. However, our implementation requires more validation. So, it is not yet available. Coming soon, we hope.

Q: Are there any best practice examples for label-based DIA? For example, a combination of SILAC and DIA looking at y-ions only?

Ans: We don't have any example SILAC datasets with DIA. Sorry.

Q: So for MaxQuant search results, you recommend setting the Score Cut off to 0?

Ans: We would recommend a score cutoff of 0 for any search results where the imported set of matches has already been filtered to some known quality level, and where further filtering in Skyline is therefore deemed unnecessary. Not being familiar enough with all possible MaxQuant configurations, I cannot say whether this is always or only sometimes the case with MaxQuant. You should just be asking yourself, "Do I want absolutely everything in these search results?" If the answer is "Yes", then you can use 0 for the cutoff. This is generally the case with pepXML files downloaded from Spectrum Mill, where you would choose the cutoff before downloading the pepXML. It is likely true of other cases and may often be true with MaxQuant. But it is really the question and its answer that make the decision.

Q: Are FDR's for library imports calculated at peptide/protein/psm level?

Ans: If you are referring to the cutoff score for library building, then it is not necessarily guaranteed to be an FDR cutoff. The library building cutoff is always applied to the most appropriate probability value provided by the search pipeline. With PeptideProphet, as used in this tutorial, that probability value is related to the Posterior Error Probability (PEP), which indicates the local error rate of the PSM itself, i.e. a probability of 0.9 means that matches like this can be expected to be true positives 9 out of 10 times. It says nothing about the global accumulation of false positives and therefore cannot be used directly to estimate FDR. For each search pipeline, you would do best to research what probability value Skyline is using, think about how it maps to the FDR level you seek, and then apply the appropriate cutoff score to achieve your desired result. The ETH instructors showed how to do this with PeptideProphet scores using a tool called MAYU during the DIA/SWATH course at ETH in 2017 and 2018.
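The difference between a local error rate and a global one can be sketched in a few lines of Python. This is an illustration of the general idea, following the commonly used relationship that the FDR among accepted PSMs is approximately the average PEP of those PSMs; it is not Skyline's code, and the PEP values are invented:

```python
# Sketch (not Skyline's implementation): converting local PEPs into a global
# FDR (q-value) estimate. A single PEP describes only that one PSM; the FDR of
# an accepted set is roughly the mean PEP over the whole set.

def peps_to_qvalues(peps):
    """Return a q-value estimate per PSM: the mean PEP of all PSMs at least
    as good (lower PEP) as this one."""
    order = sorted(range(len(peps)), key=lambda i: peps[i])
    qvals = [0.0] * len(peps)
    running = 0.0
    for rank, i in enumerate(order, start=1):
        running += peps[i]
        qvals[i] = running / rank  # mean PEP of everything accepted so far
    return qvals

# Five PSMs with made-up local PEPs; the best PSMs accumulate little global error:
print(peps_to_qvalues([0.01, 0.02, 0.50, 0.05, 0.90]))
```

Note how a PSM with PEP 0.05 can still sit in an accepted set with a much lower overall FDR, which is why the two numbers should not be conflated.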

Q: How do you use Skyline for DIA when iRT mix was not added to the samples?

Ans: Skyline supports using endogenous peptides as iRT standards, and we have recently improved the workflow for this. Hopefully, we will be able to release better instructional material for this soon or include it in a webinar. A good starting point in Skyline is to use the CiRT peptides, a published set of 80+ peptides found to be conserved in eukaryotes. Skyline will now automatically choose a subset of these found to be well expressed in your data. We have recently had success with this on two 3-organism mix data sets, one for human, yeast, and E. coli and one for Yarrowia, Arabidopsis, and Streptococcus. But you can also just ask Skyline to find high-quality endogenous markers in your data if you give it replicate runs of your targets extracted in full-gradient mode without iRT prediction. It takes a bit more prep, but it does work.

Q: I have tried using an iProphet file to build a library, but Skyline did not recognize the file. Is it possible to use an iProphet file to build a library with SpectraST, and then import from one of the formats SpectraST can write?

Ans: It is possible to use a .sptxt file output by SpectraST with Skyline. However, we do support iProphet results and have even presented a lot of data analyzed with iProphet. So, we would much prefer that you post a support request and allow us to help you figure out why library building is not working with your files.

Q: Can you explain again why you checked the "Use DIA precursor window for exclusion" box?

Ans: This checkbox will cause Skyline to exclude the m/z range used to isolate the precursor from the set of acceptable fragment ion targets in the resulting MS/MS spectrum. For instance, if the precursor was isolated with the quadrupole set to 600 to 625, then no fragment ion selection would be allowed in the range 600 to 625. Say the y8+ fragment ion of your peptide fell in this m/z range and had a high intensity in your DDA library spectrum. Skyline would not use it in your targets list. This is because there can be more noise in this range due to unfragmented precursors.
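The effect of the checkbox can be shown with a tiny sketch. This is purely illustrative (not Skyline's implementation), and the m/z values are invented:

```python
# Illustrative sketch of "Use DIA precursor window for exclusion": drop any
# candidate fragment m/z that falls inside the quadrupole isolation window
# used for the precursor, since that region can contain unfragmented
# precursor noise.

def exclude_precursor_window(fragment_mzs, window_start, window_stop):
    """Keep only fragment m/z values outside the isolation window."""
    return [mz for mz in fragment_mzs
            if not (window_start <= mz <= window_stop)]

# A precursor isolated with the quadrupole set to 600-625 m/z:
fragments = [388.2, 612.3, 724.4, 817.5]
print(exclude_precursor_window(fragments, 600.0, 625.0))  # 612.3 is removed
```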

Q: How can we import in a predicted library?

Ans: You can now build one directly in Skyline by defining your targets list and then using the Build button in the Peptide Settings - Library tab. This provides direct access to the Prosit prediction server at TUM and will produce a native Skyline format (BLIB) library complete with spectra and iRT predictions. You can also download a .msp file from the Prosit web interface and use that directly in Skyline by adding it as a library, as described for the GPM (.hlf file) in the Skyline Targeted Method Editing tutorial. If you are using a different prediction mechanism, then you will need to understand its output formats and look for a match with the many that Skyline supports. Or you can post a detailed request to the Skyline support board, and we will be happy to help more directly.

Q: Can we define our own RT calibration peptides if we don't use iRT peptides?

Ans: Yes. We have made a good deal of progress on making this easier in the past 6 months. A lot of that work made it into 20.1, but there have been improvements since then and Skyline-daily has the best current support for this.

Q: What is the Centroided option for a product mass analyzer? Can we use Orbitrap settings if we acquire data on a QE-HFX?

Ans: For Thermo and Bruker, we recommend using the Centroided option, which will work on vendor-centroided spectra. You can even acquire your data so that the files do not preserve the full profile spectra, which we have done for a lot of DIA data in the MacCoss lab, because it keeps the data smaller, and we feel confident that we can get what we need from the centroid-only data. With Bruker Q-TOF data this is almost a requirement, because their profile data is so dense. Only with SCIEX would we typically use the "TOF" option in profile mode. This is because it generally works well, while results with Centroided are variable and slower, because the centroiding is done in real time and the centroiding algorithm that we have from SCIEX is not ideal. In this tutorial, the spectra were pre-centroided to keep file sizes to a minimum for instructional purposes.

Q: I have a direct infusion dataset acquired in DIA mode. What parameters in Skyline do I have to take into account to properly process the data when chromatography is absent?

Ans: Not sure. This is better handled through the support board where you can arrange to provide example data.

Q: Do you plan to implement non-linear RT calibration to take better account of peptide elution/separation in the case of non-linear gradients?

Ans: We have recently added support for LOESS and logarithmic regression in the iRT settings, which we hope will make cases like the one you describe more accessible to Skyline users.

Q: I noticed that the DIA files and the DDA files were converted to different formats. Are there advantages or disadvantages for either? Also: is it recommended to use MSconvert, or any other software for the conversion? Or any recommended settings?

Ans: The conversion was used to reduce the file sizes by centroiding. The DDA data were in mzXML format for the TPP search pipeline in use at the Aebersold lab at the time of collection. The DIA files were also originally in mzXML format for consistency. However, mzXML drops more information than mzML. In this case, mzXML has no way of encoding the isolation window for a spectrum, but only a precursor m/z, which could be the center of the isolation window; in a case where you have 64 variable isolation windows, the width of each window is also critical information. For this reason, we reconverted the data to mzML format, which allowed us to simply import the isolation scheme from the files, as shown in the tutorial. This was not possible with the mzXML format. However, in your own work, we recommend that you lean toward using the instrument vendors' native data formats and avoid conversion, which adds extra time and space requirements to your data processing.

Q: Does removal of duplicate peptides correspond to the removal of shared peptides between proteins?

Ans: Yes. Only peptides which are unique to their proteins were kept in this tutorial, which avoids the need to work out "protein groups" which can make quantitative estimation much more complicated.

Q: Is there a guideline to choose the optimal number of decoys per target based on the size of the reference proteome?

Ans: I don't know of such a guideline. For sample-specific libraries, we generally use 1-to-1 decoy generation. With the Pan Human library, I started using 1-to-4 decoy generation because I saw little decline in detection and quantification results, but gains in processing performance and resulting file sizes.

Q: Does the isolation window import work only for .mzML files, or for .wiff and .raw too? In general, are there any restrictions/benefits to not converting to .mzML?

Ans: Isolation window import works just as well with instrument vendor native formats, including even diaPASEF data from the timsTOF. We did the conversion to mzML here only for instructional purposes, to allow us to provide centroid-only files where the vendor files contained profile data. We are not recommending you do any conversion to mzML with your own data.

Q: I'm totally amazed with the recently added option of library building with Prosit algorithm. It works great for selected precursors, but what if I do not know the top-intensity precursors? I know I can use the Prego package, for one protein but what if I want to build library for dozens or hundreds of proteins? How can I filter and select high abundance peptides for Prosit?

Ans: The Prosit team is working on this problem. Actually, we shared the Skyline intern Tobi Rohde last summer, and he worked on this problem for them. They are also working on extending to CCS prediction for IMS-enabled mass spectrometers. When they have usable predictors, we will integrate them into Skyline through the same interface.

Q: What is the worst that could happen, if the library measurements did not contain iRT peptides?

Ans: We don't really recommend trying to use mProphet peak detection without some form of retention time prediction. It does not seem like the other scores are powerful enough to do very well on full-gradient chromatograms, not to mention that the resulting files are much larger and the extraction process itself much slower when no source of retention time prediction is available. That doesn't mean that you cannot derive retention time prediction from using endogenous peptides as your iRT landmarks. This works quite well, and it seems that we should probably have a webinar on it, because others have made essentially the same request here. So, the answer is that your library measurements do not need to contain injected standards, nor any forethought about ensuring iRT landmarks. Skyline can derive this from endogenous peptides, especially if you have eukaryote samples with high-quality measurements of the CiRT peptides described in the Parker, MCP 2015 paper "Identification of a Set of Conserved Eukaryotic Internal Retention Time Standards for Data-independent Acquisition Mass Spectrometry".

Q: Do you plan to implement an iRT free DIA analysis? Spectronaut automatically uses endogenous peptides to calculate the relative retention times.

Ans: Yes. We may not be quite as automatic yet, but it is already possible, as indicated in prior answers. Probably we will make it more automatic in the future and provide more instructional materials on achieving this.

Q: For mProphet scoring, what does library intensity dot-product mean?

Ans: It is actually the "normalized spectrum contrast angle" between the library spectrum and the measured peak areas. Essentially, it is a measure of how closely the relative intensities of the measured peak areas match the relative intensities found in the library where 1.0 is best and 0.0 is worst.
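For reference, here is a minimal sketch of that calculation. This is our own illustration with invented intensities; Skyline's internal implementation may differ in details:

```python
import math

# Sketch of the normalized spectrum contrast angle between a library spectrum
# and measured peak areas: 1.0 means identical relative intensities, 0.0 means
# orthogonal (no agreement).

def contrast_angle(library, measured):
    na = math.sqrt(sum(x * x for x in library))
    nb = math.sqrt(sum(x * x for x in measured))
    cos_theta = sum(a * b for a, b in zip(library, measured)) / (na * nb)
    theta = math.acos(min(1.0, max(-1.0, cos_theta)))  # clamp rounding error
    return 1.0 - 2.0 * theta / math.pi

lib = [100.0, 50.0, 25.0]    # relative intensities from the library spectrum
meas = [90.0, 55.0, 20.0]    # measured transition peak areas
print(round(contrast_angle(lib, meas), 3))
```

Because only relative intensities matter, scaling either spectrum by a constant leaves the score unchanged.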

Q: What approach does Skyline use to removes duplicate peptides? Are these peptides shared among two or more proteins?

Ans: Yes. When you choose to remove "duplicates" you are essentially choosing to keep only peptides which are unique to their proteins.

Q: Would you rather exclude parameters from a scoring mProphet model that contribute little weight percentage in order to have not too many parameters in total or does the overall parameter number not matter statistically?

Ans: We have not performed a study on this and therefore can't really offer insight. It seems logical to remove scores which go against the expectation for the quality they measure, because they are likely due to overfitting, e.g. if somehow the coefficient for Mass Error were positive: it doesn't make sense that higher mass error should be better. This would strongly indicate the need for a closer look at the data to understand how the model could end up like this, but it would probably also be best to remove this value from the model. Skyline highlights such inverse values in red, and you are welcome to uncheck them and retrain the model.

Q: Could you please explain about green and red buttons beside transitions?

Ans: A red dot beside a transition means that its chromatogram did not show close enough coelution with the dominant peak to be considered part of the peak group. When you turn on "Integrate All" under the Settings menu, Skyline will show a red dot only when a transition actually contributes no signal to the total peak area, i.e. everything that contributes any signal gets considered part of the peak group. All signal is now always summed, since 19.1, but the red dots show which transitions either do not coelute well (when "Integrate All" is not checked) or do not contribute any signal (when "Integrate All" is checked).

Q: Is there already a tutorial for HDMSE data available?

Ans: There is not a comparable tutorial for HDMSE, but it is on our list to make one. Thanks for your interest.

Q: For the DIA peptide search wizard, is there a way to import all of this from the command line with SkylineCmd?

Ans: Yes. We believe that SkylineCmd offers full support given a template .sky file. Feel free to use the Skyline support board with any specific questions you may have about achieving your desired workflow with SkylineCmd.

Q: How do you identify post-translational modifications? Do I need to input a library for different modifications? Or does Skyline have a built-in library?

Ans: You would need to acquire a library with the modified forms you are interested in, or use Prosit to predict one. However, the PTMs supported by Prosit are quite limited.

Q: Would you explain again about the difference between using blank document and import DIA peptide search?

Ans: Import DIA Peptide Search takes you through the forms presented step-by-step. You can achieve all of the same results and more by choosing "Blank Document" and then using Settings > Peptide Settings and Transition Settings, File > Import FASTA, Refine > Add Decoys, File > Import > Results, Refine > Reintegrate. In other words, if you really know what you are doing, you gain the most power and flexibility by navigating the options directly in the Skyline user interface. If you do not require this flexibility and power, then you should find the Import DIA Peptide Search option much easier. You can also bring up this form through File > Import > Peptide Search after you have chosen Blank Document and possibly made some initial adjustments that are lacking in the Peptide Search forms.

Q: Any suggestions on building the ion library? Do you recommend having each DDA run with the same LC gradient as the DIA run? If we use iRT, can we have quite different LC gradients between DDA (for library) and DIA runs? Is iRT necessary?

Ans: For the best possible results you want your iRT measurements to match experimental DIA measurements as closely as possible. In the MacCoss lab, we often use a "Chromatogram Library" approach pioneered by Brian Searle and described in his Searle, Nature Communications 2018 paper, where the DIA search library is acquired on the same column under the same conditions and in close proximity to the experimental data. This approach invariably makes the retention time difference score a very powerful score. The more distant you make your iRT scores from the chromatography you actually use, the less powerful you make your retention time prediction. You can choose to sacrifice as much as you feel comfortable with, but you are making a sacrifice. The iRT concept can support going between systems and gradient lengths, as shown in the original paper, but in doing so, you sacrifice power in the predictions.

Q: What tool did you use to generate the xml library you imported as your spectral library? If I have a custom peptide list what's the best/easiest way to build my custom spectral library?

Ans: The pepXML imported to build the spectral library was searched with Comet and scored with PeptideProphet using the Trans-Proteomic Pipeline. There is a page on the Skyline website (https://skyline.ms/blib-formats.url) that explains a tabular format which allows entirely custom matching of peptides and molecules to spectra to be built into a Skyline spectral library.

Q: Do you have a plan to do an advanced user webinar focusing on command line automation of DIA?

Ans: Not yet, but we'll take this as a request for one. In the meantime, webinars 14 and 15 both included command-line scripts and discussion. But it makes sense to have a webinar in the future which is entirely focused on this topic.

Q: Does Skyline have any built-in way to normalize the sawtooth out? Do you recommend normalizing to the iRT peptides?

Ans: Any chromatograms which appear sawtooth in shape are likely due to a settings issue and you should seek help with your data on the Skyline support board. With any luck, we can help you solve the problem. We do not normally recommend normalizing to the peak areas of the iRT peptides as "global standards" unless your vendor makes a guarantee that quantities in their standards are kept constant. Otherwise, you should consider them only as landmark peptides for retention time alignment with no guarantee of quantitative consistency.

Q: How many confident peptides are finally identified from this spectral library search? Is it mProphet that determines the confident peptide identifications?

Ans: We have not counted that, and it is not a goal of this instructional data set to maximize that number. Rather, we are just showing how to use the software and relying on tool comparison publications to show that Skyline achieves similar numbers to other tools. Yes, it is the mProphet algorithm which assigns q-values, and then it is up to you where you set your q-value cutoff for a "confident identification", though 1% FDR seems to be a common cutoff. We intend to use the decoy distribution and the Storey-Tibshirani algorithm (as mProphet uses) to assign q-values to the default Skyline scores in the future. mProphet can be thought of as a 2-part approach (very similar to the Percolator tool), with part 1: feature coefficient training to produce a model / set of feature weights, and part 2: scoring all peaks with the resulting equation and assigning q-values. Part 2 can be applied to the fixed weights Skyline uses by default, but this has not yet been done. We will do it in the future, though.

Q: I am just a beginner in this area, but my work is on proteomic analysis, specifically biomarker identification. I am confused whether I should go for DDA or DIA. Where should I start learning to get complete knowledge of the subject?

Ans: This is not an easy question to answer. For DDA, you likely want to look at TMT or an MS1 chromatogram extraction tool like MaxQuant or Mascot Distiller (or Skyline). But you need to decide whether you will use an isobaric tagging method requiring reagents, like TMT, or a label-free method which employs chromatogram peak areas. If the latter, then the label-free MS1 method can be more easily compared with label-free DIA, which will quantify with chromatograms extracted from MS/MS. Evidence that the DIA MS/MS method produces more selective quantification with fewer missing values has grown over recent years, but most people also still see it as more work to achieve than DDA with MS1 extraction. You will have to make the choice based on your available resources and expertise. There are plenty of available recorded talks and papers addressing the subject. It is not something we can definitively answer here.

Q: What tool do you use to generate pep.xml files? What other formats of files does Skyline take? I use PD for searches, would .msf work?

Ans: Skyline takes all formats we have ever heard of; you can find more information here: https://skyline.ms/blib-formats.url Yes, Skyline can handle both .msf and .pdResult formats. If you find any issues with building a library from your search results, please post to the Skyline support board and we will do our best to get it working for you.

Q: Can I use the iRT peptides for normalization of intensities?

Ans: As explained above, that depends on what sort of guarantees your reagent vendor makes about the quantitative consistency of these peptides. You will have to check with your supplier to answer that question. You should be able to do this using the Skyline normalization to global standards, but you will need to derive whether that normalization is appropriate and beneficial to your quantification. You might simply try an experiment of collecting 10 replicates of your iRT standards spiked into your sample matrix and then extract the standard peaks in Skyline and check the CVs of the resulting peak areas. If the CVs are quite low, then you likely can normalize to them. Also, you can use the Skyline View > Peak Areas > CV Values to assess how normalizing to these peptides impacts the CVs of other peptides, very similar to what was done in the tutorial for median normalization. Again, collect 10 replicates of a consistent sample. Look at the CV distribution without normalization, and then look at it with your proposed normalization. If the distribution does not improve with the proposed normalization (as it did with median normalization in the tutorial) then the normalization cannot be expected to be helpful in your high-value quantitative experiment.
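The CV check described above can be sketched in a few lines. The peak areas here are made-up numbers; in practice you would export them from a Skyline report:

```python
import statistics

# Sketch of the proposed experiment: compute the coefficient of variation
# (stdev / mean) of each standard peptide's peak area across replicates.
# Low CVs suggest the standards may be usable for normalization.

def cv(areas):
    return statistics.stdev(areas) / statistics.mean(areas)

replicate_areas = {
    "standard-pep-1": [1.00e6, 1.05e6, 0.97e6, 1.02e6],  # consistent
    "standard-pep-2": [5.1e5, 7.8e5, 3.9e5, 6.4e5],      # highly variable
}
for pep, areas in replicate_areas.items():
    print(pep, f"CV = {cv(areas):.1%}")
```

In this invented example the first standard (CV around 3%) would be a reasonable normalization candidate, while the second (CV near 30%) would not.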

Q: How necessary are the iRT peptides for Skyline analysis? Is using "typical" peptides for retention time alignment reliable or should we plan on always using iRT peptides? I ask because I want to re-analyze samples where I have DDA data collected already without iRT peptides spiked in. I want to re-analyze these with DIA methods, but do I need to re-collect DDA data with iRT peptides?

Ans: This has been answered multiple times above. It is possible to use endogenous peptides as retention time landmarks for iRT in Skyline. You do not need to purchase and spike in a reagent mix designed for this purpose. We have worked on improving the ease with which this can be achieved in Skyline, especially when choosing the CiRT option.

Q: How important is it to construct a peptide library of your organism of choice (through experiments with peptides derived from say, pure bacterial culture) before analysis of other experiments featuring your organism of choice? Do you have a recommended pipeline for deriving such a library?

Ans: It is pretty important to have a library for the workflow presented in this tutorial. Schubert, Nature Protocols 2015 provides a good overview of how to create such a library. Though, once you have acquired the necessary DDA search results, we would recommend that you use Skyline to build the actual library, rather than SpectraST and the tools to produce a tab-separated "Assay Library". Up to the point of running SpectraST, however, the process for creating a peptide search source for your spectral library is well covered in this paper. In general, the Aebersold lab has done a very nice job explaining the process and the options for acquiring DDA and ensuring high-quality search results going into the library.

Q: For LFQ, which one is the best: DDA or DIA?

Ans: DIA... Okay, it is not that simple. We think DIA has been proven to be quantitatively superior due to its higher selectivity and reproducibility, but even some DIA advocates will tell you it is more work to get those improved results. Which you choose will depend on your available resources and expertise. The Skyline Team will try to support you on either one, even if all you choose to use Skyline for is data quality assessment with its leading data visualization capabilities. We leave the choice up to you, though.

Q: Can Skyline import a large library which has a number of rows of transitions exceeding the Excel limit of 1,048,576 rows?

Ans: Yes. We have successfully searched a target list in Skyline with around 6,000,000 transitions (including both targets and decoys). We are not huge fans of just going bigger and bigger: we feel that you will sacrifice quantitative precision and accuracy, decrease quantitative sensitivity, or increase your quantitative false discovery rate by pushing for more and more measurements. But we have pushed things pretty high to prove it is possible in Skyline. Certainly, we see no problem with going very large for unrefined replicate measurements and using those to determine what you should use in the measurements where you try to capture actual quantitative biological meaning, as taught during the May Institute at Northeastern (https://youtu.be/2H0YYb5LzYs?t=515). If this is what you are trying to achieve, you likely can go even larger than 6,000,000 transitions.

Q: Can you give some details about the iRT peptides/protein? Are they spiked into the protein extract, or after trypsin digestion? In what quantity?

Ans: In this data set, it is a standard peptide mix spiked in after digestion. There are a number of these mixes available, and Skyline provides a full list within its user interface when you choose your iRT standard. For more details, you should consult the reagent vendors for optimal use of their products. You can also use endogenous peptides, as indicated above, with hopefully more instruction on that coming soon.

Q: Is it possible to refine ppm and retention time and do a second import with those values (i.e. use the first pass, inspect the two demonstrated graphs, "calibrate" each individual run to this results and then import in a second pass with shifted RT and m/z values according to the first pass)?

Ans: This is not yet possible in Skyline. Though, we are aware of its potential and hope to improve in this direction in the future.

Q: In the identified iRT protein, some peptides have red dots and a few have green dots. Do red dots indicate a low-confidence identification?

Ans: I probably forgot to switch to "Integrate All", which would likely make them all green. The meaning of the dots is explained above. In this case, it is an indication of how well the transitions appear to coelute, which is not necessarily indicative of the quality of the identification. It is a feature that was originally designed to aid in transition refinement for SRM, to help identify transitions that did not appear to contribute "well-formed" signal to the peak group, possibly due to interference or simply descending below the lower limit of quantification. It is still somewhat informative, but only as a secondary indicator. Changing to Settings > Integrate All will make more things green, and then red dots will mark only transitions which contribute no signal within the integrated boundaries.

Q: Regarding the mass error [ppm]: is this error referring to the "spread" or standard deviation of the mass peaks over a chromatographic peak, or something like a weighted average of all mass peaks contributing to a chromatographic peak? And is it generally higher in DIA than in MS1?

Ans: It refers to [observed m/z] - [calculated m/z], expressed in parts per million. I am not aware of a study showing that it is on average higher in MS/MS than in MS1.
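As a worked example, the ppm calculation looks like this. The intensity-weighted summary across the chromatographic peak is our own illustrative assumption for how per-scan errors could be combined, not a statement about Skyline internals, and the m/z and intensity values are invented:

```python
# Mass error in ppm: (observed - calculated) / calculated * 1e6.

def ppm_error(observed_mz, calculated_mz):
    return (observed_mz - calculated_mz) / calculated_mz * 1e6

def weighted_mean_ppm(scans, calculated_mz):
    """scans: list of (observed_mz, intensity) pairs across a chromatographic
    peak; returns an intensity-weighted average of the per-scan ppm errors."""
    total = sum(intensity for _, intensity in scans)
    return sum(ppm_error(mz, calculated_mz) * intensity
               for mz, intensity in scans) / total

calc = 500.2500
scans = [(500.2512, 1e4), (500.2510, 5e4), (500.2515, 2e4)]
print(round(weighted_mean_ppm(scans, calc), 2))
```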

Q: Is your comparison p-value generated using a t-test?

Ans: Yes, the group comparison p-value comes from a paired t-test, though what is shown is an adjusted p-value, calculated using Benjamini-Hochberg for the adjustment. We would suggest you consider MSstats or a similar tool external to Skyline, relying on reports exported from Skyline, if you would like to use more sophisticated statistical methods.
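The Benjamini-Hochberg adjustment itself is simple enough to sketch. This is an illustration of the standard procedure with invented p-values, not Skyline's code:

```python
# Sketch of the Benjamini-Hochberg adjustment: rank the raw p-values, scale
# each by (m / rank), then enforce monotonicity from the largest p-value down.

def benjamini_hochberg(pvals):
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    adjusted = [0.0] * m
    prev = 1.0
    for rank in range(m, 0, -1):  # walk from worst-ranked to best-ranked
        i = order[rank - 1]
        prev = min(prev, pvals[i] * m / rank)
        adjusted[i] = prev
    return adjusted

print(benjamini_hochberg([0.001, 0.008, 0.039, 0.041, 0.6]))
```

Note how the adjustment inflates mid-range p-values (0.039 and 0.041 both become about 0.051 here), which is exactly the multiple-testing penalty the volcano plot's adjusted p-values reflect.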

Q: How much do I want to look at the adjusted p-value?

Ans: You should consider it at least a first-round cutoff for the changes you are willing to consider as important to your experimental outcome. That is, just because something exhibits a 10-fold change in means between your biological subjects does not mean that change was reliably measured with low-variance, consistent measurements. If the variance is high enough, or the number of hypotheses you tested is high enough, it might still not have an adjusted p-value below 0.05. If not, you should certainly avoid making published claims based on the measurement, and you might want to consider whether it is even worth the effort of taking it to the next round of consideration. The more you continue to consider in each round of validation, the more you burden your research effort, and the more you burden your statistics with multiple hypothesis testing of things which may not be true positives.

Q: When you selected E. coli on the volcano plot, it did not seem to be exactly at ratio 4. Is there a "ratio compression"?

Ans: I think it gets pretty exact for the most confident values, which will tend to have the highest signal and the lowest CVs. There is definitely ratio compression as you go to lower signal, and it is worst for E. coli, because the ratio is 4:1. At the 1 end, targets will begin to reach the lower limit of quantification, and even the lower limit of detection, where peak area stops responding linearly to a decrease in concentration. So, we actually expect a trumpet-shaped distribution with ratio compression at the lower end, unless your detection statistics somehow stop detecting peptides below their lower limit of quantification. You should be highly suspicious of a tool which does not show this compression but claims to be detecting peptides below the linear dynamic range. These two claims are not possible in combination.

Q: I don't know if I missed it, but how do you tell Skyline -- in the DIA workflow -- to use only unique peptides?

Ans: Yes, you missed it. That is the "Remove duplicates" checkbox that others have asked about above. Once checked, Skyline will use only unique peptides.

Q: What are your computer spec recommendations for large-scale analyses of QE-HF data using a 2-3 hour gradient?

Ans: We are using Dell XPS computers with an 8-core i7 (16 logical threads with hyperthreading) and 64 GB of RAM. This computer currently costs around $2,000 USD and does very well with many largish data sets. On the extreme end, we also use a Dell PowerEdge with 24 cores (48 logical threads with hyperthreading) and 176 GB of RAM. The latter is now a few years old, though, and we are starting to see others reporting bigger PowerEdge systems, like 26 cores and 256 GB of RAM. These also cost a lot more. Skyline can make good use of the full capabilities of these systems, if you need them. I would recommend getting the former and working with it until you prove to yourself you need the latter.

Q: On using Mobi-DIK to generate spectral library for diaPASEF data: How do you generate the iRT file used in this script "python create_library.py --pasefdata data.d --mqout path/to/maxquant_data --irt irt_file.tsv"?

Ans: Skyline can now do what Mobi-DIK does, and much faster, with the end result of making all of the Skyline visualizations you saw in this webinar available for diaPASEF data. So, stay tuned for a webinar presenting this soon.

Q: In the volcano plot, can we set colors just for proteins respecting fold change and p-value criteria?

Ans: Yes. Have a close look at the formatting form and let us know if you come up with something you want which it cannot achieve, but it is pretty flexible, we think.

Q: Is it OK that the FASTA file that you imported had already concatenated with the decoy FASTA (reversed)?

Ans: Yes. Perfectly okay, and I frequently like starting with the full FASTA containing matching labeled decoy sequences, because it can give me insight into the false discovery rate in the target set I have chosen. In this case, that FDR was quite low, because I forced only unique peptides, only proteins with 2 peptides per protein, and a 0.95 PEP cutoff in a data set that could support a lower PEP for a 1% FDR. Here is a presentation from the May Institute last year at Northeastern where I explain how to do this: https://youtu.be/2H0YYb5LzYs?t=3083
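The insight comes from simple decoy counting: with a concatenated target-decoy FASTA, the decoy matches that survive your filters estimate the false positives hiding among the surviving targets. A hedged sketch (not Skyline's exact calculation; the accessions and "DECOY_" prefix are hypothetical):

```python
def estimated_fdr(entries, is_decoy):
    """Estimate the FDR in a filtered set from labeled decoy matches.

    Assumes one decoy sequence per target (e.g. reversed), so each
    surviving decoy estimates roughly one false positive among targets.
    """
    decoys = sum(1 for e in entries if is_decoy(e))
    targets = len(entries) - decoys
    return decoys / targets if targets else 0.0

# Hypothetical filtered document, decoy accessions prefixed "DECOY_":
kept = ["sp|P00350", "DECOY_sp|P00350", "sp|P0A6F5", "sp|P0A853"]
fdr = estimated_fdr(kept, lambda e: e.startswith("DECOY_"))
```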

Q: On what basis do you exclude duplicated peptides (score? MS1 signal?)?

Ans: We exclude "duplicate" peptides simply on the basis of whether they show up in multiple sequences in the FASTA file. Skyline also has a more sophisticated unique peptide selection mechanism you can use with a background proteome to select peptides unique to a gene, but that is not what was used here. If you instead choose to remove "repeated" peptides, then Skyline will leave a single copy of every unique peptide in the document, represented only under the first protein in which it appears. That may not be useful quantitatively, because you really have no idea which protein contributed how much of a shared peptide you have decided to target. I am not sure how the values you suggest would relate to duplication or uniqueness.

Q: We saw differential mass error shifts between the data sets. Is it possible they were collected with different tune files in effect?

Ans: I think you mean that we saw different mass error distributions among the 6 files that were imported. We looked at both the total distribution of mass errors across all 6 files and also the individual distributions within each file. I don't think the variance within each run is related to a settings change on the instrument, but rather is just variance on the instrument, possibly caused by drift or environmental changes.

Q: Do you do anything to exclude peptides that are shared between different proteins in addition to exclude duplicated peptides in the library?

Ans: These are synonyms in our terminology. Remove duplicates = remove peptides shared between multiple proteins = keep only unique peptides.

Q: Can we import the spectral library which is built-in any other tool such as Spectronaut?

Ans: Yes. Skyline supports importing libraries created by other tools through a tabular (comma-separated or tab-separated) format. We have had success with libraries generated for both OpenSWATH and Spectronaut.

Q: Please have a webinar on diaPASEF.

Ans: We intend to. Ben being locked out of his lab during the COVID-19 shutdown has become a gating factor, since we feel we need a data set explicitly collected for instructional purposes to make it small enough to redistribute easily over the internet. Thanks for your interest!