DDA to DIA (MSE) Peak Picking Errors

Juan C. Rojas E.

2021-05-06 05:01

Hi All,

I'm following this general workflow for data generated on a Synapt G2-Si using multiple DDA and MSE methods:

Measure same sample in DDA and DIA sequentially
Create Skyline project with IDs from DDA results
Filter bad IDs and correct any integration errors
Import DIA results into the same document (after changing the appropriate transition settings) using RT alignment (2 min)

What I have observed in multiple datasets is that, in many instances, Skyline defines peak boundaries that give a good dotp with respect to the spectral library but ignores how well those boundaries fit the precursor ions. This happens when there are closely eluting peptides with similar sequences, missing or carrying one or two extra amino acids (using semi-specific digest results), where the picked peak happens to have a higher total fragment AUC but there is no detectable signal for the peptide in the precursor scan (low energy scan; Slide 1). The result is inflated coefficients of variation caused purely by bad automatic peak picking (Slide 2, Fig. 1), which can be resolved (Slide 2, Fig. 2) if I check every peptide in the document.
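The pattern described above can be screened for outside Skyline. The following is a minimal sketch, not a Skyline feature. It assumes a per-replicate report has been exported and loaded into dictionaries. The field names (dotp, precursor_area, fragment_area), the thresholds, and all values are hypothetical placeholders. It flags replicates where the fragment dotp is high but the precursor area is essentially absent, and computes the fragment-area CV per peptide.

```python
from statistics import mean, stdev

# Hypothetical rows standing in for an exported report.
# Field names and numbers are illustrative only, not real Skyline columns or data.
rows = [
    {"peptide": "PEPTIDEA", "replicate": "r1", "dotp": 0.92, "precursor_area": 5.0e4, "fragment_area": 2.0e5},
    {"peptide": "PEPTIDEA", "replicate": "r2", "dotp": 0.90, "precursor_area": 4.8e4, "fragment_area": 2.1e5},
    {"peptide": "PEPTIDEA", "replicate": "r3", "dotp": 0.91, "precursor_area": 0.0, "fragment_area": 6.0e5},
    {"peptide": "PEPTIDEB", "replicate": "r1", "dotp": 0.95, "precursor_area": 3.0e4, "fragment_area": 1.0e5},
    {"peptide": "PEPTIDEB", "replicate": "r2", "dotp": 0.94, "precursor_area": 3.1e4, "fragment_area": 1.0e5},
]

MIN_DOTP = 0.8              # assumed cutoff for a "good" library match
MIN_PRECURSOR_AREA = 1.0e3  # assumed cutoff for "detectable" precursor signal


def is_suspicious(row):
    """Good fragment match but no supporting precursor signal."""
    return row["dotp"] >= MIN_DOTP and row["precursor_area"] < MIN_PRECURSOR_AREA


def cv_percent(values):
    """Coefficient of variation in percent (sample standard deviation / mean)."""
    return 100.0 * stdev(values) / mean(values)


def summarize(rows):
    """Group rows by peptide and report fragment-area CV plus flagged replicates."""
    by_peptide = {}
    for row in rows:
        by_peptide.setdefault(row["peptide"], []).append(row)
    summary = {}
    for peptide, group in by_peptide.items():
        areas = [r["fragment_area"] for r in group]
        summary[peptide] = {
            "cv_percent": cv_percent(areas) if len(areas) > 1 else None,
            "flagged_replicates": [r["replicate"] for r in group if is_suspicious(r)],
        }
    return summary


summary = summarize(rows)
for peptide, info in summary.items():
    print(peptide, info)
```

In this toy data, the one replicate integrated on a peak without precursor signal is flagged and is also what inflates the CV for that peptide. Real thresholds would have to be tuned to the instrument and data.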

Unfortunately, I don't work with SWATH-like data, but I would assume that approach would not have this issue (or at least would have it less frequently), since the fragments would not be binned with the m/z range of the "interfering" peptide. Is this an issue specific to MSE, or is there something I am overlooking in the data import? If this is indeed an all-ion-fragmentation issue, is there any solution that could be implemented?

I have prepared two sample set examples that can be found in our U of Leipzig Panorama space under seMSE called:

Raw import:
"seMSE_MPDS_NoPeakCorrection..."
"seMSE_EColi_MPDS_NoPeakCorrection..."

Curated results:
"seMSE_MPDS..."
"seMSE_EColi_MPDS..." (work in progress)

I can think of some "tricks" to avoid these issues using the IMS information collected (for the files containing IMS data), but I would like to explore solutions that use only the m/z dimension and can be applied at the initial data import without having to train an mProphet model (I still need to test whether that would help on this data).

As always, thanks for the time and help.
Sincerely,
Juan C.

MSE_peak_boundaries_error.pptx

Nick Shulman responded:	2021-05-06 11:20
Yes, wide window DIA windows make it very hard to distinguish between modified peptides and also peptides with similar sequences, since you do not have the extra information that comes from knowing what the precursor mass was. This problem cannot be fixed by training a peak picking model, since these incorrect matches are not to random things, but are things which are actually in the sample and more similar to the target peptide than what a randomly generated decoy would be.

MSe is the ultimate wide window DIA data, where you can be getting interference from precursors that have completely unrelated precursor masses. I suspect that MSe will not be selective enough for you to be able to quantify the peptides that you are interested in, and, even if you do manage to control the CVs by carefully choosing the integration boundaries you will still be quantifying signal from completely unrelated molecules.

That said, I think using iRT is the best way to take the information that you have from your DDA runs and use them to choose the peak boundaries in your DIA runs. You do not need to have your DDA runs and your DIA runs in the same Skyline document. We recommend that you use your DDA runs to create a .irtdb file and use that irtdb file in a different Skyline document into which you import your DIA runs.

Here are some things to keep in mind:

1. When I try to import a run into your document "seMSE_MPDS_NoPeakCorrection.sky", Skyline brings up a dialog asking me to "Choose RT Prediction Replicates". The reason that Skyline is doing this is that this particular document does not have an iRT retention time predictor, but the setting at "Settings > Peptide Settings > Prediction" says "Use measured retention times when present", so Skyline wants to ask you which replicates to align to. This would not happen if this document was using an iRT database.

2. The document "seMSE_EColi_MPDS_NoPeakCorrection.sky" does have an iRT predictor set up, but Skyline is unable to do a successful linear regression in it, I think because several of the iRT standards are missing from your document. When I try to import results into the document, Skyline warns me that some of the standard peptides are missing. You should go to:
Settings > Peptide Settings > Prediction > Calculator Button > Edit Current > Choose Standards
and remove the missing peptides from that list. Skyline will tell you that the .blib file cannot be modified, and Skyline will prompt you to create a new .irtdb file. Then, it seems that in order for you to actually use this new .irtdb file, you need to go to:
Settings > Peptide Settings > Prediction > Calculator Button > Add
and define a new calculator which uses the .irtdb that you just created.

Hope this helps,
-- Nick
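To illustrate the role of the standards in the regression, here is a minimal sketch of fitting measured retention times against iRT values by ordinary least squares, using only the standards that were actually detected. It is not Skyline's implementation. The standard names, iRT values, and retention times are invented for illustration.

```python
# Hypothetical iRT standards (name -> iRT value) and measured retention times
# in minutes. One standard is deliberately missing from the measurements.
irt_standards = {"STD1": 0.0, "STD2": 25.0, "STD3": 50.0, "STD4": 75.0, "STD5": 100.0}
measured_rt = {"STD1": 5.0, "STD2": 10.0, "STD4": 20.0, "STD5": 25.0}


def fit_line(xs, ys):
    """Ordinary least squares fit y = slope * x + intercept, plus Pearson r."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r = sxy / (sxx * syy) ** 0.5
    return slope, intercept, r


# Standards defined in the calculator but not found in the run.
missing = sorted(name for name in irt_standards if name not in measured_rt)

# Fit only on standards that were actually measured.
found = sorted(name for name in irt_standards if name in measured_rt)
slope, intercept, r = fit_line(
    [irt_standards[name] for name in found],
    [measured_rt[name] for name in found],
)


def predict_rt(irt_value):
    """Predicted retention time (minutes) for a given iRT value."""
    return slope * irt_value + intercept


print("missing standards:", missing)
print("slope:", slope, "intercept:", intercept, "r:", r)
print("predicted RT for iRT 50:", predict_rt(50.0))
```

The correlation coefficient r gives a rough sense of how well the detected standards fall on a line. The fewer standards that are found, and the more of them that are mis-picked, the less trustworthy the predicted retention times become.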

Brian Pratt responded:	2021-05-06 11:30
Hi Juan, I'm not sure where to find your Panorama server, but without looking at your data I'd say it's certainly worth reimporting it using ion mobility filtering. I can take a better look if you'll provide a link to the data. Thanks for using the Skyline support board, Brian Pratt

Brian Pratt responded:	2021-05-06 13:24
Hi Juan, It appears that I don't have permissions on your Panorama project, which I suppose is why I couldn't find it. If you want to add me, I'd be happy to have a look at the ion mobility aspects of this. Best, Brian

Juan C. Rojas E. responded:	2021-05-07 01:40
Hi Nick and Brian,

First, thanks for the quick responses.

I think I might be misunderstanding something, but I don't think this applies to MSE-type data: "since you do not have the extra information that comes from knowing what the [accurate] precursor mass was". Although that might be true for SWATH-like data, the precursor ions are detected in the low energy scans of MSE-like data. Besides the RT offset (based on the prediction calculator), the missing precursor ion transitions in the low energy scan give away that the wrong peak was selected. The RT prediction definitely seems to point at the right peak regardless of which peak was selected automatically.

At this point I think I might have better success by relying only on the RT prediction and the low energy scans while ignoring the fragment EICs, which sounds contradictory to the whole benefit of DIA... or at least for MSE data, due to the increased interference. Once the boundaries are defined, I would export the peak boundaries, reimport the data with the high energy scan, and reimport the peak boundaries.

Keeping in mind that the fragment EICs of DDA data have no quantitative value for this experiment, what drawbacks do you envision from having the DDA files in the same project? I like keeping them there because, by checking the ID times, I can check whether the spectrum is chimeric or not (something I do while I contrast the fragment EICs of the DIA data).

With respect to the .irtdb: indeed, that file didn't have a .irtdb, because those are the samples I used to "find" the "native" iRTs of the EColi_MPDS sample that was analyzed afterwards. So I relied on the alignment to the DDA files. I let Skyline automatically choose what it thought were the best 50 peptides for iRT. In retrospect, I should have filtered the low-abundance (modified) peptides out of the document to prevent Skyline from choosing those. Any other recommendations for automatically choosing "native" peptides as iRTs?

Is there a way of checking which peptides were not detected? I did get a warning saying that only 44 of 50 (I think) peptides were detected, but I was still able to carry on with the analysis. How do you assess whether the linear regression fit was good or bad? Which data/results did you import? When I look at the "Score to Run" for the seMSE data it looks good to me (Slide 4), but (disclaimer) I am new to this topic.

Brian, I just added you with editor privileges to the seMSE folder. Sorry, I thought all Skyline project managers had full access to Panorama. You will see that, unfortunately, the current implementation of IMS filtering in Skyline will not work for the seMSE files. This is an ongoing project that I am working on, so, for discretion's sake, if more details are needed please e-mail me at: juan_camilo.rojas_echeverri@uni-leipzig.de

I am sad to see that I won't be able to achieve the same selectivity for fragment ion EICs as the beautiful SWATH data that is out there, but I'm trying to make the best of the instrumentation I currently have available, so any help is highly appreciated.

Sincerely,
JC
MSE_peak_boundaries_error.pptx
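The idea raised in the post above, choosing the peak from the RT prediction and the low energy (precursor) signal while ignoring the fragment EICs, can be sketched as follows. This is only an illustration of the selection logic, not something Skyline does in this form. The candidate peak fields, the 2 minute window, and the area threshold are assumptions.

```python
def choose_peak(candidates, predicted_rt, rt_window=2.0, min_precursor_area=1.0e3):
    """Pick the candidate closest to the predicted RT that has precursor signal.

    candidates: list of dicts with hypothetical keys "apex_rt" (minutes) and
    "precursor_area". Fragment areas are deliberately not consulted.
    Returns None when no candidate passes both filters.
    """
    eligible = [
        c for c in candidates
        if abs(c["apex_rt"] - predicted_rt) <= rt_window
        and c["precursor_area"] >= min_precursor_area
    ]
    if not eligible:
        return None
    return min(eligible, key=lambda c: abs(c["apex_rt"] - predicted_rt))


# Invented example: the first candidate has the larger fragment area but no
# precursor signal, so it is skipped in favor of the second one.
candidates = [
    {"apex_rt": 14.2, "precursor_area": 0.0, "fragment_area": 6.0e5},
    {"apex_rt": 15.1, "precursor_area": 5.0e4, "fragment_area": 2.0e5},
]

chosen = choose_peak(candidates, predicted_rt=15.0)
print(chosen)
```

Boundaries chosen this way could then be exported and reimported as described in the post. Whether the resulting fragment quantification is selective enough is a separate question.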

Brian Pratt responded:	2021-05-10 10:56
Hi Juan,

I don't see the raw files there - I do see some very similarly named ones, but not the actual ones used in the .sky.zip files. For example in seMSE_MPDS_NoPeakCorrection_2021-05-06_11-40-01.sky.zip:

need: S4_21_14_M4.raw
have: S4_21_14_EM4.raw

Is it just a matter of files that have been renamed?

Thanks
Brian

Juan C. Rojas E. responded:	2021-05-11 00:45
Hi Brian,

So there are two data sets:

"M" files contain only a simpler mixture of 4 proteins; MPDS standard
"EM" files contain a more complex mixture of an E. coli digest + the spiked MPDS proteins

I will upload the missing files ASAP. However, I have observed the same effects in both experiments, where the complications were more marked (due to the larger number of peptides that had to have their peak boundaries corrected) in the EM data set:

seMSE_EColi_MPDS_NoPeakCorrection
seMSE_EColi_MPDS_YesPeakCorrection
seMSE_EColi_MPDS (parent project with all EM data)

Thank you very much for your time.
Sincerely,
Juan C.