Peak integration wrong for a single replicate, but for thousands of precursors

joshuasmith  2023-03-07 11:07
 

Hi all,

I am having an issue where Skyline is integrating peaks incorrectly for one out of 6 replicates, but for thousands of precursors in the file. This is data from the same support ticket as a previous issue with importing the DIA-NN spectral library (https://tinyurl.com/2x326w3a) - thanks Nick for helping on that.

I've attached screenshots of the issue. I have tried refining the peak picking several times and I think I have a really good peak scoring model, but it is still mis-integrating peaks. Sometimes it does pick the correct peak (slide 4), but this is uncommon. I obviously would prefer not to have to manually go through roughly 10,000 precursors and correct integration on what appears to be 80-90% of them. One way I tried to do this more quickly was by looking at run-to-run regression and finding outliers, but this didn't work. As you can see in slides 5-8, as highlighted with the red box on the regression plot, there isn't much deviation for the misintegrated peaks, and so no good way to target the worst cases first. There are so many anyway that it wouldn't end up being a "targeted" fix.

One thing I did notice is that the replicate that has a lot of integration issues does seem to be unusual in that it has more ID matches than the other replicates, and they tend to skew towards the leading edge of the peak. See slides 8-10. Could that be the issue? Not sure why that would have happened, other than that it was the first sample in the file list run through DIA-NN. The spectral library is based on all 6 runs, but the DIA-NN log does say:
"DIA-NN will optimise the mass accuracy automatically using the first run in the experiment. This is useful primarily for quick initial analyses, when it is not yet known which mass accuracy setting works best for a particular acquisition."
I know that this optimization can be turned off by specifying a mass accuracy setting for the run, but I don't know if that would fix the issue I'm seeing.

I have compressed my entire Skyline document and can share that if needed, although it's 4 GB zipped.
Thanks,
Josh

 
 
Nick Shulman responded:  2023-03-08 22:43
In some of your chromatograms, it looks like Skyline did not actually do its own peak detection but instead just used the peak boundaries from the spectral library.
When you build a spectral library from certain types of peptide search results, the .blib file that gets created ends up containing peak boundaries.
By default, when Skyline sees that the library has peak boundaries for a particular peptide and replicate, Skyline skips the peak detection step, and instead just integrates the peak that the library says to integrate.

For some of your chromatograms, there is a single ID line right at the apex of the peak that was integrated. That is often a sign that Skyline used the peak boundaries from the library. The other way to tell that this happened is that if you go to "View > Other Grids > Candidate Peaks" there will only be one row in the grid.
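
If you want to check outside of Skyline whether the library carries peak boundaries for each data file, something like the following might work. This is only a rough sketch: a .blib file is a SQLite database, but the table and column names used here (SpectrumSourceFiles.id, SpectrumSourceFiles.fileName, RetentionTimes.SpectrumSourceID, RetentionTimes.startTime, RetentionTimes.endTime) are assumptions about the schema and may differ between BiblioSpec versions, so inspect your own file first. The function name is made up for illustration.

```python
import sqlite3


def peak_bounds_per_file(conn):
    """Return {file name: (entries, entries that have explicit peak bounds)}.

    ASSUMPTION: the table and column names in the query are a guess at the
    .blib schema and are not confirmed in this thread. Check them against
    your own library before relying on the result.
    """
    query = """
        SELECT f.fileName,
               COUNT(*),
               SUM(CASE WHEN t.startTime IS NOT NULL
                         AND t.endTime IS NOT NULL
                        THEN 1 ELSE 0 END)
        FROM RetentionTimes AS t
        JOIN SpectrumSourceFiles AS f ON f.id = t.SpectrumSourceID
        GROUP BY f.fileName
        ORDER BY f.fileName
    """
    return {name: (total, bounded) for name, total, bounded in conn.execute(query)}


# Example usage (the path is hypothetical):
# conn = sqlite3.connect("my_library.blib")
# for name, (total, bounded) in peak_bounds_per_file(conn).items():
#     print(name, total, bounded)
```

A file where nearly every entry has bounds would be consistent with Skyline skipping its own peak detection for that replicate.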

You can tell Skyline not to use the peak boundaries from the library by going to:
Settings > Peptide Settings > Library
and then push the "Edit List" button and then select the appropriate library and push the "Edit" button and uncheck the box that says "Use explicit peak bounds".

After you have turned off "Use explicit peak bounds", you can tell Skyline to look for peaks again by doing:
Edit > Manage Results > Rescore

If you are going to be training a peak scoring model, it is important that Skyline did its own peak detection. If Skyline only has one candidate peak to work with, it does not make a difference how Skyline weights the features so Skyline will not be effective at determining appropriate weights.

I also sometimes recommend that you choose "Use all matching scans" at "Settings > Transition Settings > Full Scan > Retention time filtering". If the chromatograms are too short, there will also not be enough candidate peaks for the feature weights to be properly determined.
-- Nick
 
Nick Shulman responded:  2023-03-09 13:12
It sounds like the thing that you are actually wondering about is why one of your replicates seems to not be getting any of the retention time information from your spectral library, which was built from your peptide search results.
You might be able to figure out what is going wrong by going to:
View > Spectral Libraries
and then push the "..." button next to the library name at the top of the Spectral Library Explorer to bring up the Library Details window.
The Library Details window will show you a list of all of the mass spec data files that have information in the library, and my guess is that your misbehaving replicate either will be missing from that list or will have a very low number in the "Matching Spectra" column.
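
The same per-file check can be roughly approximated with a script, which may be handy for a document this large. This is a sketch under assumptions, not how Skyline computes its numbers: it treats the .blib as a SQLite database and assumes a SpectrumSourceFiles table (id, fileName) and a RetentionTimes table (RefSpectraID, SpectrumSourceID). Those names are not confirmed in this thread, the function name is hypothetical, and the counts may not match the Library Details window exactly.

```python
import sqlite3


def matching_entries_per_file(conn):
    """List (file name, number of retention time entries), lowest count first.

    ASSUMPTION: table and column names are a guess at the .blib schema.
    The LEFT JOIN keeps files that have no entries at all, so a replicate
    that is effectively missing from the library shows up with a count of 0.
    """
    query = """
        SELECT f.fileName, COUNT(t.RefSpectraID) AS n
        FROM SpectrumSourceFiles AS f
        LEFT JOIN RetentionTimes AS t ON t.SpectrumSourceID = f.id
        GROUP BY f.id, f.fileName
        ORDER BY n, f.fileName
    """
    return list(conn.execute(query))


# Example usage (the path is hypothetical):
# conn = sqlite3.connect("my_library.blib")
# for name, n in matching_entries_per_file(conn):
#     print(f"{n:8d}  {name}")
```

The replicate to look at would be the one at the top of the list, with a count far below the others.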

If you send us your Skyline document we might be able to give you more information about what is going wrong. We might also need to see your peptide search results.
-- Nick