Specifying Integration Boundaries

Specifying Integration Boundaries jrenders  2020-11-09
 

Hi there,
My small molecule analysis methods produce some pretty ugly peaks for a few analytes. Skyline does its best, but I do not expect any integration algorithm could figure out some of my data. What I have been searching for to fix the problem is some place to specify explicit integration boundaries. In reading back through the support board, I understand that "explicit retention time" and "window" will get me close, but these are more "suggestive" to Skyline (and in fact do not work for my peaks). Integrating a peak and then right-clicking and hitting "Apply Peak to All" also seems more suggestive than explicit and does not work. I would like to have Skyline integrate all peak area between two specified RTs without any consideration of inflection points or anything else, and then apply this setting across the whole batch for that particular molecule. Is that possible at this time? Thanks!

 
 
Brian Pratt responded:  2020-11-09

In earlier versions of Skyline, the Explicit RT value was, as you say, somewhat suggestive. If Skyline didn't find anything that looked like a peak within that window it would go looking elsewhere. Now, though, if it doesn't find anything within that window it just stops looking and records no peak.

It sounds like you wish Skyline would just treat whatever is within that window as signal, regardless of whether it seems peak-like?

Best regards,

Brian Pratt

 
jrenders responded:  2020-11-09

Hi Brian,
Correct, Skyline's idea of "peak-like" and my idea (in a small molecule world) will likely be very different and will honestly depend on the analyte class. One specific example is PFAS analysis. Some analytes can appear as a branched form or a linear form, with moderate RT differences between the two. Historically, both of these forms are just summed and you report the total. However, some samples will have just the linear form, some just the branched form, and some will have both. I imagine this (and other circumstances) would be a hard phenomenon to teach an integration algorithm. If instead I were able to tell Skyline "just integrate everything from 2.3 minutes to 3.4 minutes no matter what", this would save me a TON of time. Then if there is any RT drift or anything else I can go through and manually clean up. Currently, I am going in and correcting every sample (50+) for almost every analyte (30+).
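To make the request concrete, here is a minimal sketch of "integrate everything between two RTs no matter what". It is plain trapezoidal integration over a fixed window and is not Skyline's code; the function name `integrate_window` and the sample chromatogram are hypothetical, and the retention times are assumed to be strictly increasing.

```python
def integrate_window(times, intensities, start, end):
    """Trapezoidal area of a chromatogram between start and end (minutes).

    No peak detection is involved: every point inside the window counts.
    The window edges are linearly interpolated so the result does not
    depend on where the sampled points happen to fall.
    """
    def interp(t):
        # Linear interpolation of the intensity at time t.
        for t0, t1, y0, y1 in zip(times, times[1:], intensities, intensities[1:]):
            if t0 <= t <= t1:
                return y0 + (y1 - y0) * (t - t0) / (t1 - t0)
        raise ValueError("time lies outside the chromatogram")

    points = [(start, interp(start))]
    points += [(t, y) for t, y in zip(times, intensities) if start < t < end]
    points.append((end, interp(end)))
    return sum((t1 - t0) * (y0 + y1) / 2
               for (t0, y0), (t1, y1) in zip(points, points[1:]))


# Hypothetical chromatogram with a broad hump followed by a sharper peak.
times = [2.0, 2.3, 2.6, 2.9, 3.1, 3.4, 3.8]
intensities = [0.0, 5.0, 20.0, 10.0, 80.0, 5.0, 0.0]
area = integrate_window(times, intensities, 2.3, 3.4)
```

The same `start` and `end` could then be reused for every sample of that molecule, with manual cleanup only where RT drift moves signal outside the window.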

I see some places in the document grid that list what the integration boundaries are but these are return fields and cannot be edited...

Thanks!

 
jrenders responded:  2020-11-13

Hi Brian,
Just wanted to follow up on this. Is it safe to say that this functionality isn't available at this time, but may be someday? Thanks.

 
Brian Pratt responded:  2020-11-13

I'll make a formal note of it in our development queue.

There are some barriers to implementation, though: Skyline currently makes a pretty hard assumption about the presence of some kind of peak, for the purpose of background subtraction. In this case, where there's signal from your point of view, absent a peak Skyline would see it as noise and subtract it out, leaving something pretty close to zero.
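The background subtraction concern can be illustrated with a toy calculation. The sketch below assumes a deliberately simplified rule (subtract a flat background at the lower of the two boundary intensities); this is an assumption for illustration and not a description of Skyline's actual algorithm. The function names and data are hypothetical.

```python
def trapezoid_area(times, intensities):
    """Plain trapezoidal area under the sampled points."""
    return sum((t1 - t0) * (y0 + y1) / 2
               for t0, t1, y0, y1 in zip(times, times[1:], intensities, intensities[1:]))


def background_subtracted_area(times, intensities):
    """Total area minus a flat background rectangle.

    ASSUMPTION: the background level is the lower of the two boundary
    intensities. This is a simplified stand-in, not Skyline's method.
    """
    background_level = min(intensities[0], intensities[-1])
    background = background_level * (times[-1] - times[0])
    return trapezoid_area(times, intensities) - background


window_times = [2.3, 2.6, 2.9, 3.2]

# A peak-shaped signal that returns to zero at both boundaries keeps its area.
peak_like = [0.0, 100.0, 100.0, 0.0]
peak_area = background_subtracted_area(window_times, peak_like)

# A plateau with no peak shape is almost entirely treated as background.
plateau = [100.0, 101.0, 100.0, 100.0]
plateau_area = background_subtracted_area(window_times, plateau)
```

Under this assumed rule the plateau keeps well under 1% of its raw area, which is the "something pretty close to zero" outcome described above.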

Luckily I believe there is some unrelated work going on in Skyline that might make this more tractable than it formerly was. Could you furnish some examples of the phenomena you're thinking of? A Skyline document (File > Share > Complete), raw data, and a few screenshots that illustrate the problem would go a long way toward making this a reality.

Best,
Brian Pratt

 
jrenders responded:  2020-11-17

Hi Brian,
Thanks for the info - I'll keep an eye out for updates! I've attached some screenshots and a document showing some poor peak shapes that are not handled by Skyline as well as I would hope. Thanks.

 
Brian Pratt responded:  2020-11-17

Thanks for those images - this is very interesting.

I would have naively looked at those and said Skyline is doing an excellent job of discerning the peak from some major interference. And I'd have (again naively - I'm a software engineer, not a molecular biologist) felt justified because Skyline's integration of the heavy labeled versions seems to be correct.

It almost seems like the branched and linear forms should be treated as different molecules for analysis purposes since they clearly have distinct elution profiles.

What are your thoughts?

Thanks again,

Brian

 
jrenders responded:  2020-11-17

Hi Brian,
I definitely agree with you - in a perfect world we would make up one calibration curve for the branched form and a separate calibration curve for the linear form and handle them separately. The problem, however, is the prevalence of these various forms in unknown samples and the availability of standards. I've observed this phenomenon in two different contexts:
- One is in human metabolism, where the body can metabolize a chemical by adding a hydrophilic functional group in one of multiple locations on the [drug] molecule (see attached Image 1). The two molecules that result from metabolism of morphine are isobars, and although the mass spectrometer cannot tell the difference (except sometimes with the fragments, if you're lucky), these two molecules will have slightly different retention times. Years ago, when we were buying the standard for this metabolite (for running a calibration curve), it turned out that it was only possible to synthesize (or isolate) one of these two variants. Therefore we made a calibration curve with only the one form, but the patient samples contained both forms. In this case we integrated the calibration curve like normal (since there was only a single peak there), but then for the samples we integrated both forms and reported the result as "total" morphine-glucuronide. This is not ideal but is generally accepted for structural isomers (with similar retention times) when an exact standard of each form cannot be acquired.
- In the PFAS world it is a similar issue: the byproducts found in water streams contain several different branched forms as well as a linear form (Image 2). The standard that we are able to purchase is also a mix of branched and linear, because the vendors are unable to sufficiently (or economically) separate the linear form from the various branched forms that may be occurring. The current protocol according to government standards is to integrate the whole mess and label it as a "total" quant that incorporates both forms (so long as ion ratios pass). In the end, the toxicology to determine whether the branched or linear forms have a greater impact on human health falls to somebody smarter than I - hopefully they are smart enough to figure out how to synthesize pure standards...

Hopefully this sheds some light on the issues - small molecules can be quite tricky like this at times and that's where the ability to just specify a start and end point to the integration window would be really helpful for reducing manual batch processing.
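The "total" reporting convention described in this post can be written out as a small calculation: fit the calibration line from the standard, then sum the isomer peak areas in each unknown before reading a concentration off that line. The sketch below assumes an unweighted linear fit with no internal-standard normalization, which is a simplification; the function names and all numbers are hypothetical.

```python
def fit_line(concentrations, areas):
    """Unweighted least-squares fit of area = slope * concentration + intercept."""
    n = len(concentrations)
    mean_x = sum(concentrations) / n
    mean_y = sum(areas) / n
    sxx = sum((x - mean_x) ** 2 for x in concentrations)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(concentrations, areas))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def total_concentration(isomer_areas, slope, intercept):
    """Sum the areas of all isomer peaks, then apply the calibration line."""
    return (sum(isomer_areas) - intercept) / slope


# Calibration standards (hypothetical values).
slope, intercept = fit_line([1.0, 2.0, 4.0], [100.0, 200.0, 400.0])

# Unknown sample with two isomer peaks, reported as one "total" value.
total = total_concentration([150.0, 50.0], slope, intercept)
```

The key point is that the summation happens before calibration, which is why a single integration window covering both forms fits this workflow.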

 
Brian Pratt responded:  2020-11-17
OK, this improves my understanding of what you're up against.

You might want to experiment with a different document structure, e.g.

PFHxSP
|- PFHxSP_linear
| |- [M-H]
| |- [M6C13-H]
|- PFHxSP_branched
  |- [M-H]
  |- [M6C13-H]
PFHpS_P
|- PFHpS_P_linear
...etc

and assign a different explicit RT to the linear and branched forms of each molecule.
 
jrenders responded:  2020-11-18
Hi Brian,
This is a good idea; however, it is not the standard established by folks who have been researching PFAS longer than I have. Additionally, when we buy standards at a certified concentration from a vendor, the concentration is certified based on the total branched+linear, so if I quantify them separately I can no longer confidently assign a quantitative value to my unknowns.
 
Brian Pratt responded:  2020-11-18
It's just that you're up against some fundamental assumptions in Skyline about the retention time being an important way to tell one molecule apart from another. From Skyline's point of view you've got two different molecules of identical mass, and the chromatography is doing its job of picking them apart.

But of course you know that they're the same molecule from an analysis point of view, so the question before us is how to take advantage of that higher level knowledge. The easiest way would be to somehow get Skyline to combine their signal at the time of chromatogram extraction.

Could I get one or two of your raw files so I can play around with this a bit?

Thanks
Brian
 
Brendan MacLean responded:  2020-11-18
I will admit that at one point I did realize that there is no inherent reason that Skyline peak integration needed to be limited to just one time range, but that is how I designed it, and after 11+ years, this is really the first potentially valid case anyone has ever brought up that would require multiple time ranges for the same "target". It would not be an easy change to add multiple ranges now.
 
jrenders responded:  2020-11-18
Hi Brian,
This makes sense, and I wouldn't want to change Skyline's integration abilities; I would think that accommodating all of the exceptions that exist in the small molecule world would render an integration algorithm useless. What a lot of instrument vendors do in their software is allow you to overrule automatic integration on a case-by-case basis by entering explicit integration windows (for these exception cases), so that you can then apply this window to all samples for a given analyte.

I would be happy to share some raw data, but can you remind me how to share this in a more private forum? Thanks.
 
Brian Pratt responded:  2020-11-18
I'll contact you off list.
 
jrenders responded:  2020-11-20
Hi Brendan,
Sorry, I missed your post previously. To be clear, I am still describing only a single time range. I just want to be able to specify that time range explicitly via a number-input text box and have it applied to all samples within a certain molecule assignment. This time range may encompass several peaks (as defined by Skyline), but the range itself is continuous and not broken up into segments or anything (that sounds way more complicated programming/logic-wise). Thanks.
 
jrenders responded:  2020-11-20
Hi Brian,
Thanks for working that out in the Skyline document, showing me how I can have both portions of the peak integrated separately. This is an option for getting both portions of the peak integrated automatically (independently), but it would still require me to export and sum the areas and then calibrate outside of Skyline, which would make my entire workflow pretty much explode. At this point, it is easiest to use CTRL+down arrow and just go through each sample and manually change the integration for each analyte that has this type of irregular peak shape. It's just a shame, as this takes a lot of time to do, and it seems like something that should conceivably be able to be stipulated once per batch based on an explicit integration boundary setting that overrules automatic integration parameters. I know other folks here at NC State who use Skyline for metabolomics and other small molecule applications and are interested in a setting like this as well; several of them have been asking me if I know how to set explicit integration boundaries (I've become the super-user in our group). I can have them join in the conversation if you think it would be helpful.
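For reference, the "export and sum the areas" step mentioned above might look like the sketch below. It assumes a report exported as CSV with columns named `Replicate`, `Molecule`, and `Area`; those column names, the function name, and the data are hypothetical and would need to match whatever report is actually exported.

```python
import csv
import io
from collections import defaultdict


def sum_isomer_areas(report_csv, total_name_for):
    """Sum isomer-specific areas into one total per replicate.

    report_csv: text of an exported report (hypothetical column names
                Replicate, Molecule, Area).
    total_name_for: maps each isomer-specific molecule name to the name
                    under which its area should be totaled.
    """
    totals = defaultdict(float)
    for row in csv.DictReader(io.StringIO(report_csv)):
        # Molecules without a mapping are kept under their own name.
        name = total_name_for.get(row["Molecule"], row["Molecule"])
        totals[(row["Replicate"], name)] += float(row["Area"])
    return dict(totals)


report = """Replicate,Molecule,Area
Sample01,PFHxSP_linear,1200.5
Sample01,PFHxSP_branched,310.0
Sample02,PFHxSP_linear,0.0
Sample02,PFHxSP_branched,455.5
"""
mapping = {"PFHxSP_linear": "PFHxSP", "PFHxSP_branched": "PFHxSP"}
totals = sum_isomer_areas(report, mapping)
```

Calibration would then also have to be done on the summed values outside Skyline, which is the workflow cost described in this post.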
 
Brian Pratt responded:  2020-11-20
Here's an idea to kick around which uses the existing Explicit Retention Time and Window concept, and (I think) would not require major changes to Skyline's internals:

By default, Skyline continues with the current behavior, which is to use the best peak in the specified RT range. If no peak is found in the range, no peak is recorded.

Now imagine a new optional behavior in which Skyline still uses the best peak in the range, but extends that peak's integration boundary to include any other peaks found in the range whose would-be integration boundaries form a continuum with the best peak's boundaries. It is still the case that if no peak is found in the range, no peak is recorded.

So in this new optional mode, for example:
 Skyline finds peaks A, B, C, D, and E (in ascending RT order) within a precursor's explicit RT range.
 B is the best peak, and thus is the peak of record.
 A overlaps B in RT, so B's boundaries are extended to encompass those of A.
 C overlaps B in RT, and D overlaps C. B's integration boundaries are again extended, to also include those of C and D.
 E does not overlap D. E is ignored.

 If C did not overlap B, then C, D and E would all be ignored and B's boundaries would just be the union of the A and B boundaries.
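The proposed optional mode can be sketched as a short routine. This is only an illustration of the idea as described in this post, not Skyline code; the function name is hypothetical, peaks are assumed to be sorted by ascending RT, and boundaries that merely touch are assumed to count as forming a continuum.

```python
def extend_best_peak(peaks, best_index):
    """Extend the best peak over every neighbor joined to it by overlaps.

    peaks: list of (start, end) boundaries, sorted by ascending RT.
    best_index: position of the best peak (the peak of record).
    Returns the extended (start, end).
    """
    start, end = peaks[best_index]

    # Walk toward earlier RT while each peak reaches the current start.
    i = best_index - 1
    while i >= 0 and peaks[i][1] >= start:
        start = min(start, peaks[i][0])
        i -= 1

    # Walk toward later RT while each peak begins before the current end.
    j = best_index + 1
    while j < len(peaks) and peaks[j][0] <= end:
        end = max(end, peaks[j][1])
        j += 1

    return start, end


# Peaks A..E with hypothetical boundaries; B (index 1) is the best peak.
peaks = [(2.0, 2.4), (2.3, 2.8), (2.7, 3.0), (2.9, 3.3), (3.6, 3.9)]
extended = extend_best_peak(peaks, 1)  # A, C and D are absorbed, E is ignored

# Variant where C does not overlap B: only A is absorbed.
peaks_gap = [(2.0, 2.4), (2.3, 2.8), (2.9, 3.0), (2.95, 3.3), (3.6, 3.9)]
extended_gap = extend_best_peak(peaks_gap, 1)
```

Because each walk stops at the first gap, D is never considered once C fails to overlap B, which matches the last case in the example.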

Does that make sense?

Brian
 
jrenders responded:  2020-11-24
Hi Brian,
This makes sense and would work in some cases, for example the PFAS examples I've shared where there is a very wide, low peak semi-overlapping with a larger, more resolved peak. However, I think there will still be problem cases because the integration is still predicated on a Skyline definition of what constitutes a peak or a peak boundary. If I have two isomeric peaks that I want to integrate as a single compound (i.e., they are isomers of the same compound) and they are baseline resolved, Skyline might say these two peaks do not overlap and therefore do not meet the criteria that you've outlined above.
I realize this is not the norm for proteomics workflows and most non-targeted metabolomics workflows but this functionality is not uncommon when developing targeted small molecule assays. If you are doing proteomics workflows you would definitely not want Skyline to confuse adjacent peaks even when they overlap and I really wouldn't want any changes to be made to skyline in this way because I think it would negatively impact many workflows out there.

Right now when I process my batch and I don't "approve" of Skyline's interpretation of my peaks, I go in and manually drag the dashed integration boundary lines to match what I want Skyline to integrate. After I do this for a single sample I am very satisfied with the result for that sample. The problem is that I sometimes have tens of analytes and hundreds of samples that require this manual intervention. Instead of messing with Skyline's inherent integration process/algorithm, isn't there a way to take the integration boundaries I've imposed on one sample and say "copy these exact integration boundaries to all other samples for this target"? In this way the user would overrule/ignore Skyline's integration and go with the user's defined boundaries. Right now this workflow functions just fine within Skyline; the issue is that it can only be implemented one sample at a time via dragging integration boundary lines. It just seems like a target-level strict RT boundary assignment via text box entry, or a right-click "apply integration boundaries to all" function, should be feasible without impacting the Skyline integration brain.
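The requested behavior amounts to a simple data model: one explicit boundary pair per target, copied to every sample, with room for per-sample exceptions when RT drift requires manual cleanup. A minimal sketch follows; the function name, sample names, and boundary values are hypothetical, and this does not describe any existing Skyline feature.

```python
def resolve_boundaries(replicates, target_bounds, overrides=None):
    """Return the (start, end) integration boundaries for each replicate.

    replicates: iterable of replicate names.
    target_bounds: the (start, end) set once for the target.
    overrides: optional {replicate: (start, end)} for manual exceptions.
    """
    overrides = overrides or {}
    return {name: overrides.get(name, target_bounds) for name in replicates}


replicates = ["Sample01", "Sample02", "Sample03"]
bounds = resolve_boundaries(
    replicates,
    target_bounds=(2.3, 3.4),
    overrides={"Sample03": (2.4, 3.5)},  # one sample adjusted for RT drift
)
```

No peak detection appears anywhere in this model, which is the point of the request: the user's boundaries are taken as given.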

The folks I work with were certain that the "Explicit Values" --> "Retention time" and "Retention time window" functionality within the "Modify..." command (right-clicking a target in the target window) would do what I am describing, but were confused when Skyline still tried to enforce its integration rules within these explicitly user-defined RT boundaries.
 
Brian Pratt responded:  2020-11-24
>> I really wouldn't want any changes to be made to skyline in this way because I think it would negatively impact many workflows out there.

Absolutely, any behavior change would need to be something the user has to explicitly enable.

>> isn't there a way to take the integration boundaries I've imposed on one sample and say "copy these exact integration boundaries to all other samples for this target"?

Have you looked at the "Apply to All" function?

>> The folks I work with were certain that the "Explicit Values" --> "Retention time" and "Retention time window" functionality within the "Modify..." command (right-clicking a target in the target window) would do what I am describing

I could imagine a (specifically user enabled!) mode where we don't even worry about overlapping peaks, but take the whole window if any peak at all is found. The problem with taking the whole window whether or not a peak is found, as mentioned previously, is that Skyline will likely get tripped up on background subtraction. I suppose if we find no actual peak we could just declare the "peak" at the maximum intensity RT for the window, and hope for the best.
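The fallback described here (take the whole window and declare the "peak" at the maximum-intensity RT) can be sketched as follows. This is an illustration of the idea only, not Skyline code; the function name is hypothetical, and it assumes the explicit window is a total width centered on the explicit RT.

```python
def whole_window_peak(times, intensities, explicit_rt, window):
    """Use the entire explicit RT window as the integration range.

    ASSUMPTION: `window` is the full width, centered on `explicit_rt`.
    Returns (start, end, apex_rt), where apex_rt is the time of the most
    intense point inside the window, or None if no points fall inside it.
    """
    start = explicit_rt - window / 2
    end = explicit_rt + window / 2
    inside = [(y, t) for t, y in zip(times, intensities) if start <= t <= end]
    if not inside:
        return None
    _, apex_rt = max(inside)  # highest intensity wins
    return start, end, apex_rt


# Hypothetical data: a window of 1.1 minutes centered on 2.85 minutes.
times = [2.0, 2.5, 3.0, 3.5, 4.0]
intensities = [1.0, 5.0, 9.0, 4.0, 1.0]
result = whole_window_peak(times, intensities, explicit_rt=2.85, window=1.1)
```

This sidesteps peak detection entirely, but it does nothing about the background subtraction problem mentioned above.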