Issue 946: Renaming a molecule in small molecules breaks associations

Assigned To:Guest
Opened:2023-03-23 06:24 by Brendan MacLean
Changed:2023-05-03 16:22 by Brian Pratt
Resolved:2023-05-03 16:22 by Brian Pratt
Closed:2023-05-03 16:22 by Brian Pratt
2023-03-23 06:24 Brendan MacLean
Title»Renaming a molecule in small molecules breaks associations
Assigned To»Brian Pratt
Notify»Nick Shulman
At the Dortmund course one of the instructors just warned users of the small molecule interface not to rename a molecule, but instead just add a Note that the name should be something else, because changing the name would break the association with the chromatograms, making it necessary to re-import the data files. This seemed to be a common annoyance, as another instructor quickly agreed it was important and would be much better if the association was not based on the molecule display name.

2023-03-23 10:02 Brian Pratt
The fix here would be to go through and update any references to the molecule. The idea that the name uniquely identifies the molecule is pretty deeply baked in - you can have two otherwise identical molecules and Skyline sees them as distinct entities.

2023-03-23 10:41 Brendan MacLean
Maybe we could have a display name and an ID name, and ask the user if they want to make the change an alias or an ID rename. People are essentially doing the aliasing with an annotation, but would prefer to see the alias more in our displays.

Creative thinking to give users what they want despite our not anticipating it.

2023-03-23 11:06 Brian Pratt
If they give any kind of accession number then this should work, but if the name is all we have to go by then we get into this problem.

Probably the answer is to give each molecule an arbitrary private GUID and use that as ground truth for identity in chromatograms etc.
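
To make the GUID idea concrete, here is a minimal Python sketch. It is not Skyline code: the `Molecule` class, its `guid` field, and the `chromatograms` dictionary are all hypothetical names invented for illustration. It only shows that a rename cannot disturb a lookup keyed on a private identifier.

```python
import uuid
from dataclasses import dataclass, field


@dataclass
class Molecule:
    """Hypothetical stand-in for a molecule node; not a Skyline type."""
    name: str
    # Assumed private identifier: assigned once at creation, never edited by the user.
    guid: uuid.UUID = field(default_factory=uuid.uuid4)


# Chromatograms are keyed by the GUID rather than by the display name.
chromatograms = {}

mol = Molecule("Cafeine")  # misspelled on purpose
chromatograms[mol.guid] = ["trace extracted from run 1"]

mol.name = "Caffeine"  # the rename leaves the GUID untouched
still_linked = mol.guid in chromatograms
```

After the rename, `still_linked` is true because the lookup key never involved the name.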

2023-03-23 11:10 Brendan MacLean
GUID maybe, when we are otherwise going to rely on the display name, which clearly users want to be able to change without losing work.

Alias name and original name may be enough, though.

2023-03-23 11:15 Brian Pratt
Your approach does have a change history aspect to it, which is very desirable.

2023-03-23 11:54 Brendan MacLean
It’s also what we do for renaming FASTA sequences. Why would a user want to do that? Because they do.

2023-03-26 13:27 Nick Shulman
If every DocNode in the document had a GUID that could make other things easier:
1. In pull request #2450 I had to come up with a way to store extra information about the precursors in the .skyd file, but if every TransitionGroupDocNode had a GUID then I would have just added 16 bytes to the ChromGroupHeaderInfo structure.
2. When external tools have to interact with Skyline document nodes, they have to pass an "ElementLocator" and I had to come up with what those would look like, especially in the case that more than one object in the document had the same name. It would have been simpler if it was just "Precursor:ED7BA470-8E54-465E-825C-99712043E01C" (although it would be harder to guess which precursor in the document that was, compared to what it currently looks like, which might be something like "Precursor:/sp|Q86YZ3|HORN_HUMAN/GPYESGSGHSSGLGHR?index=2/light+++").

2023-03-26 15:10 Brendan MacLean
The GUID solution would also make these things completely impossible to untangle for a human reader, or for anyone short of a serious programmer. As you point out, the location fields in the document grid would have no inherent meaning to people. We would need to expose the GUID field in the Document Grid and people would need to consult it to make sense of these things.

This seems like a really big change for something that can be solved with a small hack, albeit a hack. And I would hate to see this clear user feedback from small molecule users held up behind a need for a major architectural change like this.

2023-04-21 12:35 Brian Pratt
When I think on this a little more, there are several things you could do to a molecule to break its association with a chromatogram. Some actually ought to break that, like altering the chemical formula or declared mass, but others should not, like adding an InChiKey ID. So in the analogy of renaming FASTA sequences, we'd probably just want to use the original serializable string of the molecule as a robust chromatogram identity rather than just its original name.
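
A rough Python sketch of that idea, under stated assumptions: the `Molecule` class, the `original_key` field, the `serialize` format, and the `edit` helper are all hypothetical and do not reflect Skyline's actual serialization. The molecule remembers the serialized string it had when its chromatograms were extracted, and only an identity-changing edit (here, a formula change) discards that remembered key.

```python
from dataclasses import dataclass, replace
from typing import Optional


@dataclass(frozen=True)
class Molecule:
    """Hypothetical molecule record; field names are illustrative only."""
    name: str
    formula: str
    inchikey: Optional[str] = None
    # Assumed field: serialized form at the time chromatograms were extracted.
    original_key: Optional[str] = None

    def serialize(self) -> str:
        return f"{self.name}|{self.formula}|{self.inchikey or ''}"

    def chromatogram_key(self) -> str:
        # Prefer the remembered original string over the current one.
        return self.original_key or self.serialize()


def edit(mol: Molecule, **changes) -> Molecule:
    """Apply an edit, keeping the original key unless the formula changes."""
    if "formula" in changes and changes["formula"] != mol.formula:
        # Identity-changing edit: the old chromatograms no longer apply.
        return replace(mol, original_key=None, **changes)
    return replace(mol, original_key=mol.chromatogram_key(), **changes)


mol = Molecule("Cafeine", "C8H10N4O2")
chromatograms = {mol.chromatogram_key(): "extracted trace"}

renamed = edit(mol, name="Caffeine")
annotated = edit(renamed, inchikey="RYYVLZVUVIJVGH-UHFFFAOYSA-N")
reformulated = edit(annotated, formula="C8H10N4O3")
```

In this sketch `renamed` and `annotated` still resolve to the stored trace, while `reformulated` does not.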

2023-05-03 16:22 Brian Pratt
resolve as Fixed
Assigned To: Brian Pratt»Brendan MacLean
Fixed 5/3/2023 in commit 618ae79398e98f0eeade3d5f217f9706692c629c:

* Allow changes to a molecule's name without losing the association to any existing chromatograms.

Changing a molecule's name would formerly break its association with previously extracted chromatograms, which users did not expect. This fix allows for any change to a molecule that doesn't affect its mass to retain the association to any existing chromatograms.

Observed by Brendan while watching a small molecule tutorial being presented in Dortmund and reported by user Noortje.
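
The rule stated in the commit message (an edit that does not affect mass keeps the chromatogram association) could be sketched as below. This is an illustration only, not the code from the commit: the `Molecule` class, the `keeps_chromatograms` function, and the tolerance value are all assumptions.

```python
from dataclasses import dataclass

# Assumed tolerance for treating two masses as equal; not taken from Skyline.
MASS_TOLERANCE = 1e-6


@dataclass(frozen=True)
class Molecule:
    """Hypothetical molecule record with only the fields this sketch needs."""
    name: str
    monoisotopic_mass: float


def keeps_chromatograms(before: Molecule, after: Molecule,
                        tol: float = MASS_TOLERANCE) -> bool:
    """Return True when the edit leaves the mass unchanged within tolerance."""
    return abs(before.monoisotopic_mass - after.monoisotopic_mass) <= tol


original = Molecule("Cafeine", 194.0804)
renamed = Molecule("Caffeine", 194.0804)      # name-only edit
heavier = Molecule("Caffeine", 210.0753)      # mass-changing edit

rename_ok = keeps_chromatograms(original, renamed)
mass_change_ok = keeps_chromatograms(original, heavier)
```

Here `rename_ok` is true and `mass_change_ok` is false, matching the behavior the commit message describes.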

2023-05-03 16:22 Brian Pratt
Assigned To: Brendan MacLean»Guest