Thanks for summarizing this, Brendan.
Are you aware of a significant difference between parsimony options #1 and #2? AFAIK, protein parsimony should be synonymous with applying Occam's Razor to the protein list. The only difference I can think of is the set cover implementation. The PDF Mike provided doesn't mention what implementation MSDaPI uses to calculate the parsimonious set: is it a greedy algorithm, or does it brute-force the minimum covering set, which would be rather slow for large protein/peptide sets? (Set cover is an NP-hard problem: https://en.wikipedia.org/wiki/Set_cover_problem)
From my reading of the PDF, IDPicker behaves extremely similarly to MSDaPI except when combining datasets (IDPicker recalculates the parsimonious set whenever data is added or removed). How would you see Skyline handling that when new FASTA files or searches are added to an existing document? IDPicker actually keeps all proteins in the background so that it can recalculate the parsimonious set from the full set. If Skyline entirely forgets the unparsimonious protein-peptide pairs, that recalculation would be impossible. But I don't see a way to do MSDaPI's per-dataset parsimony without a more structured organization of results in Skyline (i.e. which results are technical replicates of each other, which are fractions of the same experiment, etc.).
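To make the greedy option concrete, here is a minimal Python sketch of greedy set cover applied to a protein-to-peptide mapping. This is only an illustration of the general technique: I am not claiming it is what MSDaPI or IDPicker actually do, and the function name, the input shape, and the alphabetical tie-breaking are all assumptions of mine. It also skips the usual step of grouping indistinguishable proteins first.

```python
def greedy_parsimony(protein_to_peptides):
    """Greedy approximation of a minimal protein list covering all peptides.

    protein_to_peptides: dict mapping a protein name to the set of peptides
    it could explain (hypothetical input shape, for illustration only).
    """
    # Every peptide seen in any protein still needs an explanation.
    uncovered = set().union(*protein_to_peptides.values())
    chosen = []
    while uncovered:
        # Pick the protein explaining the most still-uncovered peptides.
        # Iterating in sorted order makes ties resolve alphabetically.
        best = max(
            sorted(protein_to_peptides),
            key=lambda p: len(protein_to_peptides[p] & uncovered),
        )
        chosen.append(best)
        uncovered -= protein_to_peptides[best]
    return chosen


# Hypothetical example data.
proteins = {
    "P1": {"a", "b", "c"},
    "P2": {"b", "c"},
    "P3": {"c", "d"},
    "P4": {"d"},
}
parsimonious = greedy_parsimony(proteins)
```

Because the function takes the full mapping as input, recalculating after new data arrives is just a matter of merging the new protein-peptide pairs into the mapping and calling it again, which only works if the unparsimonious pairs were kept around.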
From our earlier discussions, I don't remember whether you wanted this feature to modify the document model or simply use the existing free text (?) peptide-node-parent to describe the protein group constituents. In any case, the parsimony algorithm should be decoupled from the presentation, so I should be able to work on that while we discuss what model changes to make and how.