Note that it is also possible to get q values (and z-scores) by training the default Skyline model. However, we can't really speak to the validity of using second-best peaks as the null distribution for this model either. I have always felt that you should not include retention time difference in your scores when using second-best peaks, because for that score the target and decoy values are not independent when the decoys are second-best peaks. That is, if the best peak is always very close to the predicted retention time, this will by definition force the second-best peak to be farther away, since both peaks come from the same chromatograms.
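The point about retention time difference can be illustrated with a toy simulation. This is purely hypothetical and not Skyline code: it assumes a 0-10 minute gradient, a predicted retention time of 5 minutes, a best peak that elutes close to that prediction, randomly placed noise peaks, and that the second-best peak may not overlap the best peak (modeled here as a made-up 0.5 minute exclusion window). It compares the retention time difference of the second-best peak with that of the strongest peak on an unrelated, noise-only chromatogram, which stands in for a truly independent null.

```python
import random
import statistics

PREDICTED_RT = 5.0  # minutes; hypothetical 0-10 minute gradient
WINDOW = 0.5        # hypothetical: two picked peaks must be more than this far apart

def strongest(peaks):
    """Retention time of the most intense (rt, intensity) peak."""
    return max(peaks, key=lambda p: p[1])[0]

def simulate(n_peptides=10000, n_noise=20, seed=1):
    rng = random.Random(seed)
    second_best, independent = [], []
    for _ in range(n_peptides):
        # The best (true) peak elutes close to the predicted retention time.
        best_rt = PREDICTED_RT + rng.gauss(0.0, 0.1)
        # Noise peaks on the SAME chromatograms as the best peak.
        noise = [(rng.uniform(0.0, 10.0), rng.random()) for _ in range(n_noise)]
        # Second-best peak: strongest noise peak that does not overlap the best peak.
        candidates = [p for p in noise if abs(p[0] - best_rt) > WINDOW]
        if not candidates:
            continue
        second_best.append(abs(strongest(candidates) - PREDICTED_RT))
        # Independent null: strongest peak on an unrelated, noise-only chromatogram.
        other = [(rng.uniform(0.0, 10.0), rng.random()) for _ in range(n_noise)]
        independent.append(abs(strongest(other) - PREDICTED_RT))
    return second_best, independent

def fraction_within(diffs, tol):
    """Fraction of retention time differences smaller than tol minutes."""
    return sum(d < tol for d in diffs) / len(diffs)

second_best, independent = simulate()
print("within 0.25 min:", fraction_within(second_best, 0.25), fraction_within(independent, 0.25))
print("mean |RT diff|:", statistics.mean(second_best), statistics.mean(independent))
```

Under these assumptions, second-best peaks almost never land near the predicted retention time and their average retention time difference is larger than for the independent null, so a retention time score separates targets from second-best "decoys" more easily than it would separate them from truly independent decoys.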
A model using second-best peaks is definitely at your own risk. I cannot point to a solid citation for this method. It was added mainly because we had heard, and seen, that it works "surprisingly well" in some cases.
To simply train q values and z-scores for the default model, open the Edit Peak Scoring Model form and, under Choose model, choose "Default" instead of "mProphet" (which is selected by default). This model uses fixed proportional weights for its feature scores, but training it with a set of decoys will adjust the weights so that the total scores become z-scores on the null distribution estimated from the decoys. This will not change the peaks Skyline picks at all; it only assigns z-scores and q values.
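As a rough illustration of what that calibration does, here is a minimal sketch. It assumes, hypothetically, that the Default model's total score is a weighted sum of feature scores with fixed weights (the three weights and all feature values below are made up, not Skyline's), and that training rescales every weight by the same factor and adds an offset so that the decoy scores have mean 0 and standard deviation 1. Skyline's actual calibration may differ in detail.

```python
import statistics

# Hypothetical fixed proportional weights for three feature scores.
FIXED_WEIGHTS = [1.0, 0.5, 0.25]

def score(features, weights, bias=0.0):
    """Weighted sum of feature scores plus a constant offset."""
    return sum(w * f for w, f in zip(weights, features)) + bias

def calibrate(decoy_features):
    """Rescale the fixed weights so that decoy scores have mean 0 and SD 1."""
    raw = [score(f, FIXED_WEIGHTS) for f in decoy_features]
    mu, sd = statistics.mean(raw), statistics.stdev(raw)
    # Dividing every weight by the same sd keeps the weights proportional.
    return [w / sd for w in FIXED_WEIGHTS], -mu / sd

def pick_peak(candidates, weights, bias=0.0):
    """Index of the highest-scoring candidate peak."""
    return max(range(len(candidates)), key=lambda i: score(candidates[i], weights, bias))

decoys = [[0.1, 0.3, 0.2], [0.4, 0.1, 0.0], [0.2, 0.2, 0.5], [0.0, 0.6, 0.1], [0.3, 0.0, 0.3]]
weights, bias = calibrate(decoys)

decoy_z = [score(f, weights, bias) for f in decoys]
print(statistics.mean(decoy_z), statistics.stdev(decoy_z))  # approximately 0 and 1

target_peaks = [[0.9, 0.8, 0.7], [0.2, 0.1, 0.4], [0.5, 0.9, 0.1]]
print(pick_peak(target_peaks, FIXED_WEIGHTS), pick_peak(target_peaks, weights, bias))  # both 0
```

Because every weight is divided by the same positive standard deviation, the weights stay proportional and the ranking of candidate peaks within a chromatogram is unchanged, which is consistent with calibration not changing which peaks are picked.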
We plan to make training the default model much more automatic (when you have true decoy targets) in the next Skyline release, so that more people end up with z-scores and q values on the default Skyline peak picks.
Skyline does not currently assign new z-scores and q values to manually adjusted peaks. We could easily recalculate the z-scores, but truly recalculating q values would mean that all q values would change every time you manually adjusted a peak, because q values are calculated on the entire set of targets. The only way around that would be some kind of q value estimation hack, like the one employed by Hannes Roest in the TRIC paper (https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5008461/). This would be of somewhat questionable statistical validity, and even manually adjusting targets but not decoys gives the targets a somewhat unfair advantage in the statistics. To be fully valid, you would probably need to blind yourself to which peptides are targets and which are decoys, manually review and adjust everything, and then recalculate the statistics.
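To see why a q value cannot simply be updated for one adjusted peak, consider a minimal target-decoy q value sketch. It is a simplified estimate that assumes the fraction of false targets (pi0) is 1 and estimates the FDR at a score threshold t as the number of decoys scoring at least t divided by the number of targets scoring at least t; Skyline's own calculation may differ. The scores are made up.

```python
def q_values(target_scores, decoy_scores):
    """Simple target-decoy q values, assuming pi0 = 1.

    FDR(t) = (#decoys >= t) / (#targets >= t); the q value of a score s
    is the minimum FDR(t) over all thresholds t <= s.
    """
    fdr = {}
    for t in set(target_scores):
        n_targets = sum(s >= t for s in target_scores)
        n_decoys = sum(d >= t for d in decoy_scores)
        fdr[t] = n_decoys / n_targets
    q, running_min = {}, float("inf")
    for t in sorted(fdr):  # ascending, so running_min covers every threshold <= t
        running_min = min(running_min, fdr[t])
        q[t] = running_min
    return [q[s] for s in target_scores]

decoys = [2.5, 1.5, 0.5, 0.2, 0.1]
before = q_values([5.0, 4.0, 3.0, 2.0, 1.0], decoys)
# Manually re-pick the peak of the last target, raising its score from 1.0 to 6.0.
after = q_values([5.0, 4.0, 3.0, 2.0, 6.0], decoys)
print(before)  # [0.0, 0.0, 0.0, 0.25, 0.4]
print(after)   # [0.0, 0.0, 0.0, 0.2, 0.0]
```

Only one target's peak changed, yet the q value of the untouched target scoring 2.0 moved from 0.25 to 0.2, because every q value depends on how many targets score above each threshold.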
Needless to say, combining manual adjustment with statistics is not an area we have explored very far. We calculate statistics once and report them. If you need to make manual adjustments, then you will need to justify them in some way other than a statistical cut-off value (e.g. 0.01 or 0.05).
In my experience, most reviewers would agree that having a coeluting heavy standard across the entire y- and b-ion series is sufficient proof that you are measuring your intended analyte, even when there is no visible, similar signal in the analyte (light) chromatograms. So, when you have a small number of targets with matching heavy standard peptides, you generally do not need to worry about statistical modeling, and you are more likely to cause yourself unnecessary pain trying to get a usable statistical model out of such a small number of training points.
--Brendan