Hi Tobi,

It is non-trivial, but it starts with pnorm(z-score) to convert each z score to a p value. Then we use the Storey-Tibshirani (2003) algorithm (https://www.pnas.org/content/100/16/9440) to convert the single-hypothesis-test p values into q values (a.k.a. adjusted p values, or FDR estimates).
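For that first step, here is a minimal sketch, assuming pnorm(z-score) means the upper-tail probability 1 - Phi(z) (i.e. R's pnorm(z, lower.tail = FALSE), since larger z scores are better). .NET has no built-in normal CDF, so this hypothetical helper uses the standard Abramowitz & Stegun erfc approximation:

```
using System;

public static class ZScoreToPValue
{
    // Upper-tail p value for a z score, i.e. 1 - Phi(z), assuming larger
    // scores are better. Equivalent to R's pnorm(z, lower.tail = FALSE).
    public static double PValue(double z)
    {
        return 0.5 * Erfc(z / Math.Sqrt(2));
    }

    // Complementary error function via the Abramowitz & Stegun 7.1.26
    // polynomial approximation (|error| < 1.5e-7); .NET has no built-in erfc.
    private static double Erfc(double x)
    {
        double ax = Math.Abs(x);
        double t = 1 / (1 + 0.3275911 * ax);
        double y = t * (0.254829592 + t * (-0.284496736 + t * (1.421413741
                   + t * (-1.453152027 + t * 1.061405429)))) * Math.Exp(-ax * ax);
        return x >= 0 ? y : 2 - y;
    }
}
```

With that, a z score of about 1.96 maps to an upper-tail p value of about 0.025.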

You don't even need the decoys at this point. They have exhausted their usefulness during model training: first for learning the relative weights of the scoring function coefficients, and then for estimating the scaling and bias that make the final scoring function produce z scores, based on the assumption of a normally distributed decoy score population.

Unless, that is, you want to pursue some more complex method of estimating p values than pnorm(z-score), like:

p value raw = countDecoys(z-score)/countDecoysTotal

p value = p value raw != 0 ? p value raw : Math.Min(1/countDecoysTotal, pnorm(z-score))

This is an alternative implementation, which I believe is at least close to what Percolator uses. It is what is called non-parametric p value estimation (i.e. it doesn't rely on the assumption that the decoy score population is normally distributed). People using the non-parametric approach initially found, however, that it had a tendency to reach a p value of zero too quickly (what I call "p value raw" above), because the count becomes zero for any target score greater than all observed decoy scores. So, it is still best to choose some parametric distribution to use as a back-up beyond the end of the sampled decoy score distribution.
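A self-contained sketch of that hybrid scheme might look like the following, where countDecoys() from the pseudo-code above becomes a LINQ count and pnorm() an erfc-based upper-tail normal probability. This is my reading of the pseudo-code, not Percolator's actual code:

```
using System;
using System.Linq;

public static class NonParametricPValue
{
    // Hybrid p value: non-parametric counting with a parametric back-up
    // beyond the end of the sampled decoy score distribution.
    public static double PValue(double zScore, double[] decoyScores)
    {
        int total = decoyScores.Length;
        // countDecoys(z-score): decoys scoring at least as well as the target
        int above = decoyScores.Count(d => d >= zScore);
        double pRaw = (double)above / total;
        // When no decoy scores this well, pRaw is 0; fall back to the normal
        // upper tail, capped at 1/total (the sampling resolution of the decoys)
        return pRaw != 0 ? pRaw : Math.Min(1.0 / total, PNorm(zScore));
    }

    // Upper-tail normal probability 1 - Phi(z), via the Abramowitz & Stegun
    // 7.1.26 erfc approximation (.NET has no built-in erf/erfc)
    private static double PNorm(double z)
    {
        double x = z / Math.Sqrt(2);
        double ax = Math.Abs(x);
        double t = 1 / (1 + 0.3275911 * ax);
        double y = t * (0.254829592 + t * (-0.284496736 + t * (1.421413741
                   + t * (-1.453152027 + t * 1.061405429)))) * Math.Exp(-ax * ax);
        return 0.5 * (x >= 0 ? y : 2 - y);
    }
}
```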

Implementing Storey-Tibshirani is non-trivial, but most of the code I have seen for it doesn't actually fit a spline to estimate pi-zero at p value = 1.0. If you use a fixed "lambda" (Skyline uses 0.4; the original mProphet used 0.5 - https://static-content.springer.com/esm/art%3A10.1038%2Fnmeth.1584/MediaObjects/41592_2011_BFnmeth1584_MOESM181_ESM.pdf#page=39), then PiZero is not that hard to calculate (from Skyline's Statistics class below):

```
/// <summary>
/// Returns a single point Pi-zero (proportion of features that are truly null)
/// estimation, given a specific p value cut-off, where values in the list are
/// assumed to be well behaved (randomly distributed for false positives) p values.
/// See Storey and Tibshirani 2003
/// </summary>
/// <param name="lambda">P value cut-off</param>
/// <returns>Pi-zero estimation</returns>
public double PiZero(double lambda)
{
    int aboveLambda = 0;
    for (int i = 0; i < _list.Length; i++)
    {
        if (_list[i] > lambda)
            aboveLambda++;
    }
    return aboveLambda/(_list.Length*(1 - lambda));
}
```

Once you have a PiZero (proportion of features which are truly null) then you calculate q values on your set of p values like this:

```
public double[] QvaluesNorm(Statistics s)
{
    return new Statistics(PvaluesNorm(s)).Qvalues();
}

/// <summary>
/// Returns q values for a set of p values, optionally using a p value cut-off
/// used for calculating Pi-zero.
/// See Storey and Tibshirani 2003
/// CONSIDER: Use spline fitting described in the paper instead of a supplied
/// cut-off.
/// </summary>
/// <returns>New statistics containing q values for the p values in this set</returns>
public double[] Qvalues(double? lambda = null, double minPiZero = 0)
{
    double pi0 = Math.Max(PiZero(lambda ?? 0.4), minPiZero);  // 0.4 = Skyline's fixed lambda
    var ordered = OrderedIndexedList();
    int n = _list.Length;
    var qlist = new double[n];
    for (int i = 0; i < n; i++)
    {
        double pVal = ordered[i].Key;
        int iOrig = ordered[i].Value;
        double qVal = pi0*n*pVal / (i + 1);
        qlist[iOrig] = qVal;
    }
    // Enforce that q values are monotonically increasing and never greater than 1
    double last = 1.0;
    for (int i = n - 1; i >= 0; i--)
    {
        int iOrig = ordered[i].Value;
        double qVal = Math.Min(qlist[iOrig], last);
        qlist[iOrig] = qVal;
        last = qVal;
    }
    return qlist;
}
```
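If it helps to see the whole chain in one place, here is a hypothetical standalone version of the q value step, with the same math as the Skyline methods above (pi-zero at a fixed lambda of 0.4, then the rank-based q values with the monotonicity pass), but without the Statistics class internals:

```
using System;
using System.Linq;

public static class QValueDemo
{
    // Standalone q value calculation on a set of p values, mirroring
    // PiZero() and Qvalues() above with a fixed lambda.
    public static double[] QValues(double[] pValues, double lambda = 0.4)
    {
        int n = pValues.Length;
        // Single-point pi-zero estimate at the lambda cut-off
        double pi0 = pValues.Count(p => p > lambda) / (n * (1 - lambda));
        // Order p values ascending, remembering original positions
        var ordered = pValues.Select((p, i) => (p, i)).OrderBy(t => t.p).ToArray();
        var q = new double[n];
        for (int rank = 0; rank < n; rank++)
            q[ordered[rank].i] = pi0 * n * ordered[rank].p / (rank + 1);
        // Enforce monotonically non-decreasing q values, never greater than 1
        double last = 1.0;
        for (int rank = n - 1; rank >= 0; rank--)
        {
            int iOrig = ordered[rank].i;
            last = Math.Min(q[iOrig], last);
            q[iOrig] = last;
        }
        return q;
    }
}
```

Feed it the p values from pnorm(z-score) and you have the full z score to q value pipeline in miniature.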

Sorry to get a bit code-heavy at the end here. Ignore the code if you want and return to the base citation of Storey-Tibshirani, but this is how it gets implemented in practice inside Skyline.

Good luck. Can you provide any more detail on why you would want to attempt this outside Skyline, based on our z scores? It seems highly prone to error to me, unless you have the time to validate your implementation as thoroughly as the existing implementations in Skyline, Percolator, PyProphet and Spectronaut have been validated.

--Brendan