On Estimating P Values by Monte Carlo Methods (2024)

To the Editor:

North et al. (2002) propose a new formula for the empirical estimation of P values by Monte Carlo methods to replace the conventional estimator. They claim that their new formula is “correct” and “most accurate” and that the conventional formula is “not strictly correct,” repeating this claim many times in their letter. The claim, however, is incorrect, and the conventional formula is the correct one.

The North et al. claim arises when a test statistic (called here “t”) takes a certain numerical value (called here “t*”) when calculated from data from some experiment, and it is required to find an unbiased estimate of the P value corresponding to t* by Monte Carlo simulation. This is done by performing n Monte Carlo simulations, all performed under the null hypothesis tested in the original experiment and with the same sample size and other characteristics as for the original experiment. Suppose, to be concrete, that sufficiently large positive values of the test statistic t are significant. Then, we define “r” as the number of simulations in which the simulation value of t is greater than or equal to the observed value t*. North et al. claim that an unbiased, and thus preferred, estimate of the P value arising from these simulations is (r+1)/(n+1) instead of the conventional estimate r/n. This claim is incorrect.
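The procedure just described can be sketched in a few lines. This is only an illustration: the function names, the use of a sample mean as the test statistic, the standard normal null model, and the value chosen for t* are all assumptions made for the example and appear neither in this letter nor in North et al. (2002).

```python
import random

def monte_carlo_counts(t_star, simulate_t, n, rng):
    """Return r, the number of the n null simulations with t >= t_star."""
    return sum(1 for _ in range(n) if simulate_t(rng) >= t_star)

def mean_of_normals(rng, sample_size=20):
    # Hypothetical null model: the statistic t is the mean of
    # sample_size independent standard normal observations.
    return sum(rng.gauss(0.0, 1.0) for _ in range(sample_size)) / sample_size

rng = random.Random(1)            # fixed seed so the sketch is repeatable
n = 999                           # number of Monte Carlo simulations
t_star = 0.4                      # hypothetical observed value of t
r = monte_carlo_counts(t_star, mean_of_normals, n, rng)

conventional = r / n              # the conventional estimate r/n
north_et_al = (r + 1) / (n + 1)   # the estimate proposed by North et al.
print(r, conventional, north_et_al)
```

Whatever the value of r, the second estimate is never smaller than the first, since (r+1)/(n+1) is at least r/n whenever r does not exceed n.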

Strangely, North et al. (2002) themselves show by algebra that the mean value of their estimator (r+1)/(n+1) is (nP+1)/(n+1), where “P” is the P value to be estimated. Since this is not equal to P, their P value estimator is biased. Further, their calculation also shows that the mean value of the conventional estimator r/n, whose use they do not recommend, is the desired value P, so that the conventional estimator is unbiased. There is, therefore, an internal inconsistency in their letter: their algebraic calculations, which are correct, contradict their claim and the argument leading to it. It is important to see why the argument given in North et al. (2002) is incorrect, since the reasoning involved relates to the theory and practice of Monte Carlo simulation procedures that are performed increasingly in genetics, in particular to questions surrounding P values and type 1 errors.
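The two mean values quoted above can be checked exactly for any particular n and P. The sketch below assumes, as is standard for independent simulations, that r has a binomial distribution with index n and parameter P. The function name and the numerical values of n and P are chosen only for illustration.

```python
from fractions import Fraction
from math import comb

def exact_means(n, p):
    """Exact means of r/n and (r+1)/(n+1) when r ~ Binomial(n, p)."""
    mean_conventional = Fraction(0)
    mean_north = Fraction(0)
    for r in range(n + 1):
        prob = comb(n, r) * p**r * (1 - p)**(n - r)   # Prob(r) under the null
        mean_conventional += prob * Fraction(r, n)
        mean_north += prob * Fraction(r + 1, n + 1)
    return mean_conventional, mean_north

n, p = 50, Fraction(1, 20)        # illustrative values, not from the letter
mean_conventional, mean_north = exact_means(n, p)

# The conventional estimator r/n has mean P.
print(mean_conventional == p)
# The estimator (r+1)/(n+1) has mean (nP+1)/(n+1) ...
print(mean_north == (n * p + 1) / (n + 1))
# ... so its bias is (1-P)/(n+1), which is positive whenever P < 1.
print(mean_north - p == (1 - p) / (n + 1))
```

Exact rational arithmetic is used so that the comparisons are not affected by rounding.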

The incorrect argument given by North et al. (2002) is that if the original data were generated under the null hypothesis tested, then, in all, n+1 “experiments” were conducted, of which one is real and n are simulations. With r as defined above, in r+1 of these, the value of the statistic t is either equal to the observed value t* or is greater than this value. It is then claimed that the estimator (r+1)/(n+1) is an unbiased estimator of the probability, when the null hypothesis is true, that the test statistic t equals or exceeds t*.

The error in this argument is, perhaps, best demonstrated by considering parallel reasoning used in the genetic ascertainment sampling context, exemplified as follows. Suppose that we wish to estimate the proportion of girls in a population, using a sample of families from that population. However, the sampling procedure is such that only families in which the oldest child is a girl are included in the sample. Clearly, using all children in the sample to estimate the proportion of girls in the population is incorrect, and the sample proportion of girls will overestimate the population proportion. The oldest child in each family, automatically included in the category of interest (girls), must be excluded in the estimation process. The analogy with the Monte Carlo case is that the observed value of the test statistic found from the actual data must be excluded in estimating a P value, since it is similarly automatically included in the category of interest (greater than or equal to itself). Any mathematical calculation concerning P values that does not take this into account will be incorrect.
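The size of the ascertainment effect can be seen in a small exact enumeration. The sketch assumes, purely for illustration, that every family has three children and that each child is independently a girl or a boy with probability 1/2, so that the population proportion of girls is 1/2. Neither assumption comes from the argument above, and the function name is hypothetical.

```python
from fractions import Fraction
from itertools import product

def girl_proportions(children_per_family):
    """Enumerate equally likely families whose oldest child is a girl.

    Requires at least two children per family, so that there are
    younger children left once the oldest is excluded.
    """
    girls_all = total_all = 0
    girls_younger = total_younger = 0
    for family in product("GB", repeat=children_per_family):
        if family[0] != "G":      # ascertainment: oldest child must be a girl
            continue
        girls_all += family.count("G")
        total_all += len(family)
        girls_younger += family[1:].count("G")
        total_younger += len(family) - 1
    return Fraction(girls_all, total_all), Fraction(girls_younger, total_younger)

with_oldest, without_oldest = girl_proportions(3)
print(with_oldest, without_oldest)   # prints: 2/3 1/2
```

Counting every child in the ascertained families gives 2/3, an overestimate, whereas excluding the oldest child recovers the population value of 1/2.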

It now appears that North et al. (2002) used mistaken terminology, and that the claim that they wished to make does not concern P value estimation, but rather that use of (r+1)/(n+1) “provides the correct type 1 error rate.” More precisely, if the type 1 error is chosen to be α, then it is claimed that rejecting the null hypothesis when (r+1)/(n+1)≤α leads to the desired type 1 error of α. This claim is correct.

To see this in formal statistical terms, the null hypothesis is rejected, with the notation and assumptions given above, if the value of r is “too low.” More specifically, with the chosen type 1 error of α, the null hypothesis is rejected if r≤K, where K is chosen so that Prob(r≤K | null hypothesis is true)=α.

The one “experimental” and n simulation values of t, leading to a total of n+1 values, can be listed in ascending order. The event that r≤K is identical to the event that the experimental value of t lies among the highest K+1 of these n+1 values. The null hypothesis probability of this is (K+1)/(n+1). Equating the probability (K+1)/(n+1) with α, we get K=(n+1)α-1. The event r≤K is, thus, the same as the event (r+1)/(n+1)≤α, and this is the criterion that North et al. give.
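The ranking argument can be checked numerically. The sketch below assumes that there are no ties among the n+1 values, so that under the null hypothesis the experimental value is equally likely to occupy each of the n+1 ranks and r is uniformly distributed on 0, 1, ..., n. The function name and the values of n and α are illustrative only.

```python
from fractions import Fraction

def type1_error_of_rule(n, alpha):
    """Exact null probability that (r+1)/(n+1) <= alpha.

    Assumes no ties, so that r is uniform on 0, 1, ..., n under the
    null hypothesis (each rank of the experimental value is equally likely).
    """
    rejecting = sum(1 for r in range(n + 1) if Fraction(r + 1, n + 1) <= alpha)
    return Fraction(rejecting, n + 1)

alpha = Fraction(1, 20)           # illustrative 5% level
n = 99                            # chosen so that (n+1)*alpha is an integer
K = (n + 1) * alpha - 1           # K = (n+1)α - 1, as derived in the text
print(K, type1_error_of_rule(n, alpha))   # prints: 4 1/20
```

With these values, rejection occurs for r = 0, 1, ..., 4, that is, for 5 of the 100 equally likely ranks. When (n+1)α is not an integer, the same rule gives a type 1 error slightly below α rather than exactly α.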

This procedure does not, however, imply, as claimed by North et al. (2002), that (r+1)/(n+1) is an unbiased estimate of the P value. It is best to keep the questions of unbiased estimation of the P value and the nature of the testing procedure that leads to a desired type 1 error separate. Pursuing this point, it is not clear in what sense North et al. relate, as they do, a P value estimate to a type 1 error. They claim, for example, that when r=0, so that the standard procedure P value estimate r/n is also 0, it is implied, under the standard procedure, that the type 1 error is also 0. This claim is incorrect. A type 1 error in statistics is set in advance, typically 5% or 1%, and the value so chosen for it is not in any way determined by or estimated from the observed value of any statistic.

References

North BV, Curtis D, Sham PC (2002) A note on the calculation of empirical P values from Monte Carlo procedures. Am J Hum Genet 71:439–441