Bias of an estimator

In statistics, the bias (or bias function) of an estimator is the difference between the estimator's expected value and the true value of the parameter being estimated. An estimator or decision rule with zero bias is called unbiased. In statistics, 'bias' is an objective property of an estimator. Unlike the ordinary English use of the term 'bias', it is not pejorative, even though it is not a desired property.

Bias can also be measured with respect to the median, rather than the mean (expected value), in which case one distinguishes median-unbiasedness from the usual mean-unbiasedness property. An estimate of a one-dimensional parameter θ is said to be median-unbiased if, for fixed θ, the median of the distribution of the estimate is at the value θ; i.e., the estimate underestimates just as often as it overestimates. For most purposes this requirement accomplishes as much as the mean-unbiased requirement, and it has the additional property of being invariant under one-to-one transformations.

Bias is related to consistency: consistent estimators are convergent and asymptotically unbiased (hence converge to the correct value as the number of data points grows arbitrarily large), though individual estimators in a consistent sequence may be biased (so long as the bias converges to zero); see bias versus consistency.

All else being equal, an unbiased estimator is preferable to a biased estimator, but in practice all else is not equal, and biased estimators are frequently used, generally with small bias. When a biased estimator is used, bounds on the bias are calculated. A biased estimator may be used for various reasons: because an unbiased estimator does not exist without further assumptions about the population, or is difficult to compute (as in unbiased estimation of standard deviation); because an estimator is median-unbiased but not mean-unbiased (or the reverse); because a biased estimator gives a lower value of some loss function (particularly mean squared error) than unbiased estimators (notably in shrinkage estimators); or because in some cases being unbiased is too strong a condition and the only unbiased estimators are not useful. Further, mean-unbiasedness is not preserved under non-linear transformations, though median-unbiasedness is (see § Effect of transformations); for example, the sample variance is an unbiased estimator for the population variance, but its square root, the sample standard deviation, is a biased estimator for the population standard deviation. These points are all illustrated below.

Suppose we have a statistical model, parameterized by a real number θ, giving rise to a probability distribution for observed data, $P_\theta(x) = P(x \mid \theta)$, and a statistic $\hat{\theta}$ which serves as an estimator of θ based on any observed data $x$.
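For concreteness, one standard instance of this setup (an illustrative choice on our part, anticipating the sample-variance example below) takes i.i.d. normal observations with the mean as the parameter to be estimated:

$$x = (X_1, \ldots, X_n), \qquad X_i \sim N(\theta, \sigma^2) \ \text{i.i.d.}, \qquad \hat{\theta}(x) = \overline{X} = \frac{1}{n} \sum_{i=1}^{n} X_i .$$

The definitions that follow do not depend on this particular choice of model or estimator.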
That is, we assume that our data follow some unknown distribution $P(x \mid \theta)$ (where θ is a fixed constant that is part of this distribution but is unknown), and then we construct some estimator $\hat{\theta}$ that maps observed data to values that we hope are close to θ. The bias of $\hat{\theta}$ relative to θ is defined as

$$\operatorname{Bias}(\hat{\theta}, \theta) = \operatorname{E}_{x \mid \theta}\bigl[\,\hat{\theta}\,\bigr] - \theta = \operatorname{E}_{x \mid \theta}\bigl[\,\hat{\theta} - \theta\,\bigr],$$

where $\operatorname{E}_{x \mid \theta}$ denotes the expected value over the distribution $P(x \mid \theta)$, i.e. averaging over all possible observations $x$. The second equality follows since θ is measurable with respect to the conditional distribution $P(x \mid \theta)$. An estimator is said to be unbiased if its bias is equal to zero for all values of the parameter θ. In a simulation experiment concerning the properties of an estimator, the bias of the estimator may be assessed using the mean signed difference.

The sample variance of a random variable demonstrates two aspects of estimator bias: firstly, the naive estimator is biased, which can be corrected by a scale factor; secondly, the unbiased estimator is not optimal in terms of mean squared error (MSE), which can be minimized by using a different scale factor, resulting in a biased estimator with lower MSE than the unbiased estimator. Concretely, the naive estimator sums the squared deviations and divides by n, which is biased. Dividing instead by n − 1 yields an unbiased estimator. Conversely, MSE can be minimized by dividing by a different number (depending on the distribution), but this results in a biased estimator. This number is always larger than n − 1, so the result is known as a shrinkage estimator, as it 'shrinks' the unbiased estimator towards zero; for the normal distribution the optimal divisor is n + 1.

Suppose $X_1, \ldots, X_n$ are independent and identically distributed (i.i.d.) random variables with expectation μ and variance $\sigma^2$. If the sample mean and uncorrected sample variance are defined as

$$\overline{X} = \frac{1}{n} \sum_{i=1}^{n} X_i, \qquad S^2 = \frac{1}{n} \sum_{i=1}^{n} \bigl(X_i - \overline{X}\bigr)^2,$$

then $S^2$ is a biased estimator of $\sigma^2$, while the version that divides by n − 1 instead of n is unbiased, as discussed above.
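The scale-factor comparison above can be checked empirically. The sketch below is a minimal Monte Carlo illustration using NumPy; the function names and parameter values are our own choices, not from the text. It estimates the bias and MSE of the variance estimators that divide by n, n − 1, and n + 1 under normal data.

```python
import numpy as np

def variance_estimators(x):
    """Return the sum of squared deviations divided by n, n - 1, and n + 1."""
    n = x.shape[0]
    ss = np.sum((x - x.mean()) ** 2)
    return ss / n, ss / (n - 1), ss / (n + 1)

def simulate(n=10, sigma2=4.0, reps=200_000, seed=0):
    """Monte Carlo estimate of bias and MSE of each estimator under N(0, sigma2)."""
    rng = np.random.default_rng(seed)
    estimates = np.empty((reps, 3))
    for r in range(reps):
        x = rng.normal(0.0, np.sqrt(sigma2), size=n)
        estimates[r] = variance_estimators(x)
    bias = estimates.mean(axis=0) - sigma2          # E[estimator] - true value
    mse = ((estimates - sigma2) ** 2).mean(axis=0)  # E[(estimator - true value)^2]
    return bias, mse

if __name__ == "__main__":
    bias, mse = simulate()
    for label, b, m in zip(["1/n", "1/(n-1)", "1/(n+1)"], bias, mse):
        print(f"{label:8s}  bias ≈ {b:+.4f}   MSE ≈ {m:.4f}")
```

With normal data, the n − 1 column should show bias near zero, while the n + 1 column should show the smallest MSE, in line with the shrinkage remark above.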

[ "Mean squared error", "Minimum-variance unbiased estimator", "Estimator", "Nelson–Aalen estimator", "Pitman closeness criterion", "Stein's unbiased risk estimate", "Lehmann–Scheffé theorem", "mean square error matrix" ]