The bias problem

Next: Equations of condition Up: Extinction and transformation models Previous: Extinction and transformation models

The bias problem

Bias can also be introduced by improper fitting procedures. For example, the reduction program known as SNOPY, used for many years at La Silla and also at La Palma, uses a simple iterative technique in which some subsets of the parameters are solved for, while others are held fixed; then the ones last solved for are held fixed, while the ones previously fixed are redetermined. The earliest version of PEPSYS used this technique [26]. However, it required hundreds of iterations to approach the true minimum in the sum of squares, and consequently was abandoned as soon as computers became large enough to allow simultaneous solutions for all the parameters.

A particularly treacherous aspect of such subspace alternation is that the sum of squares decreases rapidly for the first few iterations and then levels off. If the iterations are performed by hand, as they are with SNOPY, the user is likely to think that the system has converged when both the residuals and the parameter changes have become small. Nevertheless, the current values of the parameters can be far from the desired solution.

What happens is that the solution finds its way to the floor of a long, narrow valley in the sum-of-squares hypersurface in the n-dimensional parameter space. Even with optimal scaling, such valleys occur whenever the independent variables are partially correlated, so that the parameters themselves are correlated. This nearly always happens in multiparameter least-squares problems. At each iteration, the descent to the valley floor (i.e., the minimum in the mean-square error) occurs within the subspace of the parameters that are adjusted at each iteration, but is necessarily at the starting values of the parameters that were held fixed. At the next iteration, the solution moves a short distance in this orthogonal subspace, and again finds a point on the valley floor; but, again, it only finds a nearby point on the axis of the valley, and is unable to progress along the axis, because of the parameters that are held fixed. Succeeding iterations take very small steps, back and forth across the valley axis, while making only very slow progress toward the true minimum at the lowest point on that axis. This behavior is notorious in the numerical-analysis community, where it is known as ``hemstitching''.

The fatal slowness of convergence along coordinate directions is explicitly described and illustrated on pp. 294-295 of Numerical Recipes [20], although the schematic diagram shown there does not show how slow the process really is. Actual computations show that hundreds or even many thousands of iterations may not be enough to get acceptably close to the true minimum: see, for example, Figure 4j of Gill et al. [9] or Figure 9.7 of Kahaner et al. [14]. But in every iteration after the first few, the sum of the squared residuals is small and changes very slowly. One must not forget that obtaining small residuals is not an end in itself, but is merely the means of finding what we really want, namely, the best estimates of the parameters.

The most effective way to prevent hemstitching is to adjust all the parameters simultaneously. In photometric problems, this means solving a rather large system of simultaneous equations. For example, if we have 100 stars observed in 4 colors on 5 nights, there are 400 different stellar parameters, 25 extinction coefficients and nightly zero-points, and a few additional transformation coefficients to be determined, if they are estimated simultaneously. In broadband photometry, there are also parameters proportional to the second central moments of the passbands to be estimated, which introduce cross-terms and hence nonlinearity. Fortunately, the nonlinearity is weak, and can be very well handled by standard techniques (cf. [3]) that converge quickly.

The problem can be speeded up further by taking advantage of its special structure: the matrix of normal equations is moderately sparse, with a banded-bordered structure; the largest block (containing the stellar parameters) is itself of block-diagonal form. Thus matrix partitioning, with a block-by-block inversion of the stellar submatrix, provides the solution much more quickly than simple brute-force elimination of the whole matrix. A single iteration takes only a few seconds, in typical problems.

Next: Equations of condition Up: Extinction and transformation models Previous: Extinction and transformation models

Petra Nass
1999-06-15