Bias can also be introduced by improper fitting procedures.
For example, the reduction program known as SNOPY, used for many years at La
Silla and also at La Palma, uses a simple iterative technique in which some
subsets of the parameters are solved for, while others are held fixed; then the
ones last solved for are held fixed, while the ones previously fixed are
redetermined.
The earliest version of PEPSYS used this technique [26].
However, it required *hundreds* of iterations to approach the true minimum
in
the sum of squares, and consequently was abandoned as soon as computers became
large enough to allow simultaneous solutions for all the parameters.

A particularly treacherous aspect of such subspace alternation is that the sum of squares decreases rapidly for the first few iterations and then levels off. If the iterations are performed by hand, as they are with SNOPY, the user is likely to think that the system has converged when both the residuals and the parameter changes have become small. Nevertheless, the current values of the parameters can be far from the desired solution.

What happens is that the solution finds its way to the floor of a long, narrow
valley in the sum-of-squares hypersurface in the *n*-dimensional parameter
space.
Even with optimal scaling, such valleys occur whenever the independent
variables are partially correlated, so that the parameters themselves are
correlated.
This nearly always happens in multiparameter least-squares problems.
At each iteration,
the descent to the valley floor (i.e., the minimum in the mean-square
error) occurs within the subspace of the parameters that are adjusted at each
iteration, but is necessarily at the starting values of the parameters that
were held fixed.
At the next iteration, the solution moves a short distance in this
orthogonal subspace, and again finds a point on the valley floor; but, again,
it only finds a *nearby* point on the axis of the valley, and is unable to
progress *along* the axis, because of the parameters that are held fixed.
Succeeding iterations take very small steps, back and forth across the valley
axis, while making only very slow progress toward the true minimum at the
lowest point on that axis.
This behavior is notorious in the numerical-analysis community, where it is
known as ``hemstitching''.

The fatal slowness of convergence along coordinate
directions is explicitly described
and illustrated on pp. 294-295 of *Numerical Recipes* [20], although
the schematic diagram shown there does not show how slow the process really is.
Actual computations show that hundreds or
even many thousands of iterations may not be
enough to get acceptably close to the true minimum: see, for example, Figure 4j
of Gill et al. [9] or Figure 9.7 of Kahaner et al. [14].
But in every iteration after the first few, the sum of the squared residuals
is small and changes very slowly.
One must not forget that obtaining small residuals is not an end in itself, but
is merely the means of finding what we really want, namely, the best estimates
of the parameters.

The most effective way to prevent hemstitching is to adjust all the parameters simultaneously. In photometric problems, this means solving a rather large system of simultaneous equations. For example, if we have 100 stars observed in 4 colors on 5 nights, there are 400 different stellar parameters, 25 extinction coefficients and nightly zero-points, and a few additional transformation coefficients to be determined, if they are estimated simultaneously. In broadband photometry, there are also parameters proportional to the second central moments of the passbands to be estimated, which introduce cross-terms and hence nonlinearity. Fortunately, the nonlinearity is weak, and can be very well handled by standard techniques (cf. [3]) that converge quickly.

The problem can be speeded up further by taking advantage of its special structure: the matrix of normal equations is moderately sparse, with a banded-bordered structure; the largest block (containing the stellar parameters) is itself of block-diagonal form. Thus matrix partitioning, with a block-by-block inversion of the stellar submatrix, provides the solution much more quickly than simple brute-force elimination of the whole matrix. A single iteration takes only a few seconds, in typical problems.