To register the sky-subtracted images to a common reference, the offsets between them must be estimated precisely. jitter applies a 2D cross-correlation routine to determine the offsets to an accuracy of 1/10th of a pixel. There are other ways to find offsets between frames: with many point sources, point-pattern matching is a possibility, and identifying the same objects in all consecutive frames would also yield a list of offsets. An initial estimate of the offsets between frames can be found in the FITS headers. jitter assumes that the offsets found in the input FITS headers are accurate to within a certain tolerance and uses them as a starting point. If there are no input offsets, they are all initially set to zero in both directions.
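The idea of refining an initial offset by cross-correlation can be sketched as follows. This is a minimal illustration in plain Python, not jitter's actual routine: the function names, the brute-force search window (`search`), and the three-point parabolic peak refinement are all assumptions made for the example. It also assumes a background close to zero, as in sky-subtracted frames.

```python
def correlation(ref, img, dy, dx):
    """Sum of ref[y][x] * img[y+dy][x+dx] over the overlapping pixels."""
    h, w = len(ref), len(ref[0])
    total = 0.0
    for y in range(max(0, -dy), min(h, h - dy)):
        for x in range(max(0, -dx), min(w, w - dx)):
            total += ref[y][x] * img[y + dy][x + dx]
    return total


def parabolic_peak(c_minus, c_zero, c_plus):
    """Sub-pixel position of the vertex of a parabola through three samples."""
    denom = c_minus - 2.0 * c_zero + c_plus
    return 0.0 if denom == 0 else 0.5 * (c_minus - c_plus) / denom


def estimate_offset(ref, img, guess=(0, 0), search=5):
    """Return (dy, dx) such that img[y+dy][x+dx] best matches ref[y][x].

    `guess` plays the role of the initial (integer) offset, for instance
    one read from a header; `search` is a hypothetical half-width of the
    window explored around that guess.
    """
    gy, gx = guess
    # Integer-pixel search: keep the shift with the highest correlation.
    c0, by, bx = max(
        (correlation(ref, img, dy, dx), dy, dx)
        for dy in range(gy - search, gy + search + 1)
        for dx in range(gx - search, gx + search + 1)
    )
    # Sub-pixel refinement along each axis separately.
    sub_y = parabolic_peak(correlation(ref, img, by - 1, bx), c0,
                           correlation(ref, img, by + 1, bx))
    sub_x = parabolic_peak(correlation(ref, img, by, bx - 1), c0,
                           correlation(ref, img, by, bx + 1))
    return by + sub_y, bx + sub_x
```

A brute-force search like this is only practical for small windows, which is why a reasonable initial estimate matters. Production code would normally use an FFT-based correlation instead.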
Registering the images is done by resampling them with subpixel shifts to align them all to a common reference (usually the first frame). Resampling can use any interpolation algorithm, but be aware that quick-and-dirty algorithms such as nearest-neighbor or linear interpolation can degrade the images by introducing aliasing. jitter offers many higher-order interpolation kernels that introduce few or no artifacts; however, they slightly smooth the noise (high frequencies).
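To make the notion of a higher-order interpolation kernel concrete, here is a small one-dimensional sketch that shifts a row of pixels by a fractional amount using a Lanczos-windowed sinc. The Lanczos kernel is chosen only because it is a well-known example of such a kernel; it is not claimed to be one of the kernels jitter provides. The function names and the edge-replication policy are likewise assumptions.

```python
import math


def lanczos(x, a=3):
    """Lanczos kernel of half-width `a`: sinc(x) * sinc(x / a)."""
    if x == 0:
        return 1.0
    if abs(x) >= a:
        return 0.0
    px = math.pi * x
    return a * math.sin(px) * math.sin(px / a) / (px * px)


def shift_row(row, shift, a=3):
    """Resample `row` so that out[i] samples the input at position i - shift.

    A feature at index k in `row` therefore ends up near index k + shift.
    Pixels beyond the ends are taken from the nearest edge pixel, and the
    weights are renormalized so that a flat input stays flat.
    """
    n = len(row)
    out = []
    for i in range(n):
        src = i - shift
        base = math.floor(src)
        acc = 0.0
        wsum = 0.0
        for j in range(base - a + 1, base + a + 1):
            w = lanczos(src - j, a)
            acc += w * row[min(max(j, 0), n - 1)]  # replicate edge pixels
            wsum += w
        out.append(acc / wsum)
    return out
```

A two-dimensional shift can be obtained by applying the same operation along rows and then along columns. The kernel spans 2a input pixels per output pixel, which is where both the better fidelity and the slight smoothing of noise come from.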
Stacking the resulting images is done using a 3D filter to remove outliers. jitter gives you a choice between three filters. Linear means that all frames are simply averaged without filtering (pass-all filter). This is not recommended, as it is likely to keep cosmic rays and other outliers in the final frame. Median means that the final frame is the median of all resampled frames. The last filter (the default) scales all frames by their medians and removes the highest and lowest pixel values before taking an average. See the jitter documentation for more information.
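The difference between the three filters is easiest to see on the values of a single pixel across all frames. The sketch below is illustrative only: the function names are invented for the example, the number of rejected values (one low, one high) is an assumption, and the frames are assumed to have been scaled to a common level beforehand, since the exact scaling jitter applies is not described here.

```python
from statistics import median


def combine_linear(values):
    """Plain average: every value is kept, including outliers."""
    return sum(values) / len(values)


def combine_median(values):
    """Median of the values at this pixel position."""
    return median(values)


def combine_minmax_rejected(values, n_low=1, n_high=1):
    """Average after dropping the lowest and highest values.

    Assumes the frames were already scaled to a common level and that
    len(values) > n_low + n_high.
    """
    kept = sorted(values)[n_low:len(values) - n_high]
    return sum(kept) / len(kept)


def stack(frames, combine):
    """Apply `combine` to each pixel position across a list of 2D frames."""
    height, width = len(frames[0]), len(frames[0][0])
    return [[combine([frame[y][x] for frame in frames])
             for x in range(width)]
            for y in range(height)]
```

For the pixel values 10, 11, 9, 10 and 500 (the last one hit by a cosmic ray), the plain average is 108, while the median gives 10 and the rejection filter gives about 10.3.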
Note that in versions later than eclipse version 4, jitter resamples and stacks in one step to speed up the process. Also since version 4, the final frame is the union of all input images (as opposed to the intersection in previous versions), which means that it is bigger than any of the input frames.
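The effect of the union on the output size can be worked out from the offsets alone. The following sketch assumes all input frames share the same dimensions and that each offset is given as a (dx, dy) pair in pixels; the function name and the rounding of the offset span up to a whole pixel are choices made for the example, not a description of jitter's internals.

```python
import math


def canvas_sizes(width, height, offsets):
    """Return ((union_w, union_h), (inter_w, inter_h)) for shifted frames.

    `offsets` is a list of (dx, dy) shifts, one per frame, in pixels.
    """
    xs = [dx for dx, _ in offsets]
    ys = [dy for _, dy in offsets]
    span_x = math.ceil(max(xs) - min(xs))
    span_y = math.ceil(max(ys) - min(ys))
    union = (width + span_x, height + span_y)
    intersection = (max(0, width - span_x), max(0, height - span_y))
    return union, intersection
```

For example, three 256x256 frames with offsets (0, 0), (10, -5) and (-8, 12) span 18 pixels in x and 17 in y, so the union is 274x273 pixels while the intersection would only be 238x239.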