Archiving and Data Management
in HST's Second Decade

Modified document following the third meeting of the Second Decade Committee on 13/14 April 1999


The proposals presented in this document fall into four broad categories:

  1. Strengthening the links with other archive centers, WWW catalog sites and abstract services. This broadens and enriches the archive by allowing exploitation of a multi-wavelength parameter space and keeping track of the relevant subset of the literature which bears directly on future use of the data.
  2. Technical developments such as improvements of network access and effective data transmission speeds, including the increased use of data compression, the monitoring of market trends and the adoption of new, high capacity, storage media.
  3. Adding to the scientific utility of the archive by adopting several strategies covering improvements in the quality of calibration and the addition of higher level data products. The guiding principle here is the need to harvest instrumental and data analysis expertise before it disperses and becomes effectively lost. The result is an archive which contains a higher proportion of science-ready data. This is particularly relevant to the treatment of the homogeneous data sets expected from the major and parallel programs.
  4. Taking those steps which are necessary to enable the archive to be used in qualitatively new ways. Commonly termed ``data mining'', these developments require the generation of a more comprehensive description of the data than is currently available. While the activities in point 3 above are a necessary prerequisite for this to happen, the extra data processing envisaged for this step goes beyond basic data calibration and combination and probably requires the preparation of catalogs of objects and measurements of their properties. In practice, it requires specific scientific choices to be made during the processing steps and is likely to be quite labour intensive.
In considering these issues, the committee is aware of the context within which these developments will occur. The data rate from HST will soon be dwarfed by that from ground-based optical/IR observatories - both from their 8-10m telescopes and, especially, from the dedicated wide-field survey facilities. The NGST archive will have to be seamlessly incorporated into the scheme and, with a large component of multi-object or integral field spectroscopy, will present its own special demands. The developments in archive technology and network connectivity will be driven by requirements other than astronomy but should, nonetheless, be closely monitored and exploited. The interest in Data Mining techniques is very widespread and is becoming a major topic within computer science. There will be many developments which are exploitable by astronomers but there are also opportunities to squander effort at a time when the requirement is to reduce operational costs. The proposed close cooperation between the archive groups at STScI, ST-ECF, CADC and NAOJ is strongly encouraged in order to enable these new developments to be carried out in an efficient manner while spreading the cost and effort.

The Committee sees these potential developments in terms of the opportunity to multiply the scientific and, coupled with the public outreach effort, wider cultural value of the HST Observatory. Given the necessity to reduce operational costs, careful choices will have to be made in selecting those areas of development which contribute most effectively to this goal. We believe that the highest priority goal should be to maximise the quality of data in the archive by encouraging the recalibration efforts currently funded for post-operational instruments as part of the ESA/NASA agreement. The mechanisms for incorporation of high-level data products should be developed and applied initially to the larger, homogeneous subsets of data such as the HDFs, the Key Programs, parts of the parallel data stream and, in the future, the Major Program products. If significant, labour-intensive, processing efforts are foreseen to facilitate Data Mining programs, care should be taken to ensure that the efforts are driven by a clear scientific goal.

1.  Introduction

The archive of observations from the Hubble Space Telescope is undoubtedly the largest and most heavily used collection of pointed observations in astronomy today. The archive comprises over 6 Terabytes (280,000 observations) of imaging, spectral, time series, polarimetry, and engineering data covering bandpasses from the near UV through near IR, and continues to grow at an average rate of over 100 Gigabytes per month. The archive includes the deepest exposures of the universe ever made - the Hubble Deep Fields (North and South) - and, reflecting the diversity of the HST observing program, encompasses all aspects of modern astronomy: planetary science, stars and stellar evolution, the interstellar medium, galactic structure, normal and active galaxies, clusters of galaxies, quasars, and cosmology. Data retrieval rates from the archive at STScI exceed the data ingest rate, and additional retrievals are supported by the archive sites at the ST-ECF and CADC. The HST archive enables research beyond the scope of the original GO proposals, satisfying NASA-wide goals to maximize the scientific return from its missions.

2.  Background and Current Status

Work on the HST Archive began in 1984. A key decision made at that time was to vest total responsibility for the archive in the STScI, rather than making use of common archival facilities at NASA's National Space Science Data Center. In retrospect, this was a pivotal decision which has led to the development of a distributed data management architecture within astrophysics, planetary science, and space physics. This architecture assures that data sets are curated by organizations with maximum expertise in the data and a vested scientific interest in maintaining their integrity.

The terms of the MOU between NASA and ESA required that a full copy of the Hubble archive be established at the ST-ECF to support data distribution to European astronomers. Limited international network connectivity led Canada to establish the Canadian Astronomy Data Centre to host HST and other archival data sets of interest to Canadian scientists. ST-ECF and CADC participated in the design of the HST archive prototype, the Data Management Facility (DMF), from the outset. DMF was later superseded by the Data Archive and Distribution System (DADS), with data being stored on 12-inch WORM optical disks. Procedures are established between STScI, ST-ECF, and CADC to provide the latter sites with copies of HST science data. ST-ECF and CADC migrated to CDROM data storage, and in order to further economize on storage costs developed an on-the-fly calibration facility so that only uncalibrated data need be archived (uncalibrated data compresses more efficiently than calibrated data, further reducing archive media costs). STScI, ST-ECF, and CADC continued collaborative and complementary efforts on the HST archive in several areas:

Throughout the past 15 years STScI, ST-ECF, and CADC have held archive coordination meetings to share experiences and set goals. STScI has shouldered the bulk of the day-to-day operational responsibilities and ST-ECF and CADC have explored alternative and innovative data access and delivery mechanisms. At this time, ST-ECF is evaluating DVD as a new archival medium and STScI is developing a successor to Starview, Starview-II, which will be implemented in Java and will remove the need to distribute software to remote sites. Starview-II will permit the design of sophisticated query screens, as in Starview, that are not possible with web-based forms, and will enable new levels of interactivity with the archive and the associated catalogs. ST-ECF is contributing to Starview-II by providing Java preview display modules. STScI had also planned to migrate to DVD as a storage medium in the expectation that DVD would quickly supersede CDROM, but the industry has yet to settle on a standard and STScI is concerned that selection of one DVD format over another is too risky at this time. STScI plans to migrate to magneto-optical storage, which is a mature yet growing technology with a large installed base, is comparable in cost to current generation DVD, has proven long-term stability, and has higher I/O performance than other optical media.

In the past year a fourth HST archive site has been established at the National Astronomical Observatory of Japan (NAOJ). NAOJ is using CDROMs for data storage, and STScI is using its bulk CDROM production system to back populate their archive. NAOJ will host only non-proprietary data.

STScI, CADC, and ST-ECF each support archives beyond HST. CADC also hosts data from the Canada-France-Hawaii Telescope, the James Clerk Maxwell Telescope, and a copy of STScI's Digitized Sky Survey, and provides access points to a number of other astronomical archives. ST-ECF's archiving responsibilities are closely coupled with ESO, whose Science Archive Facility includes data from the NTT and VLT and will shortly be extended to the VST - a dedicated, wide-field survey telescope on Paranal. STScI recently took on responsibilities as NASA's UV/optical/near-IR archive center and established the Multimission Archive at Space Telescope (MAST). MAST includes data from the IUE, Astro (HUT, UIT, WUPPE), and Copernicus missions, provides direct access to EUVE data, and will also include data from the FUSE mission. MAST also supports the Digitized Sky Survey and the VLA Faint Images of the Radio Sky at Twenty centimeters (FIRST) survey. STScI has also entered into an agreement with NOAO to provide archive support for the Mosaic Imager. Thus, all three sites support both space- and ground-based data archives with a very large integrated capacity.

STScI works closely with other astrophysics data centers and services. STScI is a member of the Astrophysics Data Centers Coordinating Council (ADCCC) which includes the NASA-sponsored High Energy Astrophysics Science Archive Research Center (HEASARC, at GSFC), Infrared Science Archive (IRSA, at Caltech/IPAC), AXAF (now Chandra) Science Center at SAO, the NSSDC (GSFC), the Astronomical Data Center and Astrophysics Data Facility (GSFC), and Astrophysics Data System (SAO). The goal of the ADCCC is to increase interoperability among archive centers and services, ultimately enabling transparent access to these distributed data holdings. STScI, CADC, and ESO/ST-ECF all have close ties with the catalog and bibliographic services provided by the Centre de Données astronomiques de Strasbourg (CDS) and the NASA/IPAC Extragalactic Database (NED). STScI and HEASARC have led development of AstroBrowse, a cross-archive data search and discovery tool which utilizes CDS's ``GLU'' system to maintain a distributed database of astronomical data resources. The ADCCC has partnered with planetary science and space physics data providers to develop a successor to AstroBrowse, called ISAIA (Interoperable Systems for Archival Information Access). ISAIA will not only locate data of potential interest to the user, it will also integrate the query results from multiple data providers and allow users to get a single view of all relevant information from multiple sites and services. Both AstroBrowse and ISAIA incorporate resources for data acquired on the ground and from space. STScI also participates in NASA's Space Science Data System, which aims at interoperability across all space science disciplines.

3.  New Data, New Technology, New Science

The HST archival facilities are now stable and mature. Work has started to further exploit the rich data holdings and enable multi-wavelength, multi-mission correlative science. For example, cross-correlation facilities at MAST, using WWW interfaces, already allow the user to search for data from various instruments/missions for a given astronomical source and even to look for multi-frequency data for classes of objects belonging to some astronomical catalogs. Further work is required to provide cross-correlations between the HST observation catalog and arbitrary object catalogs. NASA's Astronomical Data Center (ADC) provides a generic interface to its catalog collection which can be used to implement such cross-correlations. Using such public access points, the HST archive centers need to provide much more direct access to complementary data holdings to enable comparison of data taken in different spectral regions. And as a convenience to users, alternate means of data delivery need to be studied and developed (more efficient network access, physical distribution on CDROM, DVD, or other high-density media, etc.).
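The core of such a cross-correlation is a positional match between an observation log and an external object catalog. The sketch below illustrates the idea with invented mini-catalogs and an arbitrary search radius; a production service would query indexed databases rather than loop over every pair:

```python
import math

def ang_sep_deg(ra1, dec1, ra2, dec2):
    """Angular separation in degrees (haversine formula)."""
    r1, d1, r2, d2 = map(math.radians, (ra1, dec1, ra2, dec2))
    a = (math.sin((d2 - d1) / 2) ** 2
         + math.cos(d1) * math.cos(d2) * math.sin((r2 - r1) / 2) ** 2)
    return math.degrees(2 * math.asin(math.sqrt(a)))

def cross_match(obs_catalog, obj_catalog, radius_deg):
    """Return (observation, object) pairs lying within radius_deg of each other."""
    return [(obs, obj)
            for obs in obs_catalog
            for obj in obj_catalog
            if ang_sep_deg(obs["ra"], obs["dec"],
                           obj["ra"], obj["dec"]) <= radius_deg]

# Hypothetical entries: one HST pointing and two external-catalog sources
observations = [{"id": "u2xx0101t", "ra": 189.20, "dec": 62.21}]
objects = [{"name": "SRC-A", "ra": 189.21, "dec": 62.20},
           {"name": "SRC-B", "ra": 150.00, "dec": 2.00}]

matches = cross_match(observations, objects, radius_deg=0.04)  # ~2.4 arcmin
```

A real service would avoid the all-pairs scan with spatial indexing (declination zones, hierarchical sky meshes, or database indices), but the matching criterion is the same.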

HST is already in its second generation of instruments, and will see a third generation (ACS, COS, WFC3) in its second decade of operations. The calibration of earlier generation instruments will, in time, cease to be improved aside from possible changes in fundamental reference data. The long-term cost of maintaining calibration software will eventually exceed the cost of archiving a final calibrated data product. ST-ECF staff have already been working on strategies for final recalibrations of FOS and GHRS data, and plans must be made for final calibration and rearchiving of data from the other instruments.

In its second decade the HST archive can serve as a testbed for new developments in scientific utility and efficiency, with a focus on preparing for data from NGST. NGST is likely to have more homogeneous observations than HST and will be more conducive to automated object detection and classification, generating a data archive that comprises both pointed observations and a derived source catalog. The requisite tools and technology can be developed with STScI, ESO/ST-ECF, and CADC collaboration, drawing also upon the expertise of our colleagues with experience in large scale surveys (GSC, GSC II, Sloan Digital Sky Survey (SDSS), etc.).

The emerging field of ``data mining'' combined with newly commissioned surveys (SDSS, 2MASS, etc.) is likely to revolutionize astronomy in the coming decade. Data mining allows users to ask new and unanticipated questions of an archive (``archive'' here implies a distributed resource with multiple sources of data). For the HST archive to be conducive to data mining it will be necessary to provide some characterization of the objects in HST images and spectra, such as the results from the analyses of the Hubble Deep Fields (HDFs) or Medium Deep Survey. Developing a pipeline that extracts meaningful and useful object attributes from the highly heterogeneous collection of HST data will be a substantial challenge, but the potential benefits of such a facility are enormous. Users could pose queries, directly, in scientific terms (e.g., ``are there clusters of galaxies in HST WFPC2 images at the positions of steep spectrum radio sources?'').
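The first step of such an attribute-extraction pipeline is detecting objects and measuring basic properties (position, flux, size). A minimal sketch on a toy pixel array follows; a real HST pipeline would involve noise models, PSF fitting, and deblending of the sophistication found in dedicated source-extraction packages, so everything here is deliberately simplified:

```python
# Toy 2-D "image"; values above the detection threshold form two sources
image = [
    [0, 0, 0, 0, 0, 0],
    [0, 9, 8, 0, 0, 0],
    [0, 7, 9, 0, 0, 5],
    [0, 0, 0, 0, 6, 7],
    [0, 0, 0, 0, 0, 0],
]

def extract_sources(image, threshold):
    """Find connected pixel groups above threshold; return basic attributes."""
    ny, nx = len(image), len(image[0])
    seen = [[False] * nx for _ in range(ny)]
    sources = []
    for y0 in range(ny):
        for x0 in range(nx):
            if image[y0][x0] > threshold and not seen[y0][x0]:
                # flood-fill the connected region (4-connectivity)
                stack, pix = [(y0, x0)], []
                seen[y0][x0] = True
                while stack:
                    y, x = stack.pop()
                    pix.append((y, x))
                    for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                        yy, xx = y + dy, x + dx
                        if (0 <= yy < ny and 0 <= xx < nx
                                and image[yy][xx] > threshold
                                and not seen[yy][xx]):
                            seen[yy][xx] = True
                            stack.append((yy, xx))
                flux = sum(image[y][x] for y, x in pix)
                xc = sum(x * image[y][x] for y, x in pix) / flux
                yc = sum(y * image[y][x] for y, x in pix) / flux
                sources.append({"npix": len(pix), "flux": flux,
                                "x": xc, "y": yc})
    return sources

sources = extract_sources(image, threshold=4)  # two detections in this toy image
```

The returned attribute records (counts, fluxes, centroids) are exactly the kind of per-object characterization a data-mining query would operate on.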

The ST-ECF, in conjunction with ESO and the CADC, is now planning a pilot project to evaluate the efficacy of data mining using WFPC2 associations. The plan is to create a database for each association containing an object list (positions, magnitudes, object shape parameters), statistics on the object list (number of each type of object, magnitude distributions, etc.), the limiting magnitude for the association, background characteristics, lists of objects in the field of view from GSC I and II and from other HST observations, and associated PSFs. The database is not considered an end product in itself, but rather an additional resource for identifying observations of interest to a user's scientific goals.
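An association record of the kind described might be assembled along the following lines. The field names and the object list are hypothetical; the actual WFPC2 association database would carry many more attributes (shape parameters, background statistics, PSF references):

```python
from collections import Counter

def build_association_record(assoc_id, objects, limiting_mag):
    """Summarize a per-association object list into a database record."""
    mags = sorted(o["mag"] for o in objects)
    return {
        "assoc_id": assoc_id,
        "n_objects": len(objects),
        "counts_by_type": dict(Counter(o["type"] for o in objects)),
        "mag_brightest": mags[0],
        "mag_median": mags[len(mags) // 2],
        "limiting_mag": limiting_mag,
        "objects": objects,  # positions, magnitudes, shape parameters, ...
    }

# Hypothetical object list for one association
objs = [
    {"ra": 150.10, "dec": 2.20, "mag": 21.3, "type": "star"},
    {"ra": 150.11, "dec": 2.21, "mag": 23.8, "type": "galaxy"},
    {"ra": 150.12, "dec": 2.19, "mag": 24.5, "type": "galaxy"},
]
rec = build_association_record("wfpc2_assoc_0001", objs, limiting_mag=26.0)
```

Storing such summary statistics alongside the raw object list is what allows a user to ask association-level questions (``which fields reach magnitude 26 and contain at least N galaxies?'') without reprocessing the images.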

The ST-ECF is also considering developing associations for spectral data. A spectral association would comprise all spectra for a given object, grouped as a single data set, with metadata to describe the spectral resolution, wavelength coverage, and signal-to-noise ratio. The archive of FOS spectra will be used as a test case.
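Grouping spectra into such associations is straightforward once per-spectrum metadata are in hand. A sketch with invented entries (target names, wavelength ranges, and resolving powers are illustrative only):

```python
from collections import defaultdict

def build_spectral_associations(spectra):
    """Group individual spectra by target; summarize coverage per group."""
    groups = defaultdict(list)
    for s in spectra:
        groups[s["target"]].append(s)
    assocs = {}
    for target, members in groups.items():
        assocs[target] = {
            "n_spectra": len(members),
            "wave_min": min(s["wave_min"] for s in members),   # Angstroms
            "wave_max": max(s["wave_max"] for s in members),
            "best_resolution": max(s["resolving_power"] for s in members),
        }
    return assocs

# Hypothetical per-spectrum metadata entries
spectra = [
    {"target": "NGC4151", "wave_min": 1150, "wave_max": 1600, "resolving_power": 1300},
    {"target": "NGC4151", "wave_min": 1600, "wave_max": 2300, "resolving_power": 1300},
    {"target": "3C273",   "wave_min": 1150, "wave_max": 1600, "resolving_power": 250},
]
assocs = build_spectral_associations(spectra)
```

The association-level metadata (total wavelength coverage, best resolution) is what a user would query to decide whether the grouped data set suits their science goal.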

The Appendix gives three examples of major scientific investigations which exploit access to multiple archival datasets.

4.  Initiatives for the Second Decade

Specific initiatives which would expand the scientific utility of the archive, especially given its planned growth with the acquisition of new missions and new HST instruments, include:
  1. Establishing closer ties and coordination with other archive centers, to fully exploit the multiwavelength parameter space.
  2. Establishing closer links with catalog WWW sites. The user could select a list of sources, based on some parameters, from one of the hundreds of astronomical catalogs available, for example, at the ADC, and then cross-correlate that with the HST and MAST archives via a simple WWW interface.
  3. Establishing closer links with abstract services (e.g., the Astrophysics Data System [ADS]) to provide a connection between astronomical papers and data.
  4. Inclusion of objective target classifications and of important parameters (e.g., magnitude, redshift, etc.) in the archives, to facilitate searches.
  5. Data characterization and catalogs of selected data sets, providing the astronomical community with science-ready products which would be extremely useful, as demonstrated by the HDFs and Medium Deep Survey. HST catalogs would also enable, for example, the identification of the optical counterparts of deep surveys at various wavelengths.
  6. Supporting large survey programs with HST. It is important to recognize that supporting more large programs and/or surveys with HST will have implications for the data archive. The ultimate scientific value and utility of survey or key project data is often directly related to how accessible the ``science-ready'' data products are. At present, the calibrated data provided to GOs usually require additional processing before final scientific conclusions can be reached. If large, homogeneous survey programs become more popular in HST's second decade, then we may wish to consider providing a more science-ready data product to maximize the utility of the program. The GO team can be encouraged (or even required?) to provide their final data products to the archive for subsequent community distribution. Even if large key programs are not adopted, the archives should work more assertively with GOs to obtain final data products from them, which often have more value to the archival researcher than the basic calibrated data currently provided. This work could be done in conjunction with item 3 above, in which the HST archives serve as repositories for the large processed data sets described in the literature.
  7. Optimization of the archive interfaces with the Internet. The outbound bandwidth from STScI, for example, is quite high (at least 20 Megabytes/sec) but is constricted prior to its junction with the public Internet network. With bandwidths approaching 100 Megabytes/sec, electronic transmission of ACS data (with typical GO programs generating ~3-9 Gigabytes) becomes feasible.
  8. Transmission of compressed data (and possibly even lossy compressed data). Lossy compression can result in a factor of ~10 reduction in data volumes but with negligible information loss (for certain scientific applications). Providing the user with the option to receive highly compressed data should be explored.
  9. The time required to write ACS GO data to Exabyte tapes will increase by a factor of 2.5-7.5 over current mean tape generation times. Explore alternatives for GO media including DVD, DLT, AIT. Many new high density storage options are now available and are well matched to the high data volumes expected from HST. Eliminating GO media altogether in favor of high-bandwidth connections for data retrieval will produce a dramatic savings in operations costs but does require widespread GO access to high-bandwidth internet service providers.
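The bandwidth and compression trade-offs in items 7 and 8 can be estimated with simple arithmetic. The sketch below uses the figures quoted above; the factor-of-10 compression applied here is an assumption for illustration, and the calculation ignores protocol overhead and the network constriction mentioned in item 7:

```python
def transfer_time_hours(volume_gbytes, bandwidth_mbytes_per_sec, compression=1.0):
    """Hours to transmit a dataset, optionally compressed by the given factor."""
    seconds = (volume_gbytes * 1024.0 / compression) / bandwidth_mbytes_per_sec
    return seconds / 3600.0

# A hypothetical 9-Gigabyte ACS program (upper end of the 3-9 GB range above)
t_slow = transfer_time_hours(9, 20)                          # current link
t_fast = transfer_time_hours(9, 100)                         # upgraded link
t_fast_lossy = transfer_time_hours(9, 100, compression=10)   # lossy, ~10x smaller
```

Even at 20 Megabytes/sec the nominal transfer takes well under an hour, which suggests that the practical obstacle is not the raw archive-side bandwidth but the constricted path to the public Internet and the GO's own connectivity.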

5.  Conclusions

In HST's second decade of operation, the archive will become an increasingly important scientific resource. Broader spatial, spectral, and time coverage will allow for analysis of more complete object samples and will make cross-correlation with other ground- and space-based archives and catalogs increasingly fruitful. The integration and interoperability of astrophysics data sets, both within the context of the sites hosting the HST archive and with other astrophysics and space science resources, will allow HST data to be used more broadly and to answer questions that are yet to be formulated. Emerging tools for distributed data mining will be especially important, providing scientists with new tools for inquiry and discovery of unanticipated relationships. In the coming decade the tools available to the research astronomer for archival data access will be remarkably more sophisticated than they are today, and they will provide seamless access to data and catalogs that are physically located at many different sites.

The organizations supporting the HST archive should work together toward a common long-term vision of distributed archival services, and draw upon the strengths of each organization and also other major players in scientific archiving and computer science (both in terms of scientific oversight and technical know-how) to contribute to the overall goals. These goals should encompass an archiving strategy for NGST that builds upon the strength and maturity of current systems, yet opens new horizons.


- Bob Hanisch, Paolo Padovani, Megan Donahue, Marc Postman (STScI)
- Piero Benvenuti, Benoit Pirenne, Rudi Albrecht (ESO/ST-ECF)
- Daniel Durand, David Schade (CADC)


Appendix

This appendix gives three examples of major investigations based on the use of multiple archival datasets.

A.1  Cosmology Studies with Archival Surveys

One outstanding demonstration of the leverage available to archival surveys of serendipitously-observed objects is that of constraining cosmological parameters, such as the density of the universe, with clusters of galaxies. The key to this exercise is finding the rarest, most massive, distant clusters of galaxies because those are the clusters that are expected to evolve most significantly. However, the all-sky X-ray surveys are not sensitive enough to detect the distant clusters, and the pointed deep X-ray surveys do not have sufficient sky coverage to detect the most massive clusters, which are the rarest. The optimal survey for this purpose, thus, is the archival survey, which utilizes deep, pointed observations and identifies the other serendipitously-observed objects that happened to fall into the field of view. Such surveys are moderately deep, but have significant sky coverage. Examples of the cluster surveys are the Extended Medium Sensitivity Survey (EMSS) from Einstein IPC pointed observations (Henry et al. 1992, ApJ 386, 408) and similar surveys based on ROSAT PSPC data (Rosati et al. 1998, ApJ 492, 21; Jones et al. 1998, ApJ 495, 100). If only clusters hotter than 8 keV and at redshifts greater than 0.5 are counted, not even the ROSAT All-Sky Survey would be expected to see any such clusters. Even the ROSAT serendipitous surveys, despite their more sensitive flux limits, did not detect as many high-redshift clusters as did the EMSS because the EMSS has significantly larger coverage in sky area than the ROSAT-based surveys.

Analogously, deep pointed observations with HST such as the Hubble Deep Field are useful in detecting and characterizing faint but rather common objects. But random discovery of rare, relatively bright objects, such as quasars, blazars or clusters of galaxies, is nearly impossible in single pointed fields. On the other end of the sky coverage scale, the Digitized Sky Survey is useful in locating very bright objects, but is not sensitive enough to detect distant blazars or clusters of galaxies with redshift greater than about 0.5. It is possible to achieve intermediate sky coverage at moderate sensitivity by exploiting HST archival data, including the enormous number of parallel and snapshot observations. One example of this is the Medium Deep Survey discovery of z > 0.4 clusters of galaxies (Ostrander et al. 1998, AJ 116, 2644).

Another powerful advantage of the HST archive or the archives of other space missions is access to an enormous amount of data acquired uniformly under extremely reliable conditions. One program can easily benefit from the data of one or more other programs. For example, the study of a complete sample of the morphologies of 341 distant (z = 0.3-0.9) galaxies drawn from two redshift surveys (Brinchmann et al. 1998, ApJ 499, 112) was based on observations of two independent HST programs which were added to the HST observations of the Groth strip (Groth et al. 1994, BAAS 185, 5309). The data could be consistently calibrated and intermixed, which is essential for building large uniform samples. The systematics of various methods of classifying galaxies could be tested and quantified, a feature lacking in studies where the data and the methods are unavailable outside the authors' domain.

A.2  Future Deep Surveys

Large, homogeneous datasets form the foundation for astronomy. Discoveries are most efficiently made by specific observing programs which explore a hitherto uncharted region of parameter space. Surveys at various wavelengths have had a predominant role in this process. The next few years will see a dramatic change in the way we approach surveys, with archival research assuming a fundamental role. A huge amount of data will be produced by new, large-area sky surveys in different bands: FIRST (radio), 2MASS and DENIS (infrared), GSC II and SDSS (optical), GALEX (ultraviolet), ABRIXAS and XMM (X-ray), and AGILE and GLAST (gamma-ray). These, together with the surveys already available, which include, for example, the NRAO VLA Sky Survey (NVSS), PMN, GB6 (radio), and the ROSAT All Sky Survey (RASSBSC; X-ray), will challenge the ``classical'' approach to surveys.

The way most surveys have been carried out so far, in fact, has required optical identification. That is, optical spectra of all sources had to be taken to classify the objects. Take for example the EMSS, which includes 835 sources over 780 deg^2 down to an X-ray flux of ~10^-13 erg cm^-2 s^-1 and has provided the deepest view of the X-ray sky over a relatively large area for quite a few years. All sources in the large X-ray error boxes had to be observed to identify the most likely X-ray source (Stocke et al. 1991, ApJS 76, 813). The whole process took about 10 years to complete. This strategy works for small-area, deep surveys or large-area, shallow surveys which include a manageable number of sources (say up to a thousand or so). Dedicated instruments or projects, like the Two degree Field (2dF) and the Sloan Digital Sky Survey (SDSS), can actually adopt the classical approach for a much larger number of sources (of the order of 250,000 for 2dF and a million for SDSS). This however requires populations with relatively large surface density (2dF) and large investments (SDSS). In both cases the optical limit is relatively bright (~20-21 magnitude).

The majority of currently available surveys are so large that the classical approach cannot work any longer. Consider for example that the RASSBSC includes 18,811 sources over 92% of the sky, while the White, Giommi, and Angelini (WGA) catalog of ROSAT Point Sources includes about 70,000 sources over ~10% of the sky. The NVSS includes almost 2 million radio sources north of a declination of -40°. It is clear that a spectroscopic identification of all the sources in these surveys is not possible in a reasonable amount of time and given standard resources. The situation will get worse with the forthcoming large-area sky surveys in different bands. Alternative methods for survey identifications have to be applied. One such method relies on the cross-correlation of catalogs in different bands to pre-select the candidates for identification. One still needs to optically identify the selected sub-samples but the selection efficiency increases by large amounts and the number of sources is manageable. This works very well for relatively rare populations, as the initial pool of candidates can be large but the class of interest is selected out based on its spectral energy distribution. The multifrequency information can include not only fluxes in different bands but also optical colors, radio and X-ray spectra, source sizes, etc.
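A cross-correlation pre-selection of this kind can be sketched as follows. Sources are matched here by name for brevity (a real pipeline would match by position, with proper handling of error boxes), and the catalogs, fluxes, and spectral-index cut are all invented for illustration:

```python
def preselect(xray_catalog, radio_catalog, max_alpha_r):
    """Keep sources detected in both bands whose radio spectrum is flat
    (alpha_r <= cut), a signature used to isolate blazar-like candidates."""
    radio = {src["name"]: src for src in radio_catalog}
    candidates = []
    for x in xray_catalog:
        r = radio.get(x["name"])
        if r is not None and r["alpha_r"] <= max_alpha_r:
            candidates.append({"name": x["name"],
                               "fx": x["fx"], "fr": r["fr"],
                               "alpha_r": r["alpha_r"]})
    return candidates

# Hypothetical entries; alpha_r is the radio spectral index (S ~ nu^-alpha)
xray = [{"name": "J1200+30", "fx": 2e-13},
        {"name": "J0100-10", "fx": 5e-13}]
radio = [{"name": "J1200+30", "fr": 0.12, "alpha_r": 0.1},   # flat: kept
         {"name": "J0100-10", "fr": 0.90, "alpha_r": 0.8}]   # steep: rejected

blazar_candidates = preselect(xray, radio, max_alpha_r=0.5)
```

Each additional multifrequency cut (optical colors, X-ray spectra, source sizes) shrinks the candidate pool further, which is precisely how the selection efficiency is raised to the point where spectroscopic follow-up becomes tractable.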

One powerful application of this kind of approach, which used four WFPC2 filters (and therefore a relatively small wavelength range), has been the selection of high-redshift galaxies from the UV ``dropouts'' in the Hubble Deep Field. Another example is the Deep X-ray/Radio Blazar Survey (Perlman, Padovani et al. 1998, AJ 115, 1253). Starting from the WGA catalog (~70,000 sources) and the GB6 and PMN catalogs (~120,000 sources), DXRBS selects ~1,600 X-ray/radio sources. A further selection on radio spectral index reduces the sample to ~300 objects, ~95% of which turn out to be blazars, the class of interest in this case. The statistical method can be carried one step further to avoid optical identification altogether. In this case the pre-selection is done in such a way as to select objects with nearly unique spectral energy distributions. One might ultimately still want to identify the selected sub-samples but redshift-independent results (like number counts) can be obtained relatively quickly and the selection efficiency increases by orders of magnitude (see Giommi, Menna, & Padovani 1999, MNRAS, in press, for one such example).

As surveys get deeper and deeper, statistical methods for identification, with the consequent need for easy access to data at various wavelengths, will need to become commonplace. The reason is simple: a 4m class telescope can identify a source with relatively strong features (such as a quasar) in a 1-hour exposure down to 22-23 magnitude, while a 10m class telescope can reach 25-26 magnitude in about the same time. This is inadequate to spectroscopically identify the faintest objects found, for example, in the HDFs and, more generally, will be a problem for deep surveys at other frequencies. For example, a typical XMM exposure will reach X-ray fluxes f_x ~ 10^-15 erg cm^-2 s^-1. At these levels, using the appropriate X-ray:optical flux ratios, basically all radio-loud AGN will be fainter than the 4m limit for spectroscopic identification and most of them will also be below the 10m limit. At the AXAF limit (f_x ~ 10^-16 erg cm^-2 s^-1) even most radio-quiet AGN will be so faint as to require exceedingly long integration times for optical identification with a 10m class telescope. Normal galaxies, having larger optical:X-ray flux ratios, will be brighter, but in that case their lack of relatively strong features will also make spectroscopic identification problematic at these X-ray fluxes. The same applies to the radio band: at the 1 mJy limit of the FIRST survey, most radio-loud sources will have magnitudes of ~24 and a 4m telescope will not be sufficient for spectroscopic identification.
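The magnitude estimates above follow directly from the X-ray flux and an assumed X-ray-to-optical flux ratio. A sketch using a commonly adopted convention, log(fx/fv) = log fx + m_V/2.5 + 5.37 (the zero point folds in the V bandpass; the unit flux ratio assumed below is merely representative of AGN):

```python
import math

def implied_optical_mag(fx, log_fx_over_fv):
    """V magnitude implied by an X-ray flux fx (erg cm^-2 s^-1) and an
    assumed X-ray-to-optical flux ratio, via the convention
    log(fx/fv) = log fx + m_V/2.5 + 5.37."""
    return 2.5 * (log_fx_over_fv - math.log10(fx) - 5.37)

# Sources at the XMM and AXAF flux limits, each with fx/fv = 1 (log ratio 0)
m_xmm = implied_optical_mag(1e-15, 0.0)   # ~24.1: beyond a 4m in one hour
m_axaf = implied_optical_mag(1e-16, 0.0)  # ~26.6: beyond even a 10m

feasible_4m = m_xmm <= 23    # approximate 1-hour 4m spectroscopic limit
feasible_10m = m_xmm <= 26   # approximate 1-hour 10m spectroscopic limit
</n```

Each factor of 10 in X-ray depth pushes the implied optical counterpart 2.5 magnitudes fainter, which is why the deepest X-ray surveys rapidly outrun even 10m-class spectroscopic follow-up.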

Cross-identification of sources at different wavelengths is already vital to making progress at fainter fluxes, and will become more so. Statistical ways of pre-selecting or even identifying specific classes of sources work very well (especially for rare populations). Given the large number of upcoming surveys at various frequencies, these methods will have to become commonplace in the near future. The depth of the optical, X-ray, and radio surveys will also mean that spectroscopic identification of many sources will be extremely hard or outright impossible, leaving statistical identification as the only viable option. All this will require not only a large amount of coordination between the archive centers but also the extraction of catalogs from the raw data. This is especially challenging for the HST archive.

A.3  Variability Studies

One of the benefits of an archive derives from having a large and comprehensive database represented by observations accumulated over a period of time. The STScI archive also has the added benefit of being NASA's UV/optical/near-IR archive center and therefore providing access to a variety of data through MAST.

For example, for nearly twenty years now, and for some time yet to come, the IUE data archive (included in MAST) has served as a treasure trove for variable phenomena in bright UV sources and at the same time provided a link between the UV response to these phenomena and other wavelength regions. Consider that 50% of all IUE science images are of objects the satellite observed 10 times or more. Not surprisingly, many of these multiple observations were made of bright hot stars, generally for various tightly focused purposes developed in the observing proposal, leaving other forms of variability undiscovered at the time.

Archival data have been particularly useful for studies of repetitive stellar processes such as pulsations, rotational modulation, or binarity, and in the latter area in particular they have enabled a number of discoveries.

Many extragalactic sources have also been observed many times by different instruments over a long time baseline. For example, about 30 UV HST spectra of NGC 4151, the prototypical Seyfert 1 galaxy, have been taken by FOC, FOS, and GHRS over a period of about 8 years. As HST enters its second decade, long-term UV spectral variability studies will also become possible for relatively faint astronomical sources.