Web scraping for astronomy

Sebastien Derriere (CDS, Observatoire astronomique de Strasbourg), Thomas Boch (CDS, Observatoire astronomique de Strasbourg)


Abstract

Astronomical web sites and portals are used daily by astronomers, and
are increasingly interactive and customizable, mainly through the use
of JavaScript.
In addition, information often arises from the linking of remotely
distributed data and contents. Not all of these potential links can
be defined in advance and stored in a web document, for at least
two reasons: they could increase the size of the document source
by a large fraction; and sometimes only the user (and not the document
creator) knows where relevant links should be provided.
Web scraping is the process of automatically collecting Web information.
In this context, we started developing a method allowing retrieval of
remote information, and display of this information (including links to
remote websites) in the current document, triggered by a very simple action
from the user: the selection of a portion of text in the web document.
Our first prototype deals with astronomical object names. It is written
in JavaScript, and can easily be embedded in a web document, or used
as a bookmarklet. Whenever the user selects a portion of text in a web
document, a request is sent to the Sesame name resolver to test whether
the selection is a valid object identifier. On success, the information
returned in JSON is used to display a tooltip with additional information
on this object, such as its coordinates, links to various CDS services,
image thumbnails, etc.
We present the current status of this work, and discuss how it could
be extended in the future to other applications.
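
The selection-driven lookup described above could be sketched roughly as
follows. This is an illustration under stated assumptions, not the prototype's
actual code: the endpoint RESOLVER_URL, the "name" query parameter, the JSON
fields (name, ra, dec) and the helper functions are all hypothetical
placeholders, and the real Sesame interface may differ. The sketch also
assumes the resolver can be queried from another origin.

```javascript
// Sketch only: RESOLVER_URL, the "name" query parameter and the JSON
// fields (name, ra, dec) are hypothetical placeholders, not the actual
// Sesame interface.
const RESOLVER_URL = "https://example.org/name-resolver";

// Clean up a text selection; return null when it is unlikely to be an
// object name (empty, or longer than maxLength characters).
function normalizeSelection(text, maxLength = 40) {
  const cleaned = String(text).replace(/\s+/g, " ").trim();
  if (cleaned.length === 0 || cleaned.length > maxLength) {
    return null;
  }
  return cleaned;
}

// Build the query URL for a candidate object name.
function buildResolverUrl(baseUrl, name) {
  return baseUrl + "?name=" + encodeURIComponent(name);
}

// Turn a resolver answer into the text shown in the tooltip.
function formatTooltip(result) {
  return result.name + ": RA " + result.ra.toFixed(4) +
    " deg, Dec " + result.dec.toFixed(4) + " deg";
}

// Browser-only wiring: skipped when no DOM is available (e.g. in Node.js).
if (typeof document !== "undefined") {
  document.addEventListener("mouseup", async () => {
    const name = normalizeSelection(window.getSelection().toString());
    if (name === null) return;
    try {
      const response = await fetch(buildResolverUrl(RESOLVER_URL, name));
      if (!response.ok) return; // not a known identifier: stay silent
      const result = await response.json();
      const tip = document.createElement("div");
      tip.textContent = formatTooltip(result);
      tip.style.cssText = "position:fixed;bottom:1em;right:1em;" +
        "padding:0.5em;background:#ffffe0;border:1px solid #999";
      document.body.appendChild(tip);
    } catch (err) {
      // Network or parsing failure: ignore so the page is not disturbed.
    }
  });
}
```

Keeping the text clean-up, URL construction and tooltip formatting in small
pure functions means the same code can be pasted into a page or wrapped as a
bookmarklet, while only the final block depends on the browser.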

Paper ID: P032

ADASS XXI Conference Poster

Download the Official Conference Flyer:

JPG:   A4  A3

PDF (with printer marks):

8.5in x 11in  11in x 17in  A4  A3  A2