ALLStars: Overcoming Multi-Survey Selection Bias using Crowd-Sourced Active Learning

Dan Starr (UC Berkeley), Joseph Richards (UC Berkeley), Henrik Brink (Dark Cosmology Centre), Adam Miller (UC Berkeley), Josh Bloom (UC Berkeley), Nathaniel Butler (Arizona State University), J. Berian James (Dark Cosmology Centre), James Long (UC Berkeley), John Rice (UC Berkeley)


Abstract

Developing a multi-survey time-series classifier presents several challenges. One is overcoming the sample selection bias that arises when the instruments or survey observing cadences differ between the training and testing datasets. In this case, the probability distributions characterizing the sources in the training survey differ from the source distributions in the other survey, and a naively applied classifier performs poorly. To resolve this, we have developed ALLStars, an active learning web framework that allows us to bootstrap a classifier onto a new survey using a small set of optimally chosen sources, which are then crowd-sourced to users for manual classification. Several iterations of this crowd-sourcing process result in a significantly improved classifier. Using this process, we have built a variable-star light-curve classifier from OGLE, Hipparcos, ASAS, and PTF survey data, and we plan to bootstrap it onto SDSS Stripe 82 as well as other active survey datasets in the near future.
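The core loop described above — train on the labeled survey, pick the most informative unlabeled sources from the new survey, and send them out for manual labeling — can be sketched as follows. This is a minimal illustration using a margin-based uncertainty criterion and synthetic feature vectors; the actual ALLStars selection criterion, features, and classifier may differ.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)

# Synthetic stand-ins for light-curve feature vectors (hypothetical data,
# not actual OGLE/ASAS features). The "target survey" pool is unlabeled
# and drawn from a shifted distribution, mimicking selection bias.
X_train = rng.normal(0.0, 1.0, size=(200, 5))
y_train = (X_train[:, 0] + 0.3 * rng.normal(size=200) > 0).astype(int)
X_pool = rng.normal(0.5, 1.2, size=(500, 5))

def query_most_uncertain(clf, X_pool, n_queries=10):
    """Select the pool sources whose predicted class probabilities are
    closest to uniform -- the classifier's least confident calls, which
    are the most informative ones to send to human labelers."""
    proba = clf.predict_proba(X_pool)
    margin = np.abs(proba[:, 1] - proba[:, 0])  # small margin = uncertain
    return np.argsort(margin)[:n_queries]

# One active-learning iteration: fit on the labeled training survey,
# then pick candidates from the new survey for crowd-sourced labeling.
clf = RandomForestClassifier(n_estimators=100, random_state=0)
clf.fit(X_train, y_train)
picked = query_most_uncertain(clf, X_pool)
```

After the crowd supplies labels for the selected sources, they are appended to the training set and the loop repeats, progressively adapting the classifier to the new survey's source distribution.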

Paper ID: P146


ADASS XXI Conference Poster
