R: A software environment for comprehensive statistical analysis of astronomical data

Eric Feigelson (Pennsylvania State University)


The R statistical software environment has become the premier publicly available tool for statistical development in many fields. A high-level language similar to IDL, base R provides an infrastructure for data manipulation and analysis, together with dozens of statistical analysis functions covering statistical distributions, smoothing, regression, maximum likelihood estimation, multivariate analysis and classification, neural networks, resampling, survival analysis, and spectral- and time-domain time series analysis. It includes extensive high-quality graphical capabilities. As an open source environment, R is supplemented by over 3000 user-provided packages in the Comprehensive R Archive Network (CRAN). CRAN has been growing exponentially since 2001, with a new package arriving daily, and now includes tens of thousands of specialized statistical functions. R and CRAN functions can call, and be called from, C, Fortran, Python and other languages. A simple interface to R has been developed for the Virtual Observatory. The use of R is illustrated with multivariate classification of Sloan photometry using data mining techniques. R can greatly extend the statistical analysis capabilities of astronomical data analysis systems. A separate tutorial gives a hands-on introduction to R with emphasis on data mining and megadatasets.

Slides in PDF format

Paper ID: I03

