AstroML: data mining and machine learning for Astronomy

Description

Python is currently being adopted as the language of choice by many astronomical researchers. A prominent example is in the Large Synoptic Survey Telescope (LSST), a project which will repeatedly observe the southern sky 1000 times over the course of 10 years. The 30,000 GB of raw data created each night will pass through a processing pipeline consisting of C++ and legacy code, stitched together with a python interface. This example underscores the need for astronomers to be well-versed in large-scale statistical analysis techniques in python. We seek to address this need with the AstroML package, which is designed to be a repository for well-tested data mining and machine learning routines, with a focus on applications in astronomy and astrophysics. It will be released in late 2012 with an associated graduate-level textbook, 'Statistics, Data Mining and Machine Learning in Astronomy' (Princeton University Press). AstroML leverages many computational tools already available available in the python universe, including numpy, scipy, scikit- learn, pymc, healpy, and others, and adds efficient implementations of several routines more specific to astronomy. A main feature of the package is the extensive set of practical examples of astronomical data analysis, all written in python. In this talk, we will explore the statistical analysis of several interesting astrophysical datasets using python and astroML.