Python has a long history in scientific computing, which means it has had the fundamental building blocks for data analysis for many years. As a result, Python has long played a role in scientific problems involving the largest data sets. More recently, it has also gained traction as a tool for rapid data analysis. Consequently, Python is at the center of an emerging trend that is unifying traditional High Performance Computing with "Big Data" applications. In this talk I will discuss the features of Python and its popular libraries that have promoted its use in data analytics. I will also discuss the features that are still missing for Python to remain competitive and useful for data scientists and other domain experts. Finally, I will describe the open source projects currently occupying my attention that can help keep Python relevant, and even essential, in data analytics for many years to come.
The people at Continuum have been involved in the Python community for decades. As a company, our mission is to empower domain experts inside enterprises with the best tools for producing software solutions that deal with large and quickly changing data. The Continuum Platform brings the world of open source together into one complete, easy-to-manage analytics and visualization platform. In this talk, Dr. Oliphant will review the open source libraries that Continuum is building and contributing to the community as part of this effort, including Numba, Bokeh, Blaze, conda, llvmpy, PyParallel, and DyND, and will describe the freely available components of the Continuum Platform that anyone can benefit from today: Anaconda, wakari.io, and binstar.org.
Numba is a compiler for Python syntax that uses the LLVM library and llvmpy to convert specially decorated Python functions to machine code at run time. It allows Python syntax to be used for scientific and numerical computing that is blazing fast yet tightly integrated with the CPython runtime.
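As a rough illustration of the decorator-based workflow, the sketch below marks a loop-heavy function for compilation. It assumes a current Numba release, where `numba.njit` is the usual entry point (the llvmpy-based releases described above may have differed); if Numba is not installed, it falls back to a hypothetical no-op decorator so the same function still runs as ordinary Python. The function name `sum_of_squares` is purely illustrative.

```python
try:
    # Assumes a modern Numba install: njit compiles the function to machine
    # code (via LLVM) the first time it is called with concrete argument types.
    from numba import njit
except ImportError:
    # Hypothetical fallback so the sketch runs without Numba:
    # return the function unchanged (no compilation, no speedup).
    def njit(func):
        return func


@njit
def sum_of_squares(n):
    """Sum i*i for i in 0..n-1 with an explicit loop, the kind of code Numba speeds up."""
    total = 0
    for i in range(n):
        total += i * i
    return total


if __name__ == "__main__":
    # With Numba present, the first call compiles; later calls reuse the machine code.
    print(sum_of_squares(10))
```

The fallback keeps the example runnable everywhere, but only the Numba path turns the interpreted loop into compiled code.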
Python has long played a role in analyzing large-scale data. From tightly knit supercomputers running MPI-based applications to heterogeneous clusters woven together with scripts, Python has helped make data easier to process. This tutorial will cover the tried-and-true techniques as well as introduce new trends.
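One long-standing pattern in the "clusters woven together with scripts" style is to split a data set into chunks, hand each chunk to a separate worker process, and combine the partial results. The sketch below shows that scatter/combine idea on a single machine with the standard library's `multiprocessing.Pool`; it is an assumed example of such a technique, not material from the tutorial, and the helper names `chunk`, `partial_sum`, and `parallel_sum_of_squares` are hypothetical.

```python
from multiprocessing import Pool


def chunk(data, n_chunks):
    """Split a list into n_chunks contiguous pieces whose sizes differ by at most one."""
    size, extra = divmod(len(data), n_chunks)
    pieces, start = [], 0
    for k in range(n_chunks):
        end = start + size + (1 if k < extra else 0)
        pieces.append(data[start:end])
        start = end
    return pieces


def partial_sum(values):
    """Work done independently by one worker process on its chunk."""
    return sum(v * v for v in values)


def parallel_sum_of_squares(data, workers=4):
    """Scatter chunks to worker processes, then combine their partial results."""
    with Pool(processes=workers) as pool:
        partials = pool.map(partial_sum, chunk(data, workers))
    return sum(partials)


if __name__ == "__main__":
    # The __main__ guard is required on platforms that start workers by re-importing this module.
    print(parallel_sum_of_squares(list(range(1000))))
```

The same split/compute/combine structure carries over to MPI (scatter, compute, reduce), where the workers live on different nodes instead of in one machine's process pool.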
Accelerators are the hottest tool in high performance computing, but they are applicable to all fields. We show how to use Python's amazing ability to abstract away low-level boilerplate code, turning accelerators from an exotic curiosity into an everyday tool.
In this tutorial, I will cover how to write very fast Python code for data analysis. I will briefly introduce NumPy and illustrate how fast code for Python is written in SciPy using tools such as Fwrap/f2py and Cython. I will also describe interesting new approaches to creating fast code that are driving fundamental changes to NumPy.
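To make the NumPy point concrete, the sketch below computes the same dot product twice: once with an explicit Python loop, where every iteration goes through the interpreter, and once with a single vectorized `numpy.dot` call, which runs the loop in compiled code. It assumes NumPy is installed; the function names are illustrative, and it does not cover Fwrap, f2py, or Cython, which require a separate build step.

```python
import numpy as np


def dot_loop(a, b):
    """Pure-Python loop: each multiply-add is executed by the interpreter."""
    total = 0.0
    for x, y in zip(a, b):
        total += x * y
    return total


def dot_vectorized(a, b):
    """Vectorized NumPy: one call, with the loop running inside NumPy's compiled code."""
    return float(np.dot(a, b))


if __name__ == "__main__":
    a = np.arange(1_000, dtype=np.float64)
    b = np.ones(1_000, dtype=np.float64)
    # Both give the same answer; the vectorized version is typically much faster on large arrays.
    print(dot_loop(a, b), dot_vectorized(a, b))
```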
Travis Oliphant, CEO of Continuum Analytics, kicks off the PyData Workshop with a talk on Python in Big Data. Topics include what Python has to offer the world of Big Data, specific use cases, and why Hadoop is considered the de facto standard.
Additionally, Travis gives an overview of NumPy and SciPy.
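Hadoop is built around the MapReduce programming model, so a small in-memory sketch of its three phases (map, shuffle, reduce) may help readers new to it. This is plain Python with hypothetical function names. It does not use Hadoop or any Hadoop API, and a real job would distribute each phase across a cluster rather than run it in one process.

```python
from collections import defaultdict
from itertools import chain


def map_phase(line):
    """Emit a (word, 1) pair for every word in one input record."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle_phase(pairs):
    """Group intermediate values by key, as the framework does between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups):
    """Combine the values for each key; for word count, sum the 1s."""
    return {key: sum(values) for key, values in groups.items()}


def word_count(lines):
    """Run map over every record, then shuffle and reduce the intermediate pairs."""
    mapped = chain.from_iterable(map_phase(line) for line in lines)
    return reduce_phase(shuffle_phase(mapped))


if __name__ == "__main__":
    docs = ["Python for Big Data", "big data with Python and NumPy"]
    print(word_count(docs))
```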