Diving into NumPy Code, SciPy2013 Tutorial, Part 1 of 4

Summary

Presenters: David Cournapeau, Stefan Van der Walt

Description

Do you want to contribute to NumPy but find the codebase daunting? Do you want to extend NumPy (e.g. by adding support for decimal or arbitrary-precision types)? Are you simply curious about how NumPy works? Then this tutorial is for you.

The goal of this tutorial is to dive into the NumPy codebase, in particular the core C implementation. You will learn how to build NumPy from source, how core concepts such as data types and ufuncs are implemented at the C level, and how they are hooked up to the Python runtime. You will also learn how to add a new ufunc and a new data type.
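Some of these concepts can already be inspected from Python before reading any C. The sketch below is only an illustration and is not taken from the tutorial material; the variable names are arbitrary. It prints the data type descriptor and memory layout of an array, then the number of inputs and outputs of the add ufunc and whether a double-precision inner loop is registered for it.

```python
import numpy as np

a = np.arange(6, dtype=np.float64).reshape(2, 3)

# The dtype object describes how each element is stored in memory.
print(a.dtype, a.dtype.itemsize, a.dtype.kind)

# Strides give the number of bytes to step along each dimension.
print(a.shape, a.strides)

# A transposed view shares the same buffer; only the layout metadata changes.
t = a.T
print(t.strides, np.shares_memory(a, t))

# A ufunc exposes its arity and one type signature per registered inner loop.
print(np.add.nin, np.add.nout)
print("dd->d" in np.add.types)
```

The same information (descriptor, strides, per-type loops) is what the C structures covered in the tutorial hold; the Python attributes are simply a convenient window onto them.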

During the tutorial, we will also look at various (Unix-oriented) tools that can help track down bugs or follow a particular NumPy expression from its Python representation to its low-level implementation.

While a working knowledge of C and Python is required, we do not assume prior knowledge of the NumPy codebase. An understanding of Python C extensions is a plus, but is not required either.

Outline

The tutorial will be divided into 3 main sections:

Introduction:

- Why extend NumPy in C? (and, perhaps more importantly, when you should not)
- Getting ready to develop on NumPy: building from source, and building with different flags (optimisation and debug)
- Source code organisation: description of the NumPy source tree and high-level description of what belongs where: core vs the rest, core.multiarray, core.ufunc, scalar arrays and support libraries (npysort, npymath)

The main data structures around ndarray:

- the array object and data type descriptor, and how they relate to each other
- exercise: adding a simple array method to the array object
- dealing with arbitrary array memory layouts using iterators

Adding a new dtype:

- Anatomy of the dtype: from a + a to a core C loop
- Simple example wrapping a software implementation of quadruple precision (revised version of IEEE 754 software)

The current set of planned hands-on tasks/exercises:

- building from source with debug symbols
- adding an array method to compute a simple statistic (e.g. kurtosis)
- adding a new type to handle quadruple precision

Required Packages

You will need:

- a working C compiler (gcc on Unix/OS X, Visual Studio 2008 on Windows), and familiarity with how to use it on your platform
- git
- if possible, gdb and cgdb on Unix
- if possible, valgrind and kcachegrind on supported platforms (Linux)

Vagrant VM available here: https://s3.amazonaws.com/scipy-2013/divingintonumpy/numpy-tuto.box (use Vagrant 1.2.1, as 1.2.2 has a serious bug for sharing files)
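For the kurtosis exercise listed among the hands-on tasks, a pure-Python reference can be handy for checking the result of a C implementation. The following is a minimal sketch, assuming the population (biased) definition of kurtosis: the fourth central moment divided by the squared variance. The function name kurtosis_reference is hypothetical, and the tutorial itself may use a different convention (for example, excess kurtosis).

```python
import numpy as np


def kurtosis_reference(x):
    """Population kurtosis: m4 / m2**2 (assumed definition, not excess kurtosis)."""
    x = np.asarray(x, dtype=np.float64).ravel()
    centered = x - x.mean()
    m2 = np.mean(centered ** 2)  # second central moment (biased variance)
    m4 = np.mean(centered ** 4)  # fourth central moment
    return m4 / m2 ** 2


print(kurtosis_reference([1.0, 2.0, 3.0, 4.0, 5.0]))
```

Comparing a compiled array method against such a reference on a few small inputs is a quick sanity check before looking at the code under a debugger.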