Implicit Multicore Parallelism using CnC-Python
We introduce CnC-Python (CP), an approach to implicit multicore parallelism for Python programmers based on a high-level macro data-flow programming model called Concurrent Collections (CnC). With the advent of the multi-core era, it is clear that improvements in application performance will primarily come from increased parallelism. Extracting parallelism from applications often involves the use of low-level primitives such as locks and threads. CP is implicitly parallel and enables programmers to achieve task, data and pipeline parallelism in a declarative fashion while only being required to describe the program as a coordination graph with serial Python code for individual nodes (steps). Thus, CP makes parallel programming accessible to a broad class of programmers who are not trained in parallel programming. The CP runtime requires that Python objects communicated between steps be picklable, but imposes no restriction on the Python idioms used within the serial code. Most data structures of interest to the SciPy community, including NumPy arrays, are included in the class of picklable data structures in Python.
The CnC model is especially effective in exploiting parallelism in scientific applications in which the dependences can be represented as arbitrary directed acyclic graphs ("dag parallelism"). Such applications include, but are not limited to, tiled implementations of iterative linear algebra algorithms such as Cholesky decomposition, Gauss-Jordan elimination, Jacobi method, and Successive Over-Relaxation (SOR). Rather than using explicit threads and locks to exploit parallelism, the CnC-Python programmer decomposes their algorithm into individual computation steps and identifies data and control dependences among the steps to create such computation DAGs. Given the DAG (in the form of declarative constraints), it is the responsibility of the CP runtime to extract parallelism and performance from the application. By liberating the scientific programmer, who is not necessarily trained to write explicitly parallel programs, from the nuances of parallel programming, CP provides a high-productivity path for scientific programmers to achieve multi-core parallelism in Python.