Cassandra for Python Developers

Summary

Apache Cassandra is an open source, distributed NoSQL database. This talk gives a high-level introduction to Cassandra and its data model, then details the features of pycassa, the Python client library for Cassandra, and shows how to interact with Cassandra through it.

Description

Because Cassandra is non-relational, its data model is fundamentally different from that of a relational database. It also exposes an RPC-based API rather than a query language. And because Cassandra is distributed, the client must be aware of, and interact with, multiple nodes in the cluster. Together, these attributes make working with a Cassandra client library a different experience from working with a typical database driver. Fortunately, the Python client library is the easiest way to use Cassandra. This talk will start with a high-level overview of Cassandra's clustering model, followed by its data model. A large portion of the talk will cover the pycassa methods that interact with that data model: inserting, fetching, and removing data. A small amount of time will be dedicated to connection pooling in pycassa, including how it handles node failures and distributes requests. The final 10 minutes will be devoted to Q&A.
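
To make the ideas in the description concrete, here is a minimal in-memory sketch of the two things the talk covers: row-oriented insert, fetch, and remove operations, and a pool that spreads requests over nodes and skips failed ones. It is a toy model, not pycassa, and it never contacts a Cassandra cluster. The names ToyPool, ToyColumnFamily, mark_down, pick, and last_node, along with the node addresses, are hypothetical and chosen only for illustration.

```python
# Illustrative toy only: this is NOT pycassa and does not talk to Cassandra.
from itertools import cycle


class ToyPool:
    """Hands out node addresses round-robin, skipping nodes marked as down."""

    def __init__(self, servers):
        self.servers = list(servers)
        self.down = set()
        self._ring = cycle(self.servers)

    def mark_down(self, server):
        self.down.add(server)

    def pick(self):
        # Try each server at most once per call before giving up.
        for _ in range(len(self.servers)):
            server = next(self._ring)
            if server not in self.down:
                return server
        raise RuntimeError("no live nodes")


class ToyColumnFamily:
    """Maps row key -> {column name: value}, loosely like a column family."""

    def __init__(self, pool):
        self.pool = pool
        self.rows = {}
        self.last_node = None  # node chosen for the most recent request

    def insert(self, key, columns):
        self.last_node = self.pool.pick()
        self.rows.setdefault(key, {}).update(columns)

    def get(self, key):
        self.last_node = self.pool.pick()
        if key not in self.rows:
            raise KeyError(key)
        # Return the row's columns ordered by column name.
        return dict(sorted(self.rows[key].items()))

    def remove(self, key, columns=None):
        self.last_node = self.pool.pick()
        if columns is None:
            self.rows.pop(key, None)  # drop the whole row
            return
        row = self.rows.get(key, {})
        for name in columns:
            row.pop(name, None)  # drop only the named columns
        if key in self.rows and not row:
            del self.rows[key]  # a row with no columns no longer exists


pool = ToyPool(["node1:9160", "node2:9160", "node3:9160"])
users = ToyColumnFamily(pool)

users.insert("alice", {"email": "alice@example.com", "age": "30"})
pool.mark_down("node2:9160")  # simulate a node failure
row = users.get("alice")  # the pool skips node2 for this request
node_used_for_get = users.last_node
users.remove("alice", columns=["age"])
```

In this sketch each row is a dictionary of column names to values, addressed by a row key, and every operation first asks the pool for a live node. A real client library would additionally manage open connections, retries, and timeouts, which the toy omits.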