metaseq: a Python framework for integrating sequencing analyses; SciPy 2013 Presentation

Summary

metaseq: a Python framework for integrating high-throughput sequencing analyses

Authors: Dale, Ryan, National Institute of Diabetes and Digestive and Kidney Diseases, National Institutes of

Track: Bioinformatics

metaseq is a Python package that ties together a growing ecosystem of bioinformatics Python tools and file formats, focusing on flexibility and interactive exploration of high-throughput sequencing data (e.g., ChIP-seq, RNA-seq, and RIP-seq).

This talk will use a worked example to illustrate some practical bioinformatics applications of metaseq's features. For example, its filetype adapters provide uniform, random-access support for commonly used formats (BAM, bigBed/bigWig, and, via tabix, any tab-delimited format). Combined with multiprocessing and a rebinning routine compiled with Cython, this allows relatively rapid population of NumPy arrays of binned signal over thousands of genes (or other features of interest).
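To make the idea of "binned signal over features" concrete, here is a minimal sketch in plain NumPy. It is not metaseq's API or its Cython routine: the names `rebin` and `binned_array` are hypothetical, the coverage vector is synthetic, and linear interpolation is just one assumed way to resample features of different lengths onto a common number of bins.

```python
import numpy as np


def rebin(signal, bins):
    """Resample a 1-D signal to `bins` points by linear interpolation.

    Hypothetical stand-in for a compiled rebinning routine.
    """
    signal = np.asarray(signal, dtype=float)
    if signal.size == 0:
        return np.zeros(bins)
    # Place the original and the resampled points on a common 0..1 axis.
    old_x = np.linspace(0.0, 1.0, signal.size)
    new_x = np.linspace(0.0, 1.0, bins)
    return np.interp(new_x, old_x, signal)


def binned_array(coverage, features, bins=100):
    """Build an (n_features, bins) array from half-open (start, stop) intervals."""
    return np.vstack([rebin(coverage[start:stop], bins) for start, stop in features])


# Synthetic per-base coverage and three features of different lengths.
rng = np.random.default_rng(0)
coverage = rng.poisson(5, size=10_000).astype(float)
features = [(100, 1100), (2000, 4500), (7000, 7250)]

arr = binned_array(coverage, features, bins=100)
print(arr.shape)
```

Because each feature is processed independently, the list comprehension in `binned_array` could be replaced by a `multiprocessing.Pool.map` call over chunks of features, which is the kind of parallelism the abstract refers to.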

metaseq's "mini-browser" framework uses callbacks to connect these arrays (or any other plot whose points represent genomic intervals, such as scatterplots of control vs. treatment RNA-seq signal) to interactively created matplotlib figures that show the local genomic signal and gene models. Alternatively, callbacks can upload data to the UCSC Genome Browser and display them there for further visualization alongside the wealth of publicly available data.
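The callback pattern described above can be sketched with matplotlib's standard pick events. This is an illustration under stated assumptions, not metaseq's mini-browser code: `plot_local_signal` and `on_pick` are hypothetical names, the data are synthetic, and gene models are omitted.

```python
import numpy as np
import matplotlib.pyplot as plt

# Synthetic data: per-base coverage plus one (control, treatment) value
# for each genomic interval.
rng = np.random.default_rng(0)
coverage = rng.poisson(5, size=10_000).astype(float)
intervals = [(100, 1100), (2000, 4500), (7000, 7250)]
control = np.array([3.0, 8.0, 1.5])
treatment = np.array([4.0, 2.0, 6.0])


def plot_local_signal(index):
    """Open a new figure showing the signal across interval `index`."""
    start, stop = intervals[index]
    local_fig, local_ax = plt.subplots()
    local_ax.plot(np.arange(start, stop), coverage[start:stop])
    local_ax.set_xlabel("position (bp)")
    local_ax.set_ylabel("signal")
    local_ax.set_title(f"interval {index}: {start}-{stop}")
    return local_fig


def on_pick(event):
    """Pick callback: event.ind holds the indices of the clicked points."""
    for index in event.ind:
        plot_local_signal(int(index))


fig, ax = plt.subplots()
ax.scatter(control, treatment, picker=True)  # picker=True enables pick events
ax.set_xlabel("control signal")
ax.set_ylabel("treatment signal")
callback_id = fig.canvas.mpl_connect("pick_event", on_pick)
```

In an interactive session, calling `plt.show()` and clicking a point opens a new figure for the corresponding interval. A different callback could be registered in the same way to send the interval's coordinates elsewhere, for example to a genome browser.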