GET /api/v2/video/2173
HTTP 200 OK
Vary: Accept
Content-Type: text/html; charset=utf-8
Allow: GET, PUT, PATCH, HEAD, OPTIONS
{
    "category": "SciPy 2013",
    "language": "English",
    "slug": "data-processing-with-python-scipy2013-tutorial-6",
    "speakers": [],
    "tags": [
        "Tech"
    ],
    "id": 2173,
    "state": 1,
    "title": "Data Processing with Python, SciPy2013 Tutorial, Part 2 of 3",
    "summary": "Presenters: Ben Zaitlen, Clayton Davis\n\nDescription\n\nThis tutorial is a crash course in data processing and analysis with Python. We will explore a wide variety of domains and data types (text, time-series, log files, etc.) and demonstrate how Python and a number of accompanying modules can be used for effective scientific expression. Starting with NumPy and Pandas, we will begin with loading, managing, cleaning and exploring real-world data right off the instrument. Next, we will return to NumPy and continue on with SciKit-Learn, focusing on a common dimensionality-reduction technique: PCA.\n\nIn the second half of the course, we will introduce Python for Big Data Analysis and introduce two common distributed solutions: IPython Parallel and MapReduce. We will develop several routines commonly used for simultaneous calculations and analysis. Using Disco -- a Python MapReduce framework -- we will introduce the concept of MapReduce and build up several scripts which can process a variety of public data sets. Additionally, users will also learn how to launch and manage their own clusters leveraging AWS and StarCluster.\n\nOutline\n\n*Setup/Install Check (15) *NumPy/Pandas (30) *Series *Dataframe *Missing Data *Resampling *Plotting *PCA (15) *NumPy *Sci-Kit Learn *Parallel-Coordinates *MapReduce (30) *Intro *Disco *Hadoop *Count Words *EC2 and Starcluster (15) *IPython Parallel (30) *Bitly Links Example (30) *Wiki Log Analysis (30)\n\n45 minutes extra for questions, pitfalls, and break\n\nEach student will have access to a 3 node EC2 cluster where they will modify and execute examples. Each cluster will have Anaconda, IPython Notebook, Disco, and Hadoop preconfigured\n\nRequired Packages\n\nAll examples in this tutorial will use real data. Attendees are expected to have some familiarity with statistical methods and familiarity with common NumPy routines. Users should come with the latest version of Anaconda pre-installed on their laptop and a working SSH client.\n\nDocumentation\n\nPreliminary work can be found at: https://github.com/ContinuumIO/tutorials",
    "description": "",
    "quality_notes": "",
    "copyright_text": "http://www.youtube.com/t/terms",
    "embed": "<object width=\"640\" height=\"390\"><param name=\"movie\" value=\"http://youtube.com/v/cImOpHFZBUI?version=3&amp;hl=en_US\"></param><param name=\"allowFullScreen\" value=\"true\"></param><param name=\"allowscriptaccess\" value=\"always\"></param><embed src=\"http://youtube.com/v/cImOpHFZBUI?version=3&amp;hl=en_US\" type=\"application/x-shockwave-flash\" width=\"640\" height=\"390\" allowscriptaccess=\"always\" allowfullscreen=\"true\"></embed></object>",
    "thumbnail_url": "http://i1.ytimg.com/vi/cImOpHFZBUI/hqdefault.jpg",
    "duration": null,
    "video_ogv_length": null,
    "video_ogv_url": null,
    "video_ogv_download_only": false,
    "video_mp4_length": null,
    "video_mp4_url": null,
    "video_mp4_download_only": false,
    "video_webm_length": null,
    "video_webm_url": null,
    "video_webm_download_only": false,
    "video_flv_length": null,
    "video_flv_url": null,
    "video_flv_download_only": false,
    "source_url": "http://www.youtube.com/watch?v=cImOpHFZBUI",
    "whiteboard": "needs editing",
    "recorded": "2013-06-27",
    "added": "2013-07-04T10:09:07",
    "updated": "2014-04-08T20:28:26.469"
}