GET /api/v2/video/1211
HTTP 200 OK
Vary: Accept
Content-Type: text/html; charset=utf-8
Allow: GET, PUT, PATCH, HEAD, OPTIONS
{ "category": "SciPy 2012", "language": "English", "slug": "a-tale-of-four-libraries", "speakers": [ "Alejandro Weinstein", "Michael Wakin" ], "tags": [ "General" ], "id": 1211, "state": 1, "title": "A tale of four libraries", "summary": "", "description": "In addition to bringing efficient array computing and standard mathematical\ntools to Python, the NumPy/SciPy libraries provide an ecosystem where multiple\nlibraries can coexist and interact. This talk describes a success story where\nwe integrate several libraries, developed by different groups, to solve our\nresearch problems. A brief description of our research and how we use these\ncomponents follows.\n\nOur research focuses on using Reinforcement Learning (RL) to gather\ninformation in domains described by an underlying linked dataset. For\ninstance, we are interested in problems such as the following: given a\nWikipedia article as a seed, find other articles that are interesting\nrelative to the starting point. Of particular interest are articles\nthat are more than one click away from the seed, since these articles are in\ngeneral harder for a human to find.\n\nIn addition to the staples of scientific Python computing (NumPy, SciPy,\nMatplotlib, and IPython), we use the libraries RL-Glue/RL-Library, NetworkX,\nGensim, and scikit-learn.\n\nReinforcement Learning considers the interaction between a given environment\nand an agent. The objective is to design an agent able to learn a policy that\nallows it to maximize its total expected reward. We use the RL-Glue/RL-Library\nlibraries for our RL experiments. These libraries provide the infrastructure to\nconnect an environment and an agent, each one described by an independent\nPython program.\n\nWe represent the linked datasets we work with as graphs. For this we use\nNetworkX, which provides data structures to efficiently represent graphs\ntogether with implementations of many classic graph algorithms. 
We use\nNetworkX graphs to describe the environments implemented in\nRL-Glue/RL-Library. We also use these graphs to create, analyze and visualize graphs\nbuilt from unstructured data.\n\nOne of the contributions of our research is the idea of representing the items\nin the datasets as vectors belonging to a linear space. To this end, we build\na Latent Semantic Analysis (LSA) model to project documents onto a vector\nspace. This allows us, in addition to being able to compute similarities\nbetween documents, to leverage a variety of RL techniques that require a\nvector representation. We use the Gensim library to build the LSA model. This\nlibrary provides all the machinery to build, among other options, the LSA\nmodel. One place where Gensim shines is in its capability to handle big data\nsets, like the entire Wikipedia, that do not fit in memory. We also attach\nthe vector representation of the items as a property of the NetworkX nodes.\n\nFinally, we also use the manifold learning capabilities of scikit-learn, like\nthe ISOMAP algorithm, to perform some exploratory data analysis. 
By reducing\nthe dimensionality of the LSA vectors obtained using Gensim from 400 to 3, we\nare able to visualize the relative position of the vectors together with their\nconnections.\n\nIn summary, this talk shows, by combining a variety of libraries to solve our\nresearch problems, that the NumPy/SciPy ecosystem has become the lingua franca\nof scientific Python computing.\n\n", "quality_notes": "", "copyright_text": "CC BY-SA", "embed": "<object width=\"640\" height=\"390\"><param name=\"movie\" value=\";hl=en_US\"></param><param name=\"allowFullScreen\" value=\"true\"></param><param name=\"allowscriptaccess\" value=\"always\"></param><embed src=\";hl=en_US\" type=\"application/x-shockwave-flash\" width=\"640\" height=\"390\" allowscriptaccess=\"always\" allowfullscreen=\"true\"></embed></object>", "thumbnail_url": "", "duration": null, "video_ogv_length": null, "video_ogv_url": null, "video_ogv_download_only": false, "video_mp4_length": null, "video_mp4_url": "", "video_mp4_download_only": false, "video_webm_length": null, "video_webm_url": "", "video_webm_download_only": false, "video_flv_length": null, "video_flv_url": "", "video_flv_download_only": false, "source_url": "", "whiteboard": "", "recorded": "2012-07-18", "added": "2012-08-31T16:35:06", "updated": "2014-04-08T20:28:27.135" }