GET /api/v2/video/2366
HTTP 200 OK
Vary: Accept
Content-Type: text/html; charset=utf-8
Allow: GET, PUT, PATCH, HEAD, OPTIONS
{ "category": "Kiwi PyCon 2013", "language": "English", "slug": "computational-advertising-billions-of-records-a", "speakers": [ "Alan Williams" ], "tags": [], "id": 2366, "state": 1, "title": "Computational Advertising, Billions of records, and AWS - Lessons Learned", "summary": "Lessons learned while setting up a computational advertising platform on AWS with emphasis on experimental data analysis and scaling.\r\n", "description": "@ Kiwi PyCon 2013 - Sunday, 08 Sep 2013 - Track 2\r\n\r\n**Audience level**\r\n\r\nExperienced\r\n\r\n**Abstract**\r\n\r\nOverview - hope this will be useful, but caveat emptor - not a how-to, that's well covered elsewhere - problem - recovering value from large web logs - user targeting\r\n\r\nIs this Big Data? - When should you think about Hadoop - AWS servers available with 244 GB of memory - Twitter WTF paper, Microsoft cluster utilisation paper\r\n\r\nLogging, Storing, and Munging - Looked at EMR but (1) it's hard to log (2) versioning issues. - For on-demand use CM is good - For automated use, combination of CDH, whirr, and boto. - backing up HBase and HDFS to S3\r\n\r\nProcessing the data - hadoop as solving distributed IO - Pig + udfs - hadoop streaming\r\n\r\nLearning on the data - difficult data - latest machine learning algorithms, not just existing mapreduce algorithms (mahout) - frameworks are starting to appear - Graphlab, or the Berkeley Spark ecosystem. - want to experiment on smaller data to reduce iteration time.\r\n\r\nPrototype Learning Algorithm - loading text files into numpy arrays when memory constrained - JIT python compilation - scikit-learn - logistic regression - spectral clustering and the FEAST algorithm - nearest neighbors (output to gephi) - read/write binary formats\r\n\r\nImplementation at scale - shoehorn into map-reduce - Port successful algorithms to GraphLab, C++ and MPI or Boost Graph Library etc. - MIT Starcluster .. 
- Numba, Blaze, Theano, KDT - Anaconda", "quality_notes": "", "copyright_text": "", "embed": "<object width=\"640\" height=\"390\"><param name=\"movie\" value=\";hl=en_US\"></param><param name=\"allowFullScreen\" value=\"true\"></param><param name=\"allowscriptaccess\" value=\"always\"></param><embed src=\";hl=en_US\" type=\"application/x-shockwave-flash\" width=\"640\" height=\"390\" allowscriptaccess=\"always\" allowfullscreen=\"true\"></embed></object>", "thumbnail_url": "", "duration": null, "video_ogv_length": null, "video_ogv_url": "", "video_ogv_download_only": false, "video_mp4_length": null, "video_mp4_url": "", "video_mp4_download_only": false, "video_webm_length": null, "video_webm_url": "", "video_webm_download_only": false, "video_flv_length": null, "video_flv_url": "", "video_flv_download_only": false, "source_url": "", "whiteboard": "", "recorded": "2013-09-12", "added": "2013-10-19T19:48:49", "updated": "2014-04-08T20:28:26.090" }