Big Data De-duping

Summary

Derek Eder of Webitects and Forest Gregg, a Ph.D. student in sociology at the University of Chicago, will describe the Python library they are developing to deduplicate tabular data quickly, accurately, and at large scale. The library uses a machine learning approach to help match related records across different data sets. They expect to show a demo and will explain how they anticipate the library being used.