Dask Processing and Analytics for Large Datasets
Time: Sunday, 17 November 2019, 5pm - 5:05pm
Description: This paper describes the "Dask Analytics" assignment, which is used for student evaluation in a graduate data science and data mining course. Students are required to read, process, and answer queries on a large dataset that does not fit in the RAM of a commodity laptop. Using the Python framework Dask, which extends a small set of Pandas operations, students become familiar with parallel and distributed processing. The assignment also teaches, in an applied way, the basic operations implemented in Dask, as well as operations and algorithms that are harder to parallelize.