Workshop: Dask Processing and Analytics for Large Datasets
Abstract: This paper describes "Dask Analytics", an assignment used for student evaluation in a graduate data science and data mining course. Students must read, process, and answer queries over a large dataset that does not fit in the RAM of a commodity laptop. Using Dask, a Python framework that extends a small set of pandas operations to datasets larger than memory, students become familiar with parallel and distributed processing. The assignment teaches the basic operations implemented in Dask in an applied, hands-on way, and also exposes students to operations and algorithms that are harder to parallelize.
Venue: Workshop on Education for High Performance Computing (EduHPC)