Semester: Both
Range: 2P
Completion:
Credits: 4
Programme type: Doctoral
Study form: Fulltime
Course language:
Data mining (DM) aims to reveal non-trivial, hidden, and ultimately applicable knowledge in large data. Data size and data heterogeneity are the two key technical issues that data mining must address. The main goal is to understand the patterns that drive the processes generating the data. Machine learning (ML) focuses on computer algorithms that improve automatically through experience and the use of data, and it often emphasizes the performance those algorithms reach. The distinction between DM and ML is not strict, as machine learning is often used as a means of conducting useful data mining. For this reason, we cover both areas in the same course. The main goal of the course is to acquaint students with advanced and modern topics in the field.
The course takes the form of “a reading and discussion group”. Students familiarize themselves with a topic independently, and each topic is presented by one student. A moderated discussion among all course participants follows. Some of the topics originate from the book Mining of Massive Datasets. The book is used in the Stanford University course of the same name (http://www.stanford.edu/class/cs246/) and serves as the main teaching resource in the ETH course Data Mining: Learning from Large Data Sets (http://las.ethz.ch/courses/datamining-s12/). The other topics will be current topics drawn from recent research papers. At the same time, the discussed topics will be motivated by problems that the course participants are actually solving in their dissertations or other research.
1. Rajaraman, A., Leskovec, J., Ullman, J. D.: Mining of Massive Datasets, Cambridge University Press, 2011.
2. Hastie, T., Tibshirani, R., Friedman, J.: The Elements of Statistical Learning: Data Mining, Inference, and Prediction. 2nd ed. New York: Springer, 2009.
3. Peng, R. D., Matsui, E.: The Art of Data Science: A Guide for Anyone Who Works with Data. Skybrude Consulting, 2015.