Machine Learning Summer School Pittsburgh 2014


The typical daily schedule for MLSS 2014 is as follows:

  • 9am-12pm: Lecture

  • 12pm-2pm: Lunch

  • 2pm-4pm: Lecture

  • 4pm-5pm: Tutorial

The following is a tentative schedule of speakers and times for the event:


Deepak Agarwal, Director, LinkedIn

Topic: Recommender Systems

Xavier Amatriain, Research/Engineering Director, Netflix

Topic: Collaborative Filtering and Recommender Systems

Abstract: Recommender systems are a prime example of the practical applicability of many different machine learning algorithms. The most popular approach to recommendation is so-called collaborative filtering, where the goal is to leverage commonalities between similar users and/or items based only on the interactions between them. Typical algorithms used to drive collaborative filtering systems include matrix factorization, nearest neighbors, LDA, clustering, and even deep neural networks.

In this lecture I will describe the use of these and other collaborative filtering techniques. I will also explain how other machine learning approaches, such as learning to rank or content-based techniques, can be used to personalize an experience. Finally, I will highlight how many of these algorithms are used at Netflix, a service that has been at the forefront of recommendation and personalization algorithms ever since the Netflix $1 Million Prize in 2006.
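As a minimal sketch of one such algorithm (a matrix-factorization recommender trained by stochastic gradient descent on a hypothetical toy ratings matrix; the data, hyperparameters, and rank are illustrative assumptions, not Netflix's production setup):

```python
import numpy as np

# Hypothetical toy ratings matrix: rows = users, cols = items, 0 = unobserved.
R = np.array([[5, 3, 0, 1],
              [4, 0, 0, 1],
              [1, 1, 0, 5],
              [0, 1, 5, 4]], dtype=float)

rng = np.random.default_rng(0)
k, lr, reg, epochs = 2, 0.01, 0.02, 2000   # rank, step size, l2 penalty
U = rng.normal(scale=0.1, size=(R.shape[0], k))  # user factors
V = rng.normal(scale=0.1, size=(R.shape[1], k))  # item factors

observed = [(i, j) for i in range(R.shape[0])
            for j in range(R.shape[1]) if R[i, j] > 0]

for _ in range(epochs):
    for i, j in observed:
        ui = U[i].copy()                  # use the pre-update value for both steps
        err = R[i, j] - ui @ V[j]         # residual on one observed rating
        U[i] += lr * (err * V[j] - reg * ui)
        V[j] += lr * (err * ui - reg * V[j])

# Predicted score for user 0 on the unobserved item 2:
print(round(float(U[0] @ V[2]), 2))
```

The learned factors reconstruct the observed ratings closely and fill in the missing entries, which is exactly the leverage-commonalities idea described above.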

Anima Anandkumar, Assistant Professor, UC Irvine

Topic: Spectral and Tensor Methods

Dan Feldman, Assistant Professor, University of Haifa

Topic: Introduction to Coresets

Abstract: When we need to solve an optimization problem, we usually use the best available algorithm/software or try to improve it. In recent years we have started exploring a different approach: instead of improving the algorithm, reduce the input data and run the existing algorithm on the reduced data. This lets us obtain the desired output much faster on a streaming input, using a manageable amount of memory, and in parallel (say, using Hadoop, a cloud service, or GPUs).

A coreset for a given problem is a semantic compression of its input, in the sense that a solution to the problem with the (small) coreset as input yields a provably approximate solution to the problem with the original (big) data.

From the theory side, I will give a summary of core-set techniques and general frameworks that may help you to build coresets for your own machine learning problem.

From the system side, we will learn how to apply this magical paradigm to obtain algorithmic achievements with performance guarantees for real-time systems, using tools such as Hadoop, Matlab and GPUs.

From the commercial side, I will share some social and technical challenges that I had to deal with in industry and in the robotics lab at MIT while developing and implementing coresets.
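As a minimal illustration of the coreset idea (a plain uniform-sampling estimator on synthetic data, not one of the sensitivity-based constructions from the lecture; the sizes and centers below are illustrative assumptions): a small weighted sample can approximate a clustering cost on the full data.

```python
import numpy as np

rng = np.random.default_rng(42)
X = rng.normal(size=(100_000, 2))   # the full "big" data set
X[:50_000] += 5                     # two well-separated clusters

# Uniform-sampling "coreset": m points, each weighted n/m, so that
# weighted costs on the sample estimate costs on the full data.
n, m = len(X), 500
idx = rng.choice(n, size=m, replace=False)
C, w = X[idx], np.full(m, n / m)

def kmeans_cost(points, centers, weights=None):
    # Sum of (weighted) squared distances to the nearest center.
    d2 = ((points[:, None, :] - centers[None, :, :]) ** 2).sum(-1).min(1)
    return d2 @ (np.ones(len(points)) if weights is None else weights)

centers = np.array([[0.0, 0.0], [5.0, 5.0]])
full = kmeans_cost(X, centers)
approx = kmeans_cost(C, centers, w)
print(abs(approx - full) / full)    # small relative error
```

Running the expensive computation on the 500-point weighted sample instead of the 100,000-point input is the "reduce the data, keep the algorithm" paradigm; real coresets replace uniform sampling with constructions whose error is provably bounded for every candidate solution.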

Nando de Freitas, Professor, Oxford

Topic: Introduction to deep learning, decision making and language

Abstract: In this set of lectures I will draw bridges between deep learning and two approaches to decision making under uncertainty: Bayesian optimization and reinforcement learning. We will see how learning representations is essential to make progress in building intelligent decision-theoretic agents, while, conversely, decision-theoretic approaches can be used to automatically tune deep architectures and algorithms. I will also discuss attempts at scaling up deep learning in resource-constrained environments (academia). Finally, I will discuss convolutional neural networks for language, with particular emphasis on transfer and multi-task learning.

Zico Kolter, Assistant Professor, CMU

Topic: Introduction to Machine Learning, Medium-scale Optimization

Abstract: This will be an introductory course in the summer school, presenting a bit of background on the basic methods of machine learning, including supervised and unsupervised approaches, optimization-based and probabilistic methods, and basic applications. The first lectures are intended to give a basic background to students who may not yet be familiar with machine learning, but are also expected to be review material for many. The second lecture, on medium-scale optimization, looks at more “traditional” methods for optimization in machine learning, as opposed to the “big data” methods that dominate much of this summer school, and discusses how we can obtain exact (to numerical precision) solutions to many machine learning problems of interest, under the framework of convex optimization.
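The "exact to numerical precision" theme can be sketched in a few lines (a minimal example, assuming a hypothetical synthetic dataset): Newton's method on an l2-regularized logistic loss, which is strictly convex, drives the gradient to machine precision in a handful of iterations.

```python
import numpy as np

rng = np.random.default_rng(0)
# Hypothetical synthetic binary classification data.
X = rng.normal(size=(200, 3))
true_w = np.array([1.5, -2.0, 0.5])
y = (X @ true_w + 0.3 * rng.normal(size=200) > 0).astype(float)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

lam = 1e-3      # l2 regularization makes the problem strictly convex
w = np.zeros(3)
for _ in range(20):
    p = sigmoid(X @ w)
    grad = X.T @ (p - y) + lam * w
    # Hessian of the regularized logistic loss; positive definite.
    H = X.T @ (X * (p * (1 - p))[:, None]) + lam * np.eye(3)
    w -= np.linalg.solve(H, grad)   # exact Newton step

p = sigmoid(X @ w)
grad_norm = np.linalg.norm(X.T @ (p - y) + lam * w)
print(grad_norm)    # essentially zero: solved to numerical precision
```

Newton's method converges quadratically near the optimum, which is why such "medium-scale" convex problems can be solved exactly rather than approximately.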

Mu Li, Graduate Student, CMU

Topic: Parameter Server Tutorial

Quoc Le, Assistant Professor, CMU and Google

Topic: Deep Learning

Tom Mitchell, Professor, CMU

Topic: Never-Ending Language Learning, and How the Human Brain Represents Language Meaning

Abstract: (First lecture) We will never really understand learning until we can build machines that learn many different things, over years, and become better learners over time. We describe our research to build a Never-Ending Language Learner (NELL) that runs 24 hours per day, forever, learning to read the web. Each day NELL extracts (reads) more facts from the web into its growing knowledge base of beliefs, and each day NELL also learns to read better than the day before. NELL has been running 24 hours per day for over four years now. The result so far is a collection of 70 million interconnected beliefs (e.g., servedWith(coffee, applePie)) that NELL is considering at different levels of confidence, along with millions of learned phrasings, morphological features, and web page structures that NELL uses to extract beliefs from the web. NELL is also learning to reason over its extracted knowledge and to automatically extend its ontology. Track NELL's progress online, or follow it on Twitter at @CMUNELL. This will be a two-lecture series, in which we will consider never-ending learning as an important new paradigm for machine learning research, explore the NELL system as a case study of this paradigm, and examine a number of specific algorithms developed for NELL, such as methods for scalable probabilistic inference and methods for estimating NELL's accuracy from unlabeled data.

(Second lecture) How does the human brain use neural activity to create and represent meanings of words, sentences and stories? One way to study this question is to give people text to read while scanning their brains, then develop machine learning methods to discover the mapping between language features and observed neural activity. We have been doing such experiments with fMRI (1 mm spatial resolution) and MEG (1 msec time resolution) brain imaging for over a decade. As a result, we have learned answers to questions such as “Are the neural encodings of word meaning the same in your brain and mine?”, “Are neural encodings of word meaning built out of recognizable subcomponents, or are they randomly different for each word?”, and “What sequence of neurally encoded information flows through the brain during the half-second in which the brain comprehends a word?” This talk will summarize some of what we have learned, and newer questions we are currently working on, and will describe the central role that machine learning algorithms play in this research.

Muthu Muthukrishnan, Microsoft

Topic: Data Stream Analytics

Abstract: How do we deal with a large volume of data that arrives at a high rate? One approach is to summarize it via small-space representations and subsequently analyze, compute, or learn using these small representations. In this course, I will present a few of these representations, show how to design them and use them, and discuss their limitations.
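One classic small-space representation of the kind this course covers is the Count-Min sketch, which answers frequency queries over a stream with one-sided (over-)estimation error. A minimal sketch of the idea (the width, depth, and hashing scheme below are illustrative assumptions):

```python
import numpy as np

class CountMin:
    """Small-space stream summary: depth x width counters, one hash per row."""

    def __init__(self, width=272, depth=5, seed=0):
        rng = np.random.default_rng(seed)
        self.width, self.depth = width, depth
        self.table = np.zeros((depth, width), dtype=np.int64)
        # One random salt per row (a simple stand-in for pairwise-independent hashes).
        self.salts = rng.integers(1, 2**31, size=depth)

    def _cols(self, item):
        return [hash((int(s), item)) % self.width for s in self.salts]

    def add(self, item, count=1):
        for row, col in enumerate(self._cols(item)):
            self.table[row, col] += count

    def query(self, item):
        # Min over rows: never underestimates the true count.
        return int(min(self.table[row, col]
                       for row, col in enumerate(self._cols(item))))

cm = CountMin()
stream = ["a"] * 1000 + ["b"] * 10 + list("cdefghij") * 3
for x in stream:
    cm.add(x)
print(cm.query("a"))    # at least 1000, and close to it
```

The table uses fixed space regardless of stream length, and collisions only ever inflate an estimate, which is the trade-off that makes such summaries usable for analytics on high-rate streams.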

Alex Smola, Professor, CMU and Google

Topic: Scaling Machine Learning

Markus Weimer, Principal Scientist, Microsoft

Topic: Resource-aware distributed Machine Learning – A primer

Abstract: In this practical class, we develop a distributed machine learning algorithm. We will integrate support for fault handling, resource allocation and other elasticity events into the algorithm itself. This is in contrast to the more traditional approach of expressing the algorithm in a computational framework like MapReduce that hides these events from the algorithm. As we shall see in the course of the class, elasticity events have natural correspondences in the algorithmic space and can be handled there to great benefit. We will develop the code in Java using the REEF framework, which allows the results to be run on any Hadoop 2.2+ cluster. Microsoft will sponsor one such cluster for the duration of the class for experimentation. Students are advised to go through the setup tutorial prior to the class.

Andrew Wilson, Postdoctoral Fellow, CMU

Title: Future Directions for Kernel Methods: Building Kernel Machines and Probabilistic Models for Pattern Discovery

Abstract: Starting from the basics, I will derive popular kernels and introduce kernel methods such as Gaussian processes, support vector machines, kernel density estimation, and kernel PCA. I will show that several well known models, including generalised linear models, infinite neural networks, and splines, are examples of kernel machines. I argue that kernel methods have great potential for developing intelligent systems, since the kernel intuitively and flexibly controls the generalisation properties of a kernel method. I will provide general advice on developing new kernels, and scalable kernel learning approaches, for automatic representation learning. This discussion will include recent work, such as Fastfood and GPatt, introduced in the context of a historical progression of statistical models in machine learning. There will also be new demonstrations on large scale pattern discovery and extrapolation problems such as image inpainting and long range forecasting. Throughout there will be an emphasis on probabilistic modelling, Bayesian methods, and Gaussian processes.
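A tiny sketch of a kernel machine in action (kernel ridge regression with the RBF/squared-exponential kernel, equivalently a Gaussian process posterior mean; the data, lengthscale, and noise level below are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(1)
# Hypothetical 1-D regression problem: noisy samples of a sine wave.
X = np.sort(rng.uniform(-3, 3, size=40))[:, None]
y = np.sin(X[:, 0]) + 0.1 * rng.normal(size=40)

def rbf_kernel(A, B, lengthscale=1.0):
    # Squared-exponential kernel: the lengthscale directly controls
    # the smoothness (generalisation) of the resulting predictor.
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-0.5 * d2 / lengthscale**2)

# Kernel ridge / GP posterior mean: alpha = (K + sigma^2 I)^{-1} y.
K = rbf_kernel(X, X)
alpha = np.linalg.solve(K + 0.01 * np.eye(len(X)), y)

X_test = np.array([[0.0], [1.5]])
pred = rbf_kernel(X_test, X) @ alpha
print(pred)    # close to sin(0) and sin(1.5)
```

The O(n^3) solve is what scalable approaches such as Fastfood and structured-kernel methods like GPatt are designed to avoid, which is the scaling discussion the abstract refers to.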