ERIC Number: ED652981
Record Type: Non-Journal
Publication Date: 2024
Pages: 338
Abstractor: As Provided
ISBN: 979-8-3827-5816-9
ISSN: N/A
EISSN: N/A
Computational Learning Theory through a New Lens: Scalability, Uncertainty, Practicality, and Beyond
Chen Wang
ProQuest LLC, Ph.D. Dissertation, Rutgers, The State University of New Jersey, School of Graduate Studies
Computational learning theory studies the design and analysis of learning algorithms, and it is integral to the foundations of machine learning. In the modern era, classical computational learning theory is increasingly unable to keep up with new practical demands. In particular, problems arise in the following aspects: (i) "scalability": with the massive scale of modern datasets, classical learning algorithms that use polynomial time and store the entire input are no longer efficient; (ii) "uncertainty": information nowadays is usually coupled with noise and uncertainty, which renders classical algorithms that assume accurate information unreliable; and (iii) "practicality": in the modern context, traditionally "negligible" terms in learning theory (e.g., log n factors) can no longer be ignored, and many theoretically efficient techniques become inapplicable in practice. There are several promising approaches to tackling these challenges. For scalability, one of the most popular is to study learning algorithms under "sublinear" models, including the streaming, sublinear-time, and Massively Parallel Computation (MPC) models. Learning algorithms under these models typically use resources substantially smaller than the input size. For uncertainty, we can look to learning algorithms that naturally take noisy inputs, e.g., algorithms for multi-armed bandits (MABs) or algorithms that operate on randomly corrupted inputs. Finally, for practicality, we can focus on designing algorithms that are easy to implement and aim for algorithms with both theoretical guarantees and strong experimental performance. In light of the above discussion, this dissertation presents three major areas of study: (1) In Part I, we present recent results on streaming multi-armed bandits, where the arms arrive one by one in a stream and the algorithm stores a much smaller number of arms than the input.
We consider two fundamental learning problems in this model, namely pure exploration and regret minimization. For pure exploration, we give an ultra-efficient algorithm that finds the best arm while storing only a single extra arm at any point. The algorithm uses a single pass and the information-theoretically optimal number of arm pulls. Subsequently, under various settings, we characterize the optimal sample-memory-pass trade-offs for pure exploration in streaming MABs. For regret minimization, we give optimal regret bounds for single-pass MABs. Together, these results complete a majority of the picture for classical MAB problems in the memory-constrained setting; (2) In Part II, we study graph clustering in sublinear settings. We consider two problems of practical importance: correlation clustering and hierarchical clustering. For correlation clustering, we design O(1)-approximation algorithms with semi-streaming Õ(n) space in the graph streaming model and with Õ(n) time in the query model, where n is the number of vertices in the graph. Furthermore, in the truly-streaming polylog(n)-space regime, we design algorithms that approximate the optimal cost of correlation clustering. We test the polylog(n)-space algorithms on various input data and find that their practical performance is better than the worst-case guarantees. For hierarchical clustering, we give a single-pass semi-streaming algorithm that achieves an O(1)-approximation for Dasgupta's objective, and we prove a cost-space trade-off in the single-pass setting; and (3) In Part III, we move to the more practically driven problems of differential privacy (DP) and weak-strong oracle learning. For the former, we consider the differentially private release of range queries on graphs--a problem with wide applications in networks--and we give optimal bounds for pure and approximate DP.
For the latter, weak-strong oracle learning is motivated by industry settings where information is obtained from both a cheap but noisy source and an accurate but expensive source. We study (metric) k-clustering and MST in this setting and obtain nearly optimal algorithms with strong experimental performance. [The dissertation citations contained here are published with the permission of ProQuest LLC. Further reproduction is prohibited without permission. Copies of dissertations may be obtained by telephone: 1-800-521-0600. Web page: http://bibliotheek.ehb.be:2222/en-US/products/dissertations/individuals.shtml.]
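To make the single-pass, single-extra-arm pure-exploration setting concrete, the following is a minimal illustrative sketch (not the dissertation's algorithm: all names are hypothetical, and a fixed per-duel pull budget stands in for the optimal adaptive pull schedule the abstract describes). The stream processor keeps exactly one stored arm, "duels" it against each arriving arm, and keeps the empirical winner, so memory stays O(1) regardless of the number of arms.

```python
import random

def streaming_best_arm(arm_means, pulls_per_duel=2000, seed=0):
    """Simplified single-pass best-arm sketch over a stream of Bernoulli arms.

    Keeps one stored 'champion' arm; each arriving arm duels the champion
    with a fixed pull budget, and the empirical winner survives. Memory use
    is a single extra arm, independent of the stream length.
    """
    rng = random.Random(seed)

    def pull(mean):
        # One Bernoulli reward from a simulated arm with the given mean.
        return 1 if rng.random() < mean else 0

    champion = None  # index of the single arm kept in memory
    for i, mean in enumerate(arm_means):  # arms arrive one by one
        if champion is None:
            champion = i
            continue
        # Duel: estimate both arms with the same fixed budget.
        est_new = sum(pull(mean) for _ in range(pulls_per_duel)) / pulls_per_duel
        est_champ = sum(pull(arm_means[champion])
                        for _ in range(pulls_per_duel)) / pulls_per_duel
        if est_new > est_champ:
            champion = i  # old champion is discarded; storage stays O(1)
    return champion
```

With well-separated arm means and a sufficient budget, the surviving arm is the best one with high probability; the dissertation's actual algorithm achieves this with an information-theoretically optimal total number of pulls rather than a uniform budget.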
ProQuest LLC. 789 East Eisenhower Parkway, P.O. Box 1346, Ann Arbor, MI 48106. Tel: 800-521-0600; Web site: http://bibliotheek.ehb.be:2222/en-US/products/dissertations/individuals.shtml
Publication Type: Dissertations/Theses - Doctoral Dissertations
Education Level: N/A
Audience: N/A
Language: English
Sponsor: N/A
Authoring Institution: N/A
Grant or Contract Numbers: N/A