Learning Nonlinear and Deep Low-Dimensional Representations from High-Dimensional Data: From Theory to Practice

ICASSP 2023, Rhodes, Greece

John Wright, Columbia University (jw2966@columbia.edu)
Qing Qu, University of Michigan (qingqu@umich.edu)
Sam Buchanan, Toyota Technological Institute at Chicago (sam@ttic.edu)
Yi Ma, UC Berkeley / HKU (yima@eecs.berkeley.edu)
Zhihui Zhu, Ohio State University (zhu.3440@osu.edu)

Course Rationale

We are currently in the midst of a data revolution. Massive and ever-growing datasets, arising in science, health, and even everyday life, are poised to impact many areas of society. Many of these datasets are not only large – they are high-dimensional, in the sense that each data point may consist of millions or even billions of numbers. To take an example from imaging, a single image can contain millions of pixels or more; a video may easily contain a billion “voxels”. There are fundamental reasons (“curses of dimensionality”) why learning in high-dimensional spaces is challenging. A basic challenge spanning signal processing, statistics, and optimization is to leverage lower-dimensional structures in high-dimensional datasets. Low-dimensional signal modeling has driven developments both in theory and in applications to a vast array of areas, from medical and scientific imaging, to low-power sensors, to the modeling and interpretation of bioinformatics data sets, just to name a few. However, massive modern datasets pose an additional challenge: as datasets grow, data collection becomes increasingly uncontrolled, and it is common to encounter gross errors, malicious corruptions, and nonlinearity. Classical techniques break down completely in this setting, and new theories and algorithms are needed.

To meet those challenges, there have been explosive developments in the study of low-dimensional structures in high-dimensional spaces over the past two decades. To a large extent, the geometric and statistical properties of representative low-dimensional models (such as sparse and low-rank models, together with their variants and extensions) are now well understood. Conditions under which such models can be effectively and efficiently recovered from minimally sampled data have been clearly characterized. Many highly efficient and scalable algorithms have been developed for recovering such low-dimensional models from high-dimensional data, and the working conditions as well as the data and computational complexities of these algorithms have been thoroughly and precisely characterized. These new theoretical results and algorithms have already revolutionized the practice of data science and signal processing, and have had significant impact on sensing, imaging, and information processing. On the other hand, strong connections between deep neural networks and low-dimensional models have recently emerged at multiple levels, in terms of learned representations, network architectures, and optimization strategies. Such connections not only help explain many intriguing phenomena in deep learning, but also provide new guiding principles for the design, optimization, robustness, and generalization of deep networks in both supervised and unsupervised scenarios.

As witnesses to such historical advancements, we believe that it is the right time to deliver this new body of knowledge to the next generation of students and researchers in the signal processing community. Through exciting research developments over the past twenty years, the signal processing community has witnessed the power of sparse and low-dimensional models. In the meantime, however, the community is still transitioning to embrace the power of modern machine learning, especially deep learning, which poses unprecedented new challenges in terms of modeling and interpretability. In comparison to past tutorials on compressed sensing, convex optimization, and related topics, this tutorial (and the associated book, exercises, and course materials) is unique in that it connects fundamental mathematical models from signal processing to contemporary topics in nonconvex optimization and deep learning. The goal is to show (i) how these low-dimensional models and principles provide a valuable lens for formulating problems and understanding the behavior of methods, and (ii) how ideas from nonconvex optimization and deep learning help make these core models practical for real-world problems with nonlinear data and observation models, measurement nonidealities, and so on.

Course Resources

The major reference is the following book:

John Wright and Yi Ma. “High-Dimensional Data Analysis with Low-Dimensional Models: Principles, Computation, and Applications”. Cambridge University Press, 2022.

A full pre-production edition of the book is available online.

Course Overview

The course starts by introducing fundamental linear low-dimensional models (e.g., basic sparse and low-rank models), as well as their connection to (deep) learning-based approaches via learned optimization methods. In a similar vein, we then discuss nonlinear low-dimensional models for several fundamental learning and inverse problems (e.g., dictionary learning and sparse blind deconvolution) and the associated nonconvex optimization toolkit from a symmetry- and geometry-based perspective, along with approaches for designing deep networks that learn low-dimensional structures, with both interpretability and practical benefits. Building upon these results, we proceed to discuss strong conceptual, algorithmic, and theoretical connections between low-dimensional structures and deep models, first in the context of feature learning (neural collapse) in classification tasks with feedforward neural networks under a nonconvex optimization model, and then in the context of general nonlinear low-dimensional manifold models for signals. In this latter setting, we begin to lay the foundations for understanding resource tradeoffs in deep learning with low-dimensional data, analogous to the case of sparse signals and convex recovery, and we will also see how modern high-performance neural network architectures, namely transformers, can be derived from this perspective.
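To make the starting point concrete, below is a minimal Python sketch (not part of the course materials; the dictionary A, the sparsity level, and the regularization weight are illustrative assumptions) of the basic sparse recovery problem min_x (1/2)||Ax - y||_2^2 + λ||x||_1, solved with the classical iterative shrinkage-thresholding algorithm (ISTA). Unrolling a fixed number of these iterations and learning the update matrices and thresholds from data is exactly the bridge to the learned optimization methods discussed in Lecture 2.

import numpy as np

def soft_threshold(v, tau):
    # Elementwise soft-thresholding: the proximal operator of the l1 norm.
    return np.sign(v) * np.maximum(np.abs(v) - tau, 0.0)

def ista(A, y, lam=0.01, n_iters=200):
    # ISTA for min_x 0.5*||A x - y||_2^2 + lam*||x||_1.
    L = np.linalg.norm(A, 2) ** 2   # Lipschitz constant of the gradient of the smooth part
    x = np.zeros(A.shape[1])
    for _ in range(n_iters):
        x = soft_threshold(x + A.T @ (y - A @ x) / L, lam / L)
    return x

# Tiny synthetic example: recover a 5-sparse vector from 50 random measurements.
rng = np.random.default_rng(0)
A = rng.standard_normal((50, 200)) / np.sqrt(50)
x_true = np.zeros(200)
x_true[rng.choice(200, 5, replace=False)] = rng.standard_normal(5)
y = A @ x_true
x_hat = ista(A, y)
print("estimated support:", np.flatnonzero(np.abs(x_hat) > 1e-3))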

List of Lectures

Lecture 1: Introduction to Low-Dimensional Models (John Wright)
Lecture 2: Learning Unrolled Networks for Sparse Recovery (Atlas Wang); see the LISTA-style sketch after this list
  • J. Liu, X. Chen, Z. Wang, W. Yin, and H. Cai. “Towards Constituting Mathematical Structures for Learning to Optimize.” ICML 2023.
  • (α-β) T. Chen, X. Chen, W. Chen, H. Heaton, J. Liu, Z. Wang, and W. Yin. “Learning to Optimize: A Primer and A Benchmark.” Journal of Machine Learning Research (JMLR), 2022.
  • X. Chen, J. Liu, Z. Wang, and W. Yin. “Hyperparameter Tuning is All You Need for LISTA.” NeurIPS 2021.
  • J. Liu, X. Chen, Z. Wang, and W. Yin. “ALISTA: Analytic weights are as good as learned weights in LISTA.” ICLR 2019.
  • X. Chen, J. Liu, Z. Wang, and W. Yin. “Theoretical Linear Convergence of Unfolded ISTA and its Practical Weights and Thresholds.” NeurIPS 2018.
Lecture 3: Design Deep Networks for Pursuing Low-Dimensional Structures (Yi Ma)
Lecture 4: Learning Low-Dimensional Models via Nonconvex Optimization (Yuqian Zhang)
Lectures 5–6: Low-Dimensional Representations in Deep Networks, Parts I and II (Zhihui Zhu, Qing Qu)
Lecture 7: Deep Representation Learning from the Ground Up (Sam Buchanan)
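The LISTA idea referenced in Lecture 2 can be summarized by the following minimal PyTorch-style sketch. It is not taken from the papers above: the untied per-layer weights, the layer count, and all names are illustrative assumptions (variants such as ALISTA constrain or analytically fix these weights), and training would fit the parameters on pairs of measurements and sparse codes generated from a known dictionary.

import torch
import torch.nn as nn

class LISTA(nn.Module):
    # Unrolled ISTA with learnable weights and thresholds (LISTA-style sketch).
    # Each layer computes x <- soft_threshold(W1 x + W2 y, theta), with W1, W2,
    # and theta learned from data rather than fixed by the dictionary A.
    def __init__(self, m, n, n_layers=8):
        super().__init__()
        self.W1 = nn.ModuleList([nn.Linear(n, n, bias=False) for _ in range(n_layers)])
        self.W2 = nn.ModuleList([nn.Linear(m, n, bias=False) for _ in range(n_layers)])
        self.theta = nn.Parameter(0.1 * torch.ones(n_layers))

    def forward(self, y):
        # y: batch of measurements with shape (batch, m); returns sparse codes (batch, n).
        x = torch.zeros(y.shape[0], self.W1[0].in_features, device=y.device)
        for W1, W2, theta in zip(self.W1, self.W2, self.theta):
            z = W1(x) + W2(y)
            x = torch.sign(z) * torch.relu(torch.abs(z) - theta)  # soft-thresholding
        return x

# Hypothetical usage: minimize ||LISTA(y) - x_sparse||^2 over training pairs with a
# standard optimizer, so that a few learned layers match many classical ISTA iterations.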

Beyond the Course

A new conference is being organized – the Conference on Parsimony and Learning – with the aim of bringing together researchers working on topics that we have touched on in the course and creating a venue for the presentation and dissemination of outstanding research in these areas. Attendees are encouraged to consider submitting work and attending in the future.
