Introduction to the tensor-programs framework, a PL approach that helps analyse theoretical properties of deep learning.
While deep learning has many remarkable success stories, finding a satisfactory mathematical explanation of why it is so effective is still considered an open challenge. One recent, promising direction for tackling this challenge is to analyse the mathematical properties of neural networks in the limit where the widths of their hidden layers go to infinity. Researchers have been able to prove highly non-trivial properties of such infinitely-wide neural networks. For instance, gradient-based training achieves zero training error, and thus finds a global optimum. Also, under typical random initialisation these networks become so-called Gaussian processes, which are well-studied random objects in machine learning, statistics, and probability theory.
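The Gaussian-process claim can be made concrete with a minimal Python sketch. The setup is our own assumption and is not taken from the talk. It uses a one-hidden-layer ReLU network `f(u) = (1/sqrt(n)) * sum_i v_i * relu(w_i . u)` on 2-D inputs, with `w_i ~ N(0, I)` and `v_i ~ N(0, 1)`. The helpers `sample_outputs`, `arccos_kernel` and `excess_kurtosis` are hypothetical names introduced only for illustration.

Across many random initialisations, the second moments of `(f(x), f(y))` equal the degree-1 arc-cosine kernel at every width. What the infinite-width limit adds is Gaussianity. The sketch checks this through the excess kurtosis, which is 0 for a Gaussian and, for this particular network, is exactly `15/n`.

```python
import math
import random
from typing import List, Sequence, Tuple


def relu(z: float) -> float:
    return max(z, 0.0)


def sample_outputs(x: Sequence[float], y: Sequence[float], width: int,
                   num_nets: int, seed: int = 0) -> Tuple[List[float], List[float]]:
    """Draw num_nets independent random networks
    f(u) = (1/sqrt(width)) * sum_i v_i * relu(w_i . u), with w_i ~ N(0, I_2)
    and v_i ~ N(0, 1) (an assumed toy setup), and return the lists of f(x) and f(y)."""
    rng = random.Random(seed)
    scale = 1.0 / math.sqrt(width)
    fx, fy = [], []
    for _ in range(num_nets):
        sx = sy = 0.0
        for _ in range(width):
            w0, w1 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)  # hidden weights w_i
            v = rng.gauss(0.0, 1.0)                              # readout weight v_i
            sx += v * relu(w0 * x[0] + w1 * x[1])
            sy += v * relu(w0 * y[0] + w1 * y[1])
        fx.append(scale * sx)
        fy.append(scale * sy)
    return fx, fy


def arccos_kernel(x: Sequence[float], y: Sequence[float]) -> float:
    """Degree-1 arc-cosine kernel: E[relu(w . x) * relu(w . y)] for w ~ N(0, I_2)."""
    nx, ny = math.hypot(x[0], x[1]), math.hypot(y[0], y[1])
    cos_t = max(-1.0, min(1.0, (x[0] * y[0] + x[1] * y[1]) / (nx * ny)))
    t = math.acos(cos_t)
    return nx * ny * (math.sin(t) + (math.pi - t) * math.cos(t)) / (2 * math.pi)


def excess_kurtosis(samples: Sequence[float]) -> float:
    """E[f^4] / E[f^2]^2 - 3 about the known mean 0; equals 0 for a Gaussian."""
    m2 = sum(s * s for s in samples) / len(samples)
    m4 = sum(s ** 4 for s in samples) / len(samples)
    return m4 / (m2 * m2) - 3.0


x, y = (1.0, 0.0), (0.0, 1.0)
fx, fy = sample_outputs(x, y, width=200, num_nets=2000)
print("E[f(x)^2]   estimate:", sum(a * a for a in fx) / len(fx), "kernel:", arccos_kernel(x, x))
print("E[f(x)f(y)] estimate:", sum(a * b for a, b in zip(fx, fy)) / len(fx), "kernel:", arccos_kernel(x, y))
print("excess kurtosis of f(x):", excess_kurtosis(fx))
```

With `width=200` the estimated excess kurtosis should be small. Rerunning with `width=1` gives a visibly heavier-tailed, non-Gaussian output, because the exact value there is 15.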
In this talk, I will introduce Greg Yang’s tensor-programs framework, which has led to substantial generalisations of prior mathematical results on infinitely-wide neural networks. The framework specifies a programming language for expressing computations of neural networks that are parameterised by the widths of those networks. Although simple, the language is expressive enough to cover both the forward and the backward computations of networks of nearly all architectures. The most important part of the framework is the so-called master theorem. It says that every program in the framework’s language has a well-defined limit as the widths of the associated network go to infinity, and that this limit can even be defined inductively over the syntax of the program. The tensor-programs framework has been used to generalise results on infinitely-wide neural networks from a few simple network architectures to nearly all architectures.
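The Python sketch below illustrates the shape of the master theorem on a deliberately tiny, hypothetical fragment of such a language. It is our own simplification, not Yang’s formalism. Programs are trees built from three hypothetical constructors:

- `Init`: a width-n vector with iid N(0, 1) entries.
- `Nonlin`: a coordinatewise function.
- `MatMul`: multiplication by a fresh n × n matrix with iid N(0, 1/n) entries.

We assume that no vector or matrix is used twice and that there are no transposes. This is what keeps the inductive rule for `MatMul` this simple.

`run_finite` evaluates a program at width n. `sample_limit` draws Monte Carlo samples of the limiting coordinate variable Z by recursion on the syntax: a `MatMul` node becomes a Gaussian whose variance is the second moment of its argument’s Z. Roughly, and under regularity conditions on the nonlinearities, the master theorem says that averages such as (1/n) Σᵢ hᵢ² converge to the corresponding expectation E[(Zʰ)²].

```python
import math
import random
from dataclasses import dataclass
from typing import Callable, List, Union


@dataclass
class Init:
    """Width-n input vector with iid N(0, 1) entries (hypothetical syntax)."""


@dataclass
class Nonlin:
    """Coordinatewise nonlinearity phi applied to a vector."""
    phi: Callable[[float], float]
    arg: "Expr"


@dataclass
class MatMul:
    """A fresh n x n matrix with iid N(0, 1/n) entries times a vector."""
    arg: "Expr"


Expr = Union[Init, Nonlin, MatMul]


def run_finite(e: Expr, n: int, rng: random.Random) -> List[float]:
    """Evaluate program e at finite width n, sampling all randomness."""
    if isinstance(e, Init):
        return [rng.gauss(0.0, 1.0) for _ in range(n)]
    if isinstance(e, Nonlin):
        return [e.phi(v) for v in run_finite(e.arg, n, rng)]
    x = run_finite(e.arg, n, rng)
    sd = 1.0 / math.sqrt(n)
    # Each output coordinate uses its own row of fresh Gaussian weights.
    return [sum(rng.gauss(0.0, sd) * xj for xj in x) for _ in range(n)]


def sample_limit(e: Expr, m: int, rng: random.Random) -> List[float]:
    """m Monte Carlo samples of the limiting coordinate variable Z^e,
    computed by induction on the syntax of e (sharing-free programs only)."""
    if isinstance(e, Init):
        return [rng.gauss(0.0, 1.0) for _ in range(m)]
    if isinstance(e, Nonlin):
        return [e.phi(z) for z in sample_limit(e.arg, m, rng)]
    # MatMul with a fresh matrix: Z is N(0, E[(Z^arg)^2]); the expectation
    # is itself estimated from samples of the argument's limit.
    zs = sample_limit(e.arg, m, rng)
    var = sum(z * z for z in zs) / m
    return [rng.gauss(0.0, math.sqrt(var)) for _ in range(m)]


def relu(z: float) -> float:
    return max(z, 0.0)


prog = MatMul(Nonlin(relu, Init()))          # h = W relu(x)
rng = random.Random(0)
h = run_finite(prog, 1000, rng)
finite = sum(v * v for v in h) / len(h)      # (1/n) sum_i h_i^2 at n = 1000
zs = sample_limit(prog, 20000, rng)
limit = sum(z * z for z in zs) / len(zs)     # Monte Carlo estimate of E[(Z^h)^2]
print(finite, limit)
```

For h = W·relu(x), both printed numbers should be close to E[relu(Z)²] = 1/2 for Z ~ N(0, 1). Deeper programs, such as `MatMul(Nonlin(math.tanh, prog))`, go through the same two recursive functions. Once matrices are shared or transposed, as in backward computations, the limiting rule must also track correlations between vectors, which this sketch does not do.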
The goal of my talk is to introduce a possibly interesting new research topic for PL researchers. I will not assume any prior knowledge of the theory of neural networks, in particular of infinitely-wide neural networks and Greg Yang’s tensor programs. At the end of the talk, I will briefly mention a few research opportunities for PL researchers.
Sun 15 Jan, 09:00 - 10:30 (displayed time zone: Eastern Time, US & Canada)
First Session, LAFI at Scollay
Chair(s): Steven Holtzen (Northeastern University), Christine Tasson (Sorbonne Université, LIP6)
Introduction to the tensor-programs framework, a PL approach that helps analyse theoretical properties of deep learning (Boston)
Author: Hongseok Yang (KAIST; IBS)
Exact Inference for Discrete Probabilistic Programs via Generating Functions (Paris)
Authors: Fabian Zaiser (University of Oxford), C.-H. Luke Ong (University of Oxford)
Exact Probabilistic Inference Using Generating Functions (Boston)
Authors: Lutz Klinkenberg (RWTH Aachen University), Tobias Winkler (RWTH Aachen University), Mingshuai Chen (RWTH Aachen), Joost-Pieter Katoen (RWTH Aachen University)