Smoothness Analysis for Probabilistic Programs with Application to Optimised Variational Inference
We present a static analysis for discovering differentiable or, more generally, smooth parts of a given probabilistic program, and show how the analysis can be used to improve the pathwise gradient estimator, one of the most popular methods for posterior inference and model learning. Our improvement increases the scope of the estimator from differentiable models to non-differentiable ones without requiring manual intervention by the user; the improved estimator automatically identifies differentiable parts of a given probabilistic program using our static analysis, and applies the pathwise gradient estimator to the identified parts while using a more general but less efficient estimator, called the score estimator, for the rest of the program. Our analysis has a surprisingly subtle soundness argument, partly due to the misbehaviours of some target smoothness properties when viewed from the perspective of program analysis designers. For instance, some smoothness properties, such as partial differentiability and partial continuity, are not preserved by function composition, and this makes it difficult to analyse sequential composition soundly without heavily sacrificing precision. We formulate five assumptions on a target smoothness property, prove the soundness of our analysis under those assumptions, and show that our leading examples satisfy these assumptions. We also show that, by using information from our analysis instantiated for differentiability, our improved gradient estimator satisfies an important differentiability requirement and thus computes the correct estimate on average (i.e., returns an unbiased estimate) under a regularity condition. Our experiments with representative probabilistic programs in the Pyro language show that our static analysis is capable of identifying smooth parts of those programs accurately, and of making our improved pathwise gradient estimator exploit all the opportunities for high performance in those programs.
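The contrast between the two estimators mentioned in the abstract can be illustrated with a toy example. The sketch below is not the paper's analysis or its estimator; it assumes a single latent variable z ~ Normal(theta, 1) and two hand-picked objectives, and all function names (`pathwise_grad`, `score_grad`, and so on) are hypothetical. It estimates the gradient of E[f(z)] with respect to theta in both ways, for a differentiable f and for a step function.

```python
import numpy as np

def pathwise_grad(df, theta, eps):
    # Reparameterise z = theta + eps, so dz/dtheta = 1 and the
    # per-sample gradient is simply f'(z). Requires f to be differentiable.
    return df(theta + eps)

def score_grad(f, theta, eps):
    # Score estimator: f(z) * d/dtheta log N(z; theta, 1) = f(z) * (z - theta).
    # Only needs the density to be differentiable in theta, not f itself.
    z = theta + eps
    return f(z) * (z - theta)

rng = np.random.default_rng(0)
eps = rng.standard_normal(200_000)  # shared noise samples (assumed sample size)
theta = 0.5

# Differentiable objective: f(z) = z^2, true gradient of E[f(z)] is 2 * theta.
smooth = lambda z: z ** 2
d_smooth = lambda z: 2 * z

# Non-differentiable objective: f(z) = 1[z > 0]; its derivative is 0 almost
# everywhere, so a naive pathwise estimate misses the jump at z = 0.
step = lambda z: (z > 0).astype(float)
d_step = lambda z: np.zeros_like(z)

pw_smooth = pathwise_grad(d_smooth, theta, eps)
sc_smooth = score_grad(smooth, theta, eps)
pw_step = pathwise_grad(d_step, theta, eps)
sc_step = score_grad(step, theta, eps)

# True gradient for the step objective: d/dtheta Phi(theta) = standard normal pdf at theta.
true_step = np.exp(-theta ** 2 / 2) / np.sqrt(2 * np.pi)

print("smooth: pathwise mean/var", pw_smooth.mean(), pw_smooth.var())
print("smooth: score    mean/var", sc_smooth.mean(), sc_smooth.var())
print("step:   pathwise mean    ", pw_step.mean(), "(true:", true_step, ")")
print("step:   score    mean    ", sc_step.mean())
```

In this toy setting both estimators agree on average for the differentiable objective, with the pathwise one showing lower variance, while only the score estimator recovers the gradient of the step objective. That is the motivation for applying the pathwise estimator only where differentiability has been established and falling back to the score estimator elsewhere.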
Wed 18 Jan (displayed time zone: Eastern Time, US & Canada)
16:45 - 18:00
16:45 | 25m | Talk | Affine Monads and Lazy Structures for Bayesian Programming | POPL | Swaraj Dash (University of Oxford), Younesse Kaddar (University of Oxford), Hugo Paquet (University of Oxford), Sam Staton (University of Oxford)
17:10 | 25m | Talk | Type-Preserving, Dependence-Aware Guide Generation for Sound, Effective Amortized Probabilistic Inference (Virtual) | POPL | Jianlin Li (University of Waterloo), Leni Ven (University of Waterloo), Pengyuan Shi (University of Waterloo), Yizhou Zhang (University of Waterloo)
17:35 | 25m | Talk | Smoothness Analysis for Probabilistic Programs with Application to Optimised Variational Inference | POPL | Wonyeol Lee (Stanford University), Xavier Rival (Inria; ENS; CNRS; PSL University), Hongseok Yang (KAIST; IBS)