Efficient Dual-Numbers Reverse AD via Well-Known Program Transformations
Where dual-numbers forward-mode automatic differentiation (AD) pairs each scalar value with its tangent value, dual-numbers \emph{reverse-mode} AD attempts to achieve reverse AD with a similarly simple idea: pairing each scalar value with a backpropagator function. Its correctness and efficiency on higher-order input languages have been analysed by Brunel, Mazza and Pagani, but their analysis relied on a custom operational semantics, and it is unclear whether that semantics can be implemented efficiently. We take inspiration from their use of \emph{linear factoring} to optimise dual-numbers reverse-mode AD into an algorithm that has the correct complexity and enjoys an efficient implementation in a standard functional language with support for mutable arrays, such as Haskell. Aside from the linear factoring ingredient, our optimisation steps consist of well-known ideas from the functional programming community. We demonstrate the use of our technique by providing a practical implementation that differentiates most of Haskell98.
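To make the pairing idea concrete, the following is a minimal Haskell sketch of naive dual-numbers reverse AD for a function of a single input variable. The names (Dual, constD, varD, addD, mulD, derivative) are invented for this illustration; the sketch deliberately omits the linear-factoring optimisation and the mutable-array implementation that the paper contributes, so it does not attain the complexity the paper targets.

-- Naive dual-numbers reverse AD (illustrative only, not the paper's algorithm):
-- each scalar carries a backpropagator that maps the cotangent of this scalar
-- to the cotangent of the (single, for simplicity) input variable.
newtype Dual = Dual (Double, Double -> Double)

-- A constant contributes nothing to the input's cotangent.
constD :: Double -> Dual
constD x = Dual (x, \_ -> 0)

-- The input variable passes its cotangent straight through.
varD :: Double -> Dual
varD x = Dual (x, \d -> d)

-- Arithmetic pairs the primal operation with the chain rule.
addD :: Dual -> Dual -> Dual
addD (Dual (x, bx)) (Dual (y, by)) = Dual (x + y, \d -> bx d + by d)

mulD :: Dual -> Dual -> Dual
mulD (Dual (x, bx)) (Dual (y, by)) = Dual (x * y, \d -> bx (d * y) + by (d * x))

-- Differentiate a scalar-to-scalar function by seeding the result's
-- backpropagator with cotangent 1.
derivative :: (Dual -> Dual) -> Double -> Double
derivative f x = let Dual (_, bp) = f (varD x) in bp 1

-- Example: d/dx (x * (x + 3)) at x = 2 prints 7.0.
main :: IO ()
main = print (derivative (\x -> x `mulD` (x `addD` constD 3)) 2)

In this naive form a backpropagator may be invoked many times when an intermediate value is shared, which is the kind of blow-up that the paper's linear-factoring-based optimisation is designed to avoid.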
Thu 19 Jan, 15:10 - 16:25 (Eastern Time, US & Canada)

15:10 (25m, Talk): You Only Linearize Once: Tangents Transpose to Gradients. POPL. Alexey Radul (Google Research), Adam Paszke (Google Research), Roy Frostig (Google Research), Matthew J. Johnson (Google Research), Dougal Maclaurin (Google Research). DOI

15:35 (25m, Talk): Efficient Dual-Numbers Reverse AD via Well-Known Program Transformations. POPL. DOI | Pre-print

16:00 (25m, Talk): ADEV: Sound Automatic Differentiation of Expected Values of Probabilistic Programs (Distinguished Paper). POPL. Alexander K. Lew (Massachusetts Institute of Technology), Mathieu Huot (University of Oxford), Sam Staton (University of Oxford), Vikash K. Mansinghka (Massachusetts Institute of Technology). DOI | Pre-print