FlashFill++: Scaling Programming by Example by Cutting to the Chase
Programming-by-Examples (PBE) involves synthesizing an "intended program" from a small set of user-provided input-output examples. A key PBE strategy has been to restrict the search to a carefully designed small domain-specific language (DSL) with "effectively-invertible" (EI) operators at the top and "effectively-enumerable" (EE) operators at the bottom. This facilitates an effective combination of a top-down synthesis strategy (which backpropagates outputs over various paths in the DSL using inverse functions) with a bottom-up synthesis strategy (which propagates inputs over various paths in the DSL). We address the problem of scaling synthesis to large DSLs with several non-EI/EE operators. This is motivated by the need to support a richer class of transformations and the need for readable code generation. We propose a novel solution strategy that relies on propagating fewer values over fewer paths.
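To make the top-down/bottom-up combination concrete, here is a minimal Python sketch over a toy DSL assumed purely for illustration (it is not the FlashFill++ DSL): a program is a concatenation of atoms, each either a one-character constant or a substring of an input column at fixed positions. Concat plays the EI operator at the top (its inverse is the set of ways to split the output into a prefix and a rest), and substring extraction plays the EE operator at the bottom (every substring of the inputs can be enumerated). The names `substrings`, `synthesize`, and `run` are hypothetical.

```python
from functools import lru_cache

def substrings(inputs):
    """Bottom-up side (effectively enumerable): map every substring of every
    input column to one (column, start, end) triple that produces it."""
    table = {}
    for col, s in enumerate(inputs):
        for i in range(len(s)):
            for j in range(i + 1, len(s) + 1):
                table.setdefault(s[i:j], (col, i, j))
    return table

def synthesize(inputs, output):
    """Top-down side: invert Concat by splitting the target into head + rest,
    keeping a head only if the bottom-up table can produce it (or, as a
    fallback, if it is a single character usable as a constant)."""
    table = substrings(inputs)

    @lru_cache(maxsize=None)
    def solve(target):
        if target == "":
            return []
        # Try longer heads first so programs use fewer, larger pieces.
        for k in range(len(target), 0, -1):
            head, rest = target[:k], target[k:]
            if head in table:
                atom = ("Sub",) + table[head]
            elif k == 1:
                atom = ("Const", head)
            else:
                continue
            tail = solve(rest)
            if tail is not None:
                return [atom] + tail
        return None

    return solve(output)

def run(program, inputs):
    """Evaluate a program (list of atoms) on one row of inputs."""
    parts = []
    for atom in program:
        if atom[0] == "Const":
            parts.append(atom[1])
        else:
            _, col, i, j = atom
            parts.append(inputs[col][i:j])
    return "".join(parts)

if __name__ == "__main__":
    row = ("Alice", "Smith")
    program = synthesize(row, "Smith, Alice")
    print(program)  # [('Sub', 1, 0, 5), ('Const', ','), ('Const', ' '), ('Sub', 0, 0, 5)]
    print(run(program, row))  # Smith, Alice
```

Because positions are absolute, the learned program only reproduces this single example; the point is where the two directions meet, since a split of the output survives only if the bottom-up enumeration can supply its left part.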
Our first key idea is that of "cut functions" that prune the set of values being propagated by using knowledge of the sub-DSL on the other side. Cuts can be designed to preserve completeness of synthesis; however, DSL designers may use incomplete cuts to have finer control over the kinds of programs synthesized. In either case, cuts make search feasible for non-EI/EE operators and efficient for deep DSLs. Our second key idea is that of "guarded DSLs" (gDSLs) that allow a precedence on DSL operators, which dynamically controls exploration of various paths in the DSL. This makes search efficient over grammars with large fanouts without losing recall. It also makes ranking simpler yet more effective in learning an intended program from very few examples. Both cuts and precedence give the DSL designer a mechanism to restrict search to a reasonable, and possibly incomplete, space of programs.
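A cut can be illustrated on a non-EI operator such as rounding. The preimage of the output "3.5" under rounding to one decimal is the entire interval [3.45, 3.55), so backpropagating it top-down is hopeless. The sketch below is a hypothetical example, not the cut FlashFill++ actually uses: it assumes the sub-DSL beneath `Round` can only produce number literals found in the inputs, and the cut uses that knowledge of the other side to return just the literals that round to the output. The names `numbers_in`, `round_to`, and `cut_round` are illustrative, and half-away-from-zero rounding is an assumption.

```python
import re
from decimal import Decimal, ROUND_HALF_UP

NUMBER = re.compile(r"-?\d+(?:\.\d+)?")

def numbers_in(inputs):
    """What the (assumed) sub-DSL below Round can produce: every number
    literal that appears in some input string."""
    return [Decimal(m.group()) for s in inputs for m in NUMBER.finditer(s)]

def round_to(x, decimals):
    """Round x to `decimals` places, ties away from zero."""
    quantum = Decimal(1).scaleb(-decimals)  # 10 ** -decimals
    return x.quantize(quantum, rounding=ROUND_HALF_UP)

def cut_round(output, inputs):
    """Cut for the non-invertible Round operator: instead of the infinite
    preimage of `output`, return only candidate arguments that the input
    side can produce and that round to `output`."""
    target = Decimal(output)
    decimals = -target.as_tuple().exponent  # digits after the point in output
    return [x for x in numbers_in(inputs) if round_to(x, decimals) == target]

if __name__ == "__main__":
    print(cut_round("3.5", ["Temp 3.46 C", "Lot 12, qty 3.5"]))
    # [Decimal('3.46'), Decimal('3.5')]
```

Since this cut enumerates exactly what the toy sub-DSL can produce, it discards no solutions; returning, say, only the first match would be an incomplete cut that trades recall for tighter control over the programs found.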
Using cuts and gDSLs, we have built FlashFill++, an industrial-strength PBE engine for performing rich string transformations, including datetime and number manipulations. The FlashFill++ gDSL is designed to enable readable code generation in different target languages including Excel's formula language, PowerFx, and Python. We show that FlashFill++ is more expressive, more performant, and generates better quality code than comparable existing PBE systems. FlashFill++ is being deployed in several mass-market products ranging from spreadsheet software to notebooks and business intelligence applications, each with millions of users.
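The precedence in a gDSL can be read as an ordered choice among the alternatives for a nonterminal. The sketch below assumes that reading (a lower-precedence alternative is explored only if every higher one yields no program consistent with the examples); the `guarded` combinator, the two learners, and the program strings they return are all hypothetical. It places a token-extraction rule ahead of a number-rounding rule, echoing the mix of string and number transformations described above.

```python
from decimal import Decimal, InvalidOperation, ROUND_HALF_UP
from typing import Callable, List, Optional, Sequence, Tuple

Example = Tuple[str, str]  # (input, output)
Learner = Callable[[Sequence[Example]], Optional[str]]

explored: List[str] = []  # records which alternatives were actually tried

def guarded(*alternatives: Learner) -> Learner:
    """Ordered choice N := r1 |> r2 |> ...: return the first alternative's
    program; later alternatives run only if all earlier ones fail."""
    def learn(examples: Sequence[Example]) -> Optional[str]:
        for alternative in alternatives:
            program = alternative(examples)
            if program is not None:
                return program
        return None
    return learn

def learn_token(examples: Sequence[Example]) -> Optional[str]:
    """Output is the k-th space-separated token of the input (same k for all)."""
    explored.append("token")
    n = min(len(inp.split()) for inp, _ in examples)
    for k in range(n):
        if all(inp.split()[k] == out for inp, out in examples):
            return f"Split(input, ' ')[{k}]"
    return None

def learn_round(examples: Sequence[Example]) -> Optional[str]:
    """Output is the input number rounded (ties away from zero) to d decimals."""
    explored.append("round")
    try:
        numbers = [(Decimal(inp), out) for inp, out in examples]
    except InvalidOperation:
        return None  # some input is not a number
    for d in range(4):
        quantum = Decimal(1).scaleb(-d)
        if all(str(x.quantize(quantum, rounding=ROUND_HALF_UP)) == out
               for x, out in numbers):
            return f"Round(Number(input), {d})"
    return None

learn = guarded(learn_token, learn_round)

if __name__ == "__main__":
    print(learn([("3.46", "3.46")]), explored)
    explored.clear()
    print(learn([("3.46", "3.5")]), explored)
```

On the first example both rules would fit, yet the rounding rule is never explored, so a ranker has only one candidate to consider; on the second, exploration falls through to rounding only because token extraction fails.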
Wed 18 Jan (times shown in Eastern Time, US & Canada)
15:10 - 16:25 | Synthesis I (POPL), Grand Ballroom A | Chair: Nadia Polikarpova, University of California at San Diego
15:10, 25 min talk | Inductive Synthesis of Structurally Recursive Functional Programs from Non-recursive Expressions (POPL)
15:35, 25 min talk | FlashFill++: Scaling Programming by Example by Cutting to the Chase (POPL) | José Pablo Cambronero, Sumit Gulwani, Vu Le, Daniel Perelman, Arjun Radhakrishna, Clint Simon, Ashish Tiwari (all Microsoft)
16:00, 25 min talk | Unrealizability Logic (POPL) | Jinwoo Kim (University of Wisconsin-Madison; Seoul National University), Loris D'Antoni (University of Wisconsin-Madison), Thomas Reps (University of Wisconsin-Madison)