Although machine learning has been highly successful in recent years, this success has been based on algorithms that exhibit only one of the multiple learning paradigms used by humans: learning statistically from many examples. Here we consider a second learning paradigm widely exhibited by humans but rarely by computers: learning from instruction involving natural language conversations and demonstrations. We argue that this second paradigm, conversational machine learning, is ripe for rapid research progress, and that it has the potential to let every user of a computer or mobile device become a programmer. We define the problem of conversational learning, survey relevant literature, and present the Learning from Instruction Agent (LIA) project as a case study. Finally, we lay out a set of future research directions involving grounded conversational instruction that appear to be key to progress in this area.
One aspect of human commonsense reasoning is the ability to make presumptions about daily experiences, activities and social interactions with others. We propose a new commonsense reasoning benchmark in which the task is to uncover the commonsense presumptions implied by imprecisely stated natural language commands in the form of if-then-because statements. For example, in the command "If it snows at night then wake me up early because I don't want to be late for work", the speaker relies on the listener's commonsense reasoning to infer the implicit presumption that it must snow enough to cause traffic slowdowns. Such if-then-because commands are particularly important when users instruct conversational agents. We release a benchmark dataset for this task, collected from humans and annotated with commonsense presumptions. We develop a neuro-symbolic theorem prover that extracts multi-hop reasoning chains and apply it to this problem. We further develop an interactive conversational framework that elicits commonsense knowledge from humans to complete reasoning chains.
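To make the idea of a multi-hop reasoning chain concrete, here is a minimal backward-chaining sketch over purely symbolic if-then rules. It is not the neuro-symbolic prover described above: the rule set, the predicate names (`snows_at_night`, `heavy_snow`, `traffic_slowdown`, `may_be_late_for_work`) and the `prove` helper are all hypothetical. The proof of the "because" goal fails until a human supplies the missing link, which plays the role of the implicit presumption that the snowfall is heavy.

```python
# Hypothetical rules as (conclusion, premises); not taken from the benchmark.
RULES = [
    ("may_be_late_for_work", ["traffic_slowdown"]),
    ("traffic_slowdown", ["heavy_snow"]),
]
# The "if" condition of the command is treated as a known fact.
FACTS = {"snows_at_night"}


def prove(goal, facts, rules, depth=0, max_depth=10):
    """Backward chaining: return the chain of atoms proving goal, or None."""
    if goal in facts:
        return [goal]
    if depth >= max_depth:  # guard against cyclic rule sets
        return None
    for head, body in rules:
        if head != goal:
            continue
        chain = []
        for premise in body:
            sub = prove(premise, facts, rules, depth + 1, max_depth)
            if sub is None:
                break  # this rule fails; try the next one
            chain.extend(sub)
        else:
            return chain + [goal]  # every premise was proved
    return None


# Without the presumption the chain cannot be completed.
print(prove("may_be_late_for_work", FACTS, RULES))
# A human-supplied rule (hypothetical) fills the gap in the chain.
elicited = ("heavy_snow", ["snows_at_night"])
print(prove("may_be_late_for_work", FACTS, RULES + [elicited]))
```

The second call returns the chain `snows_at_night → heavy_snow → traffic_slowdown → may_be_late_for_work`, which is the shape of reasoning chain the conversational framework is meant to complete.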
Generalization to harder compositional problem instances (a.k.a. extrapolation) is challenging for standard neural networks. In contrast, recursive neural networks have the potential to extrapolate because they can capture the compositionality of tree-structured data such as mathematical equations. However, recursive networks are prone to error propagation along deep trees and cannot capture long-range dependencies effectively. To overcome this, we propose Tree Stack Memory Units (Tree-SMUs), a novel memory-augmented recursive neural network in which each node contains a differentiable stack. Each SMU cell learns to read from its stack and to write to it by combining the stacks and states of its children through gating. This architecture improves both the local and global representations of compositional data, owing to its greater expressive power and its ability to capture long-range dependencies by giving each node indirect access to its descendants. We demonstrate strong empirical results on two tasks and show that Tree-SMU enables accurate extrapolation to significantly harder instances.
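The following sketch illustrates the two operations the abstract names: gating the children's stacks together and softly reading from and writing to a differentiable stack. It follows a generic continuous-stack formulation (a convex blend of push, pop and no-op) and is an assumption, not the exact Tree-SMU update. In a real cell the logits would be produced by learned functions of the children's states; here they are passed in directly, and the helper names are hypothetical.

```python
import math


def softmax(xs):
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def scale(vec, a):
    return [a * x for x in vec]


def add(u, v):
    return [x + y for x, y in zip(u, v)]


def combine_children(child_stacks, gate_logits):
    """Gate-weighted sum of the children's stacks, slot by slot."""
    gates = softmax(gate_logits)
    depth, dim = len(child_stacks[0]), len(child_stacks[0][0])
    combined = [[0.0] * dim for _ in range(depth)]
    for g, stack in zip(gates, child_stacks):
        combined = [add(c, scale(s, g)) for c, s in zip(combined, stack)]
    return combined


def soft_stack_update(stack, value, logits):
    """Blend push, pop and no-op on a fixed-depth stack (a list of vectors)."""
    a_push, a_pop, a_noop = softmax(logits)
    zero = [0.0] * len(value)
    pushed = [value] + stack[:-1]  # value on top; bottom slot falls off
    popped = stack[1:] + [zero]    # top removed; zeros enter at the bottom
    return [
        add(add(scale(p, a_push), scale(q, a_pop)), scale(s, a_noop))
        for p, q, s in zip(pushed, popped, stack)
    ]


def read_top(stack):
    """The cell reads its stack through the (soft) top slot."""
    return stack[0]
```

Because every step is a weighted sum, the whole update is differentiable in the logits, which is what allows such a stack to be trained end to end inside a recursive network.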
Computing devices have recently become capable of interacting with their end users via natural language. However, they can only operate within a limited "supported" domain of discourse and fail drastically when faced with an out-of-domain utterance, mainly because of the limitations of their semantic parser. In this paper, we propose a semantic parser that generalizes to out-of-domain examples by learning a general strategy for parsing an unseen utterance: adapting the logical forms of seen utterances instead of generating a logical form from scratch. Our parser maintains a memory consisting of a representative subset of the seen utterances paired with their logical forms. Given an unseen utterance, the parser looks up a similar utterance in the memory and adapts its logical form until it fits the unseen utterance. Moreover, we present a data generation strategy for constructing utterance-logical form pairs from different domains. Our results show an improvement of up to 68.8% over the baselines on one-shot parsing under two different evaluation settings.
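The retrieve-then-adapt loop can be sketched as follows. This is a deliberately naive, non-learned stand-in: retrieval uses token-overlap (Jaccard) similarity, and adaptation simply substitutes tokens that differ at aligned positions, whereas the parser described above learns its adaptation strategy. The memory contents and the logical-form syntax are hypothetical.

```python
def jaccard(a, b):
    """Token-set overlap between two token lists."""
    a, b = set(a), set(b)
    return len(a & b) / len(a | b) if a | b else 0.0


def retrieve(utterance, memory):
    """Return the (utterance, logical form) pair most similar to the input."""
    tokens = utterance.split()
    return max(memory, key=lambda pair: jaccard(tokens, pair[0].split()))


def adapt(utterance, retrieved):
    """Naive adaptation: swap tokens that differ at aligned positions."""
    seen_utterance, logical_form = retrieved
    new_tokens, old_tokens = utterance.split(), seen_utterance.split()
    if len(new_tokens) != len(old_tokens):
        return logical_form  # this toy adapter only handles equal-length pairs
    for old, new in zip(old_tokens, new_tokens):
        if old != new:
            logical_form = logical_form.replace(old, new)
    return logical_form


# Hypothetical memory of seen utterances and logical forms.
memory = [
    ("email my boss the report", "(send_email (to boss) (attach report))"),
    ("remind me to call mom", "(create_reminder (do call) (who mom))"),
]
utterance = "email my manager the slides"
print(adapt(utterance, retrieve(utterance, memory)))
```

Even this toy version shows why adapting a retrieved logical form can transfer better than decoding from scratch: the structure `(send_email ...)` is reused, and only the slots that differ have to change.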
We propose using symbolic data to train neural networks that solve differential equations. This yields a generalizable and scalable neural solver, mainly because we jointly learn a large number of functions that cover an entire mathematical domain and then use these trained functions to solve an unseen differential equation. Almost all prior work hand-crafts architectures tailored to a specific type of differential equation. Moreover, such methods train on numerical evaluations of a single differential equation, so training and tuning must be redone for each new input equation, which limits both scalability and generalizability.
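One way to picture "symbolic data" is as training pairs generated by rewriting rules rather than by sampling a single equation numerically. The sketch below is an assumption about what such data could look like, not the paper's data pipeline: it represents expressions as hypothetical nested tuples, draws random expressions from a small grammar, and pairs each with its exact derivative obtained from the sum, product and chain rules.

```python
import math
import random


def d(e):
    """Symbolic derivative of expression tree e with respect to x."""
    op = e[0]
    if op == "x":
        return ("const", 1)
    if op == "const":
        return ("const", 0)
    if op == "add":
        return ("add", d(e[1]), d(e[2]))
    if op == "mul":  # product rule
        return ("add", ("mul", d(e[1]), e[2]), ("mul", e[1], d(e[2])))
    if op == "sin":  # chain rule
        return ("mul", ("cos", e[1]), d(e[1]))
    if op == "cos":
        return ("mul", ("mul", ("const", -1), ("sin", e[1])), d(e[1]))
    if op == "exp":
        return ("mul", e, d(e[1]))
    raise ValueError(f"unknown operator {op!r}")


def evaluate(e, x):
    """Numerically evaluate expression tree e at the point x."""
    op = e[0]
    if op == "x":
        return x
    if op == "const":
        return e[1]
    if op == "add":
        return evaluate(e[1], x) + evaluate(e[2], x)
    if op == "mul":
        return evaluate(e[1], x) * evaluate(e[2], x)
    return {"sin": math.sin, "cos": math.cos, "exp": math.exp}[op](evaluate(e[1], x))


def random_expr(rng, depth):
    """Draw a random expression; exp is left out to avoid overflow in nesting."""
    if depth == 0 or rng.random() < 0.3:
        return ("x",) if rng.random() < 0.7 else ("const", rng.randint(1, 3))
    op = rng.choice(["add", "mul", "sin", "cos"])
    if op in ("add", "mul"):
        return (op, random_expr(rng, depth - 1), random_expr(rng, depth - 1))
    return (op, random_expr(rng, depth - 1))


def make_pairs(n, depth=3, seed=0):
    """Generate n (function, derivative) training pairs."""
    rng = random.Random(seed)
    return [(f, d(f)) for f in (random_expr(rng, depth) for _ in range(n))]
```

Each pair is exact by construction, and the same generator covers a whole family of functions at once, which is the property the abstract contrasts with per-equation numerical training.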
In this work, we investigate using neural programs to solve ordinary differential equations (ODEs) by verifying or rejecting a candidate solution of an ODE. We design a neural programmer that chooses the correct solution with high accuracy. Our neural programmer, based on a Tree-LSTM, leverages the compositionality of each input ODE.
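The verify-or-reject decision itself can be illustrated with a purely numerical check. This is not the Tree-LSTM programmer, which reads the symbolic ODE; it is a hedged baseline that writes an ODE as a residual function of `(x, y, y', y'')`, estimates the derivatives of the candidate by central differences, and accepts the candidate if the residual stays near zero at sample points. The function names and the tolerance are assumptions.

```python
import math


def derivative(f, x, h=1e-5):
    """Central-difference estimate of f'(x)."""
    return (f(x + h) - f(x - h)) / (2 * h)


def second_derivative(f, x, h=1e-4):
    """Central-difference estimate of f''(x)."""
    return (f(x + h) - 2 * f(x) + f(x - h)) / (h * h)


def is_solution(residual, y, points, tol=1e-4):
    """Accept y if residual(x, y, y', y'') is near zero at every sample point."""
    for x in points:
        r = residual(x, y(x), derivative(y, x), second_derivative(y, x))
        if abs(r) > tol * max(1.0, abs(y(x))):
            return False
    return True


points = [0.1 * i for i in range(1, 20)]
harmonic = lambda x, y, dy, d2y: d2y + y  # the ODE y'' + y = 0
print(is_solution(harmonic, math.sin, points))  # sin is a solution
print(is_solution(harmonic, math.exp, points))  # exp is rejected
```

A numerical check like this needs a callable candidate and a tolerance; a model that reasons over the expression tree of the ODE avoids both, which is one motivation for the neural approach.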
Neural programming involves training neural networks to learn programs, mathematics, or logic from data. Previous work has failed to achieve good generalization performance, especially on problems and programs with high complexity or on large domains. This is because existing methods mostly rely either on black-box function evaluations that do not capture the structure of the program, or on detailed execution traces that are expensive to obtain, so the training data covers the domain under consideration poorly. We present a novel framework that uses black-box function evaluations in conjunction with symbolic expressions that define relationships between the given functions. We employ tree LSTMs to incorporate the structure of the symbolic expression trees. We use a tree encoding, based on decimal representation, for the numbers present in the function evaluation data. We present an evaluation benchmark for this task and use it to demonstrate that our proposed model combines symbolic reasoning and function evaluation in a fruitful manner, obtaining high accuracies in our experiments. Our framework generalizes significantly better to expressions of higher depth and is able to fill partial equations with valid completions.
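A decimal-based tree encoding lets a number enter a tree LSTM as another expression tree. The sketch below shows one plausible format, which is an assumption and not necessarily the exact encoding used in the paper: each nonzero digit becomes a `digit * 10**power` subtree, the terms are chained with `add`, and negative numbers are wrapped in `neg`. The node names and both helpers are hypothetical.

```python
from decimal import Decimal


def number_to_tree(value):
    """Encode a finite number as a sum of digit * 10**power subtrees."""
    sign, digits, exponent = Decimal(str(value)).as_tuple()
    terms = [
        ("mul", digit, ("pow", 10, exponent + len(digits) - 1 - i))
        for i, digit in enumerate(digits)
        if digit != 0  # zero digits contribute nothing
    ]
    if not terms:
        return 0
    tree = terms[0]
    for term in terms[1:]:
        tree = ("add", tree, term)
    return ("neg", tree) if sign else tree


def tree_value(tree):
    """Evaluate an encoded tree back to a number (used to check the encoding)."""
    if isinstance(tree, int):
        return tree
    op = tree[0]
    if op == "neg":
        return -tree_value(tree[1])
    if op == "add":
        return tree_value(tree[1]) + tree_value(tree[2])
    if op == "mul":
        return tree_value(tree[1]) * tree_value(tree[2])
    if op == "pow":
        return tree_value(tree[1]) ** tree_value(tree[2])
    raise ValueError(f"unknown node {op!r}")


print(number_to_tree(2.5))
```

Here `2.5` becomes `("add", ("mul", 2, ("pow", 10, 0)), ("mul", 5, ("pow", 10, -1)))`. Encoding numbers this way keeps digit-level structure visible to the tree model instead of treating each number as an opaque token.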
In this paper, we propose guaranteed spectral methods for learning a broad range of topic models that generalize the popular Latent Dirichlet Allocation (LDA). We overcome LDA's inability to incorporate arbitrary topic correlations by assuming that the hidden topic proportions are drawn from a flexible class of Normalized Infinitely Divisible (NID) distributions. NID distributions are generated by normalizing a family of independent Infinitely Divisible (ID) random variables. The Dirichlet distribution is a special case obtained by normalizing a set of Gamma random variables. We prove that this flexible topic model class can be learned via spectral methods using only moments up to the third order, with (low-order) polynomial sample and computational complexity. The proof is based on a key new technique, derived here, that allows us to diagonalize the moments of the NID distribution through an efficient procedure that requires evaluating only univariate integrals, even though we are handling high-dimensional multivariate moments. To assess the performance of our proposed Latent NID topic model, we use two real datasets of articles collected from the New York Times and PubMed. Our experiments yield improved perplexity on both datasets compared with the baseline.
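The construction of an NID draw is simple to state in code: sample independent ID variables and divide by their sum. The sketch below uses Gamma variables, the case that recovers the Dirichlet distribution mentioned above, via the standard-library `random.gammavariate`. The `draw_id` argument is a hypothetical hook: passing a sampler for another ID family (for example, inverse Gaussian, which gives the normalized inverse Gaussian distribution) would produce a different member of the NID class. This only samples topic proportions; it is not the spectral learning method.

```python
import random


def sample_nid(shape_params, draw_id, rng):
    """Normalize independent infinitely divisible draws onto the simplex."""
    draws = [draw_id(a, rng) for a in shape_params]
    total = sum(draws)
    return [x / total for x in draws]


def gamma_draw(alpha, rng):
    """Gamma(alpha, 1) draw; normalizing these gives a Dirichlet(alpha) sample."""
    return rng.gammavariate(alpha, 1.0)


rng = random.Random(0)
proportions = sample_nid([1.0, 2.0, 3.0], gamma_draw, rng)
print(proportions, sum(proportions))
```

For the Dirichlet case, averaging many such draws should approach `alpha / sum(alpha)`, here roughly `[1/6, 1/3, 1/2]`.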
Scalable probabilistic modeling and prediction in high-dimensional multivariate time series is a challenging problem, particularly for systems with hidden sources of dependence and/or homogeneity. Examples of such problems include dynamic social networks with co-evolving nodes and edges, and dynamic student learning in online courses. Here, we address these problems by discovering hierarchical latent groups. We introduce a family of Conditional Latent Tree Models (CLTMs), in which tree-structured latent variables capture the unknown groups. The latent tree itself is conditioned on observed covariates such as seasonality, historical activity, and node attributes. We propose a statistically efficient framework for learning both the hierarchical tree structure and the parameters of the CLTM. We demonstrate competitive performance on multiple real-world datasets from different domains: students' attempts at answering questions in a psychology MOOC, Twitter users participating in and interacting within an emergency management discussion, and windsurfers interacting on a beach in Southern California. In addition, our modeling framework provides valuable and interpretable information about the hidden group structures and their effect on the evolution of the time series.
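To show how a latent tree can be conditioned on covariates while inference stays tractable, here is a minimal sum-product sketch for a fixed two-level tree. The structure and parameterization are assumptions made for illustration: one binary root latent whose prior is a logistic function of the covariates, binary latent groups below it, and binary observations under each group. Unlike the CLTM framework described above, it does not learn the tree structure or the parameters; it only computes the likelihood of one time step.

```python
import math


def sigmoid(t):
    return 1.0 / (1.0 + math.exp(-t))


def group_message(observations, emission, transition):
    """Upward message P(group's leaves | Z = z) for z in {0, 1}."""
    # emission[h] = P(X = 1 | H = h); multiply leaf likelihoods for each h.
    msg_h = [1.0, 1.0]
    for x in observations:
        msg_h = [msg_h[h] * (emission[h] if x == 1 else 1.0 - emission[h]) for h in (0, 1)]
    # transition[z][h] = P(H = h | Z = z); sum out the group variable H.
    return [sum(transition[z][h] * msg_h[h] for h in (0, 1)) for z in (0, 1)]


def likelihood(groups, covariates, weights, bias, emission, transition):
    """P(observations | covariates) for a two-level binary latent tree."""
    # The root prior is conditioned on covariates through a logistic link.
    p_root = sigmoid(sum(w * u for w, u in zip(weights, covariates)) + bias)
    prior = [1.0 - p_root, p_root]
    msg_z = [1.0, 1.0]
    for observations in groups:
        gm = group_message(observations, emission, transition)
        msg_z = [msg_z[z] * gm[z] for z in (0, 1)]
    return sum(prior[z] * msg_z[z] for z in (0, 1))


# Hypothetical parameters: two groups of binary activity indicators.
groups = [[1, 0, 1], [0, 0]]
print(likelihood(groups, [1.0, -0.5], [0.4, 1.2], 0.1,
                 [0.2, 0.9], [[0.8, 0.2], [0.3, 0.7]]))
```

Message passing keeps the cost linear in the number of groups and observations, instead of enumerating every joint assignment of the latent variables.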
In this paper, a new imaging technique for early-stage ultra-wideband (UWB) microwave breast cancer detection is proposed. A circular array of antennas illuminates the breast tissue with UWB pulses, and the backscattered signals are then passed through a beamformer that is designed and applied in the frequency domain. This design enables the beamformer to compensate for non-integer delays and frequency-dependent dispersion while also increasing its accuracy. We show that the proposed imaging algorithm reduces the computational cost and memory requirements of the imaging system by lowering the sampling rate to the Nyquist rate and significantly reducing the number of required matrix inversions. Furthermore, the proposed algorithm significantly improves the quality of the resulting image: on average, the signal-to-clutter ratio of the image increases by 89.29% compared with the other cases considered.
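The reason a frequency-domain design handles non-integer delays is that a delay of τ samples becomes a phase factor exp(-j2πkτ/N) on DFT bin k, and τ need not be an integer. The sketch below shows this with a plain delay-and-sum beamformer, a swapped-in technique that is much simpler than the design described above: it omits dispersion compensation and any adaptive weights or matrix inversions. It uses a direct O(N²) DFT from the standard library for self-containment, and all names are hypothetical.

```python
import cmath


def dft(x):
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n)) for k in range(n)]


def idft(spectrum):
    n = len(spectrum)
    return [sum(spectrum[k] * cmath.exp(2j * cmath.pi * k * t / n) for k in range(n)) / n
            for t in range(n)]


def signed_bin(k, n):
    """Map bin k to a signed frequency index so fractional phase ramps stay symmetric."""
    return k if k <= n // 2 else k - n


def beamform(channels, delays, weights=None):
    """Frequency-domain delay-and-sum: undo a (possibly fractional) delay per channel."""
    n = len(channels[0])
    if weights is None:
        weights = [1.0] * len(channels)
    out = [0j] * n
    for x, tau, w in zip(channels, delays, weights):
        spectrum = dft(x)
        for k in range(n):
            # exp(+j 2 pi k tau / n) cancels a delay of tau samples in bin k.
            out[k] += w * spectrum[k] * cmath.exp(2j * cmath.pi * signed_bin(k, n) * tau / n)
    # Keep the real part; for fractional tau the Nyquist bin leaves a small imaginary residue.
    return [v.real for v in idft(out)]


n = 16
pulse = [0.0] * n
pulse[2], pulse[3] = 1.0, 0.5
delays = [0, 1, 3]
channels = [[pulse[(t - d) % n] for t in range(n)] for d in delays]  # circularly delayed copies
print([round(v, 6) for v in beamform(channels, delays)])
```

With integer delays the aligned channels add coherently to three times the pulse; the same code accepts fractional delays such as `0.5`, which a time-domain beamformer working at the Nyquist rate could only apply through interpolation.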