Conversational learning

Forough Arabshahi, Kathryn Mazaitis, Toby Jia-Jun Li, Brad A. Myers, Tom Mitchell
Preprint Preprint, 2020


Although machine learning has been highly successful in recent years, this success has been based on algorithms that exhibit only one of the multiple learning paradigms used by humans: learning statistically from many examples. Here we consider a second learning paradigm widely exhibited by humans, but rarely by computers: learning from instruction involving natural language conversations and demonstrations. We argue that this second paradigm – conversational machine learning – is ripe for rapid research progress, and that it holds the potential to make it possible for every user of a computer or mobile device to become a programmer. We define the problem of conversational learning, survey relevant literature, and provide as a case study the Learning from Instruction Agent (LIA) project. Finally we lay out a set of future research directions involving grounded conversational instruction that appear to be key to progress in this area.

Conversational Neuro-Symbolic Commonsense Reasoning

Forough Arabshahi, Jennifer Lee, Mikayla Gawarecki, Kathryn Mazaitis, Amos Azaria, Tom Mitchell
Preprint ArXiv Preprint arXiv:2006.10022, 2020


One aspect of human commonsense reasoning is the ability to make presumptions about daily experiences, activities and social interactions with others. We propose a new commonsense reasoning benchmark where the task is to uncover commonsense presumptions implied by imprecisely stated natural language commands in the form of if-then-because statements. For example, in the command "If it snows at night then wake me up early because I don't want to be late for work" the speaker relies on commonsense reasoning of the listener to infer the implicit presumption that it must snow enough to cause traffic slowdowns. Such if-then-because commands are particularly important when users instruct conversational agents. We release a benchmark data set for this task, collected from humans and annotated with commonsense presumptions. We develop a neuro-symbolic theorem prover that extracts multi-hop reasoning chains and apply it to this problem. We further develop an interactive conversational framework that evokes commonsense knowledge from humans for completing reasoning chains.

Tree Stack Memory Units

Forough Arabshahi*, Zhichu Lu*, Sameer Singh, Animashree Anandkumar
Preprint ArXiv Preprint arXiv:1911.01545, 2020


Generalization to harder compositional problem instances (a.k.a. extrapolation) is challenging for standard neural networks. In contrast, recursive neural networks have the potential to achieve extrapolation because they are able to capture the compositionality of tree-structured data such as mathematical equations. However, recursive networks are prone to error propagation along trees of high depth and are unable to capture long range dependencies effectively. To overcome this, we propose Tree Stack Memory Units (Tree-SMUs), a novel memory augmented recursive neural network whose nodes consist of a differentiable stack. Each SMU cell learns to read from its stack and to write to it by combining the stacks and states of its children through gating. This architecture improves both the local and global representation of compositional data due to better expressive power and the ability to capture long-range dependencies by giving each node indirect access to its descendants. We demonstrate strong empirical results on two tasks and show that Tree-SMU enables accurate extrapolation to significantly harder instances.

Look-up and Adapt: A One-shot Semantic Parser

Zhichu Lu*, Forough Arabshahi*, Igor Labutov, Tom Mitchell
Conference Papers Conference on Empirical Methods in Natural Language Processing (EMNLP), 2019


Computing devices have recently become capable of interacting with their end users via natural language. However, they can only operate within a limited "supported" domain of discourse and fail drastically when faced with an out-of-domain utterance, mainly due to the limitations of their semantic parser. In this paper, we propose a semantic parser that generalizes to out-of-domain examples by learning a general strategy for parsing an unseen utterance through adapting the logical forms of seen utterances, instead of learning to generate a logical form from scratch. Our parser maintains a memory consisting of a representative subset of the seen utterances paired with their logical forms. Given an unseen utterance, our parser works by looking up a similar utterance from the memory and adapting its logical form until it fits the unseen utterance. Moreover, we present a data generation strategy for constructing utterance-logical form pairs from different domains. Our results show an improvement of up to 68.8% on one-shot parsing under two different evaluation settings compared to the baselines.

Towards Solving Differential Equations through Neural Programming

Forough Arabshahi, Sameer Singh, Anima Anandkumar
Workshop Papers the ICML workshop Neural Abstract Machines & Program Induction v2 (NAMPI), Stockholm, Sweden, 2018


We propose using symbolic data for training neural networks that solve differential equations. This results in a generalizable and scalable neural solver. The main reason is that we jointly learn a large number of functions that cover an entire mathematical domain, and use these trained functions for solving an unseen differential equation. Almost all of the literature focuses on hand-crafting architectures that are tailored for a specific type of differential equation. Moreover, they use numerical evaluations of a differential equation for training, which means that training and tuning need to be redone for solving a different input differential equation, resulting in a lack of scalability and generalizability.

In this work, we investigate the possibility of using neural programs for solving ordinary differential equations (ODEs) by verifying/rejecting a candidate solution of an ODE. We design a neural programmer that is capable of choosing the correct solution with a high accuracy. Our neural programmer, based on a Tree-LSTM, leverages the compositionality of each input ODE.
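To make concrete what verifying or rejecting a candidate ODE solution means, here is a minimal sketch using a plain numerical residual check. This is not the paper's neural programmer or its Tree-LSTM model; the function names (`residual`, `is_solution`), the test equation y' = y, the sample points, and the tolerance are all illustrative assumptions.

```python
import math

def residual(f, y, x, h=1e-5):
    """Return |y'(x) - f(x, y(x))|, with y' estimated by a central difference."""
    dy = (y(x + h) - y(x - h)) / (2 * h)
    return abs(dy - f(x, y(x)))

def is_solution(f, y, points, tol=1e-6):
    """Accept the candidate y only if the ODE y' = f(x, y) holds at every sample point."""
    return all(residual(f, y, x) <= tol for x in points)

# Hypothetical test case: the ODE y' = y on the interval [-1, 1].
f = lambda x, y: y
points = [0.1 * k for k in range(-10, 11)]

accept = is_solution(f, math.exp, points)  # exp satisfies y' = y
reject = is_solution(f, math.sin, points)  # sin does not

print("exp accepted:", accept)
print("sin accepted:", reject)
```

A check like this has to be rerun numerically for each new equation, which is the kind of per-equation effort the abstract contrasts with learning from symbolic data.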

Combining Symbolic Expressions and Black-box Function Evaluations in Neural Programs

Forough Arabshahi, Sameer Singh, Anima Anandkumar
Conference Papers The 6th International Conference on Learning Representations (ICLR), 2018


Neural programming involves training neural networks to learn programs, mathematics, or logic from data. Previous works have failed to achieve good generalization performance, especially on problems and programs with high complexity or on large domains. This is because they mostly rely either on black-box function evaluations that do not capture the structure of the program, or on detailed execution traces that are expensive to obtain, and hence the training data has poor coverage of the domain under consideration. We present a novel framework that utilizes black-box function evaluations, in conjunction with symbolic expressions that define relationships between the given functions. We employ tree LSTMs to incorporate the structure of the symbolic expression trees. We use tree encoding for numbers present in function evaluation data, based on their decimal representation. We present an evaluation benchmark for this task to demonstrate our proposed model combines symbolic reasoning and function evaluation in a fruitful manner, obtaining high accuracies in our experiments. Our framework generalizes significantly better to expressions of higher depth and is able to fill partial equations with valid completions.

Combining Symbolic Expressions and Black-box Function Evaluations in Neural Programs

Forough Arabshahi, Sameer Singh, Anima Anandkumar
Workshop Papers NIPS 2017, MLtrain Workshop, Long Beach, California

Spectral Methods for Correlated Topic Models

Forough Arabshahi, Anima Anandkumar
Conference Papers Proceedings of the 20th International Conference on Artificial Intelligence and Statistics (AISTATS), PMLR 54:1439-1447, 2017


In this paper we propose guaranteed spectral methods for learning a broad range of topic models, which generalize the popular Latent Dirichlet Allocation (LDA). We overcome the limitation of LDA to incorporate arbitrary topic correlations, by assuming that the hidden topic proportions are drawn from a flexible class of Normalized Infinitely Divisible (NID) distributions. NID distributions are generated by normalizing a family of independent Infinitely Divisible (ID) random variables. The Dirichlet distribution is a special case obtained by normalizing a set of Gamma random variables. We prove that this flexible topic model class can be learnt via spectral methods using only moments up to the third order, with (low order) polynomial sample and computational complexity. The proof is based on a key new technique derived here that allows us to diagonalize the moments of the NID distribution through an efficient procedure that requires evaluating only univariate integrals, despite the fact that we are handling high dimensional multivariate moments. In order to assess the performance of our proposed Latent NID topic model, we use two real datasets of articles collected from New York Times and Pubmed. Our experiments yield improved perplexity on both datasets compared with the baseline.
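The statement that the Dirichlet distribution arises from normalizing Gamma random variables can be illustrated with a short sketch. It only demonstrates that special case, not the spectral learning method or the general NID family; the helper name `sample_dirichlet`, the concentration parameters, and the sample count are assumptions chosen for illustration.

```python
import random

def sample_dirichlet(alphas, rng):
    """Draw one Dirichlet(alphas) sample by normalizing independent Gamma(alpha_i, 1) draws."""
    gammas = [rng.gammavariate(a, 1.0) for a in alphas]
    total = sum(gammas)
    return [g / total for g in gammas]

rng = random.Random(0)      # fixed seed so the run is repeatable
alphas = [2.0, 3.0, 5.0]    # hypothetical concentration parameters

samples = [sample_dirichlet(alphas, rng) for _ in range(20000)]

# Each sample lies on the probability simplex, and the empirical means
# should approach alpha_i / sum(alphas), i.e. 0.2, 0.3 and 0.5 here.
means = [sum(s[i] for s in samples) / len(samples) for i in range(len(alphas))]
print(means)
```

Swapping the Gamma draws for another family of independent, positive infinitely divisible variables would give a different member of the NID class described above.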

Are You Going to the Party: Depends, Who Else is Coming? [Learning Hidden Group Dynamics via Conditional Latent Tree Models]

Forough Arabshahi, Furong Huang, Anima Anandkumar, Carter T Butts, Sean M Fitzhugh
Conference Papers Data Mining (ICDM), 2015 IEEE International Conference on (pp. 697-702). IEEE.


Scalable probabilistic modeling and prediction in high dimensional multivariate time-series is a challenging problem, particularly for systems with hidden sources of dependence and/or homogeneity. Examples of such problems include dynamic social networks with co-evolving nodes and edges and dynamic student learning in online courses. Here, we address these problems through the discovery of hierarchical latent groups. We introduce a family of Conditional Latent Tree Models (CLTM), in which tree-structured latent variables incorporate the unknown groups. The latent tree itself is conditioned on observed covariates such as seasonality, historical activity, and node attributes. We propose a statistically efficient framework for learning both the hierarchical tree structure and the parameters of the CLTM. We demonstrate competitive performance in multiple real world datasets from different domains. These include a dataset on students' attempts at answering questions in a psychology MOOC, Twitter users participating in an emergency management discussion and interacting with one another, and windsurfers interacting on a beach in Southern California. In addition, our modeling framework provides valuable and interpretable information about the hidden group structures and their effect on the evolution of the time series.

Beyond LDA: Spectral Methods for Topic Modeling Based on Exchangeable Partitions

Forough Arabshahi, Roi Weiss, Anima Anandkumar
Workshop Papers NIPS workshop on Bayesian Nonparametrics: The Next Generation, 2015.

A frequency domain MVDR beamformer for UWB microwave breast cancer imaging in dispersive mediums

Forough Arabshahi, Sadaf Monajemi, Hamid Sheikhzadeh, Kaamran Raahemifar, Reza Faraji-Dana
Conference Papers Signal Processing and Information Technology (ISSPIT), 2013 IEEE International Symposium on 2013 Dec 12 (pp. 000362-000367). IEEE


In this paper a new imaging technique for early stage ultra wideband (UWB) microwave breast cancer detection is proposed. A circular array of antennas illuminates the breast tissue with UWB pulses and the backscattered signals are then passed through a beamformer designed and applied in frequency domain. This design enables the beamformer to compensate for non-integer delays and frequency dependent dispersion and at the same time increases the accuracy of the beamformer. It is shown that the proposed imaging algorithm reduces the computational cost and memory of the imaging system by decreasing the sampling rate to the Nyquist rate and significantly reducing the number of required matrix inversions. Furthermore, the proposed algorithm significantly improves the quality of the obtained image and on average the signal-to-clutter ratio of the image is increased by 89.29% compared to other cases.