ICLR 2020 papers: On Compositional/Systematic generalization (including ours)

An introduction to systematic generalization and a summary of 4 papers from ICLR 2020

Prakash Kagitha
7 min read · Apr 26, 2020

Systematicity as an argument against deep learning

In 1988, Jerry Fodor (together with Zenon Pylyshyn) leveled a concern against connectionist models (deep learning) as explanations or models of human language understanding and cognition: they are not systematic. The point is that some data points (like sentences) are systematically related to other data points, and humans can understand all of these data points given that they understand one of them.

For example, if we understand the sentence ‘John loves Mary’, we would also understand ‘Mary loves John’, or for that matter any sentence of the pattern ‘NP Vt NP’, because the underlying knowledge (concepts?) needed to understand all these sentences is one and the same, i.e. understanding the syntax ‘NP Vt NP’.

According to Fodor, the same behavior should be expected of any model that explains language understanding: it should understand all systematically similar data points if it can understand (learn) one of them.

Because systematicity in human cognition is so strong, this argument against connectionist models has historically been very prominent, stimulating debate and a rich exchange of scientific arguments.

Modernization of systematicity by Lake et al. 2017

Lake et al. 2017 instantiated this expectation as an aspect of generalization (systematic generalization) in sequence-to-sequence (Seq2Seq) learning models with the SCAN dataset. The model is tested on new combinations of already learned concepts as a requirement for it to be systematic.

For example, a model trained on the input-output pairs (walk, WALK), (jump, JUMP), and (walk left, LTURN WALK) is tested with the pair (jump left, LTURN JUMP). Standard Seq2Seq models based on GRUs, LSTMs, and their variants with attention failed catastrophically on the SCAN dataset, from which the above examples are taken.
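
To make this concrete, here is a minimal sketch of how such a compositional train/test split can be constructed (my own illustration in Python, not code from the SCAN paper; the vocabulary and modifier set are heavily simplified):

```python
# Minimal sketch of a SCAN-like compositional split (illustrative only).
# Commands map to action sequences; 'jump' is seen alone in training,
# but all of its modified forms are held out for the test set.

PRIMITIVES = {"walk": "WALK", "run": "RUN", "look": "LOOK", "jump": "JUMP"}
MODIFIERS = {"left": lambda act: f"LTURN {act}",
             "twice": lambda act: f"{act} {act}"}

def build_pairs():
    pairs = [(cmd, act) for cmd, act in PRIMITIVES.items()]
    for cmd, act in PRIMITIVES.items():
        for mod, fn in MODIFIERS.items():
            pairs.append((f"{cmd} {mod}", fn(act)))
    return pairs

def compositional_split(pairs, held_out_primitive="jump"):
    # Training keeps the bare primitive but excludes its combinations;
    # the test set contains only the unseen combinations.
    train = [(c, a) for c, a in pairs
             if held_out_primitive not in c or c == held_out_primitive]
    test = [(c, a) for c, a in pairs
            if held_out_primitive in c and c != held_out_primitive]
    return train, test

train, test = compositional_split(build_pairs())
print(test)  # [('jump left', 'LTURN JUMP'), ('jump twice', 'JUMP JUMP')]
```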

After this, different datasets were proposed to test deep learning models on Visual Question Answering (VQA). Most recently, Bahdanau et al. 2019 created CLOSURE, a variant of the CLEVR dataset, which tests a model’s performance on questions that contain familiar parts in a more complex context. They showed that existing VQA models don’t perform well on this dataset and proposed a variant of Neural Module Networks to improve performance.

ICLR 2020 and systematic generalization

It is interesting to see systematic generalization explored across different tasks and domains: four papers related to it were published at ICLR 2020, presenting a new way of solving systematic generalization, a new way of creating train/test splits to test it, an investigation of the drivers of systematic generalization in an RL agent, and finally our own work showing that standard LSTM+attention models also exhibit systematic generalization.

Environmental drivers of systematicity and generalization in a situated agent

In this work, an RL agent is trained to follow commands like ‘find Obj’ or ‘lift Obj’ in a 2D/3D environment with different objects. After training ‘find Obj’ with all the objects in the environment and ‘lift Obj’ with only a subset of them, the agent has to generalize to commands containing ‘lift’ with new objects (‘lift NewObj’).

This is analogous to the SCAN dataset, where the model is tested on new combinations of already learned concepts. Here, the concept of a particular object is learned from the ‘find’ command, and the concept of ‘lift’ is learned from training on ‘lift’ combined with the objects in the training set.
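
A minimal sketch of how such a command split might be constructed (my own illustration, not the paper’s environment code; the objects and the training fraction are hypothetical):

```python
# Illustrative construction of the command split described above:
# 'find' is trained with all objects, 'lift' only with a subset, and the
# held-out 'lift NewObj' commands form the generalization test set.

OBJECTS = ["ball", "cup", "book", "toy", "key", "pen"]
LIFT_TRAIN_FRACTION = 0.5  # hypothetical knob: how many objects 'lift' sees in training

def make_split(objects, lift_fraction):
    n_lift = int(len(objects) * lift_fraction)
    train = [f"find {o}" for o in objects] + [f"lift {o}" for o in objects[:n_lift]]
    test = [f"lift {o}" for o in objects[n_lift:]]
    return train, test

train_cmds, test_cmds = make_split(OBJECTS, LIFT_TRAIN_FRACTION)
print(test_cmds)  # ['lift toy', 'lift key', 'lift pen']
```

Varying how many objects ‘lift’ is paired with during training is essentially the knob behind driver [a] below.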

The following are the drivers of systematicity found in this investigation, which give a lot of insight into the systematic generalization of deep learning models. Check out the paper for more information.

“[a] the number of object/word experiences in the training set;

[b] the visual invariances afforded by the agent’s perspective, or frame of reference; and

[c] the variety of visual input inherent in the perceptual aspect of the agent’s perception.”

Point [a] is the starting point for our own investigation, also published at ICLR 2020 (in the Bridging AI and Cognitive Science workshop), which is the last paper we discuss in this post (see below).

Measuring Compositional Generalization: A Comprehensive Method on Realistic Data

As we saw earlier, the test set for measuring systematic/compositional generalization in the SCAN dataset contains all the combinations of the primitive ‘jump’ with modifiers, combinations that never occurred in the training set. The model needs to learn the concept of ‘jump’ from the input-output pair (jump, JUMP) and the concept of each modifier from its combinations with the other primitives.

However, this type of train/test split strategy may not be feasible, or even efficient, for measuring compositional generalization in some cases. That is the premise and the problem this work addresses, by formulating a method that automatically creates train/test splits with low atom divergence and high compound divergence. See a paragraph from the paper below.

“We use the term compositionality experiment to mean a particular way of splitting the data into train and test sets with the goal of measuring compositional generalization. Based on the notions of atoms and compounds described above, we say that an ideal compositionality experiment should adhere to the following two principles

Similar atom distribution: All atoms present in the test set are also present in the train set, and the distribution of atoms in the train set is as similar as possible to their distribution in the test set.

Different compound distribution: The distribution of compounds in the train set is as different as possible from the distribution in the test set.”
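
These two principles can be quantified as divergences between the train and test distributions of atoms and compounds. Below is a rough, self-contained sketch of the idea using a Chernoff-style overlap coefficient, which is how I understand the paper computes it; the alpha values (0.5 for atoms, 0.1 for compounds) are my recollection of the paper’s choices, so treat them as assumptions:

```python
from collections import Counter

def distribution(items):
    counts = Counter(items)
    total = sum(counts.values())
    return {k: v / total for k, v in counts.items()}

def divergence(train_items, test_items, alpha):
    # Chernoff-style divergence: 0 when the two distributions match,
    # 1 when they share no mass at all.
    p, q = distribution(train_items), distribution(test_items)
    overlap = sum(p.get(k, 0.0) ** alpha * q.get(k, 0.0) ** (1 - alpha)
                  for k in set(p) | set(q))
    return 1.0 - overlap

# Toy atoms (words/rules) and compounds (their combinations).
train_atoms = ["jump", "walk", "twice", "left", "jump", "walk"]
test_atoms = ["jump", "walk", "twice", "left"]
train_compounds = ["walk twice", "walk left", "jump"]
test_compounds = ["jump twice", "jump left"]

# A good compositionality experiment keeps atom divergence low
# and compound divergence high.
print(divergence(train_atoms, test_atoms, alpha=0.5))          # ~0.01, low
print(divergence(train_compounds, test_compounds, alpha=0.1))  # 1.0, high
```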

With the method they propose, they created multiple splits of the SCAN dataset and showed that these are better at evaluating compositional generalization than the original train/test splits.

They also created a dataset for semantic parsing called Compositional Freebase Questions (CFQ), which is the largest known dataset for studying compositional generalization in natural language. I recommend checking out their very nice blog post about this work on the Google AI blog.

Permutation Equivariant Models for Compositional Generalization in Language

This paper offers a new way of thinking about systematic/compositional generalization, with the hypothesis that it can be seen as a form of group equivariance. They built this equivariance property directly into a Seq2Seq model and solved different tasks in the SCAN dataset.

The following is the hypothesis they made and for which they showed good evidence by solving the SCAN tasks with a permutation-equivariant model.

“Models achieving the compositional generalization required in certain SCAN tasks are equivariant with respect to permutation group operations in the input and output languages.”
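
To unpack what equivariance means here, the sketch below (my own toy illustration, not the paper’s model) checks the property for a single vocabulary permutation: swapping ‘jump’ and ‘walk’ in the input and then translating should give the same result as translating first and then swapping JUMP and WALK in the output.

```python
# Illustrative check of permutation equivariance on SCAN-like data.
# A model f is equivariant w.r.t. a vocabulary permutation if
# f(sigma(x)) == sigma'(f(x)) for the matching output permutation sigma'.

INPUT_PERM = {"jump": "walk", "walk": "jump"}    # sigma on the input language
OUTPUT_PERM = {"JUMP": "WALK", "WALK": "JUMP"}   # corresponding sigma' on outputs

def permute(tokens, perm):
    return [perm.get(t, t) for t in tokens]

def is_equivariant(model, command):
    tokens = command.split()
    translated_then_permuted = permute(model(tokens), OUTPUT_PERM)
    permuted_then_translated = model(permute(tokens, INPUT_PERM))
    return translated_then_permuted == permuted_then_translated

def toy_model(tokens):
    # A hand-written "model" that respects the compositional structure.
    acts = {"jump": "JUMP", "walk": "WALK"}
    out = [acts[tokens[0]]]
    if "twice" in tokens:
        out = out * 2
    return out

print(is_equivariant(toy_model, "jump twice"))  # True for this toy model
```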

Systematic generalization emerges in Seq2Seq models with variability in data (Our Paper)

The first paper we discussed showed that an RL agent systematically generalizes to a new object if a command like ‘lift’ is trained with enough objects, i.e. it generalizes to ‘lift NewObj’ after being trained on ‘lift Obj1’, ‘lift Obj2’, … ‘lift ObjN’, where N is a large enough number of objects.

It turns out that a plain LSTM+attention model learns 6 modifiers in SCAN (out of 8 modifiers and 2 conjunctions) when the number of distinct primitives in the dataset is increased, and it generalizes to commands containing new primitives that were never trained with any modifier, such as (‘jump twice’, ‘JUMP JUMP’), a combination never seen during training.

To help us better understand the characteristics of systematic generalization in deep learning models, we also found another behavior that is highly correlated with it: instance-independent representations of modifiers. That is, if we take the encoder’s final hidden state for the command ‘walk twice’, subtract the representation of ‘walk’ and add that of ‘jump’, the model decodes the output for ‘jump twice’ (JUMP JUMP). This behavior and systematic generalization are highly correlated, with a Pearson coefficient of 0.99.
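
Here is a minimal sketch of what that vector arithmetic looks like mechanically, using a tiny untrained PyTorch encoder purely for illustration (our actual analysis uses the trained LSTM+attention model, where the decoder also attends to per-step encoder states; this sketch only covers the final-hidden-state manipulation, and all names and sizes are arbitrary):

```python
import torch
import torch.nn as nn

# Toy encoder standing in for the trained Seq2Seq encoder.
vocab = {"walk": 0, "jump": 1, "twice": 2}
embed = nn.Embedding(len(vocab), 16)
encoder = nn.LSTM(16, 32, batch_first=True)

def final_hidden(command):
    ids = torch.tensor([[vocab[w] for w in command.split()]])
    _, (h, _) = encoder(embed(ids))
    return h  # encoder's final hidden state for the command

# h('walk twice') - h('walk') + h('jump'): if modifier representations are
# instance independent, decoding from this edited state yields JUMP JUMP.
edited = final_hidden("walk twice") - final_hidden("walk") + final_hidden("jump")
print(edited.shape)  # torch.Size([1, 1, 32]); this state would seed the decoder
```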

In short, the more distinct primitives a modifier operates on in the training set, the more the model exhibits systematic generalization, and the more it represents that modifier independently of any particular instance of the variable it operates on: instance-independent representations of modifiers, which in turn are highly correlated with systematic generalization.

We also showed that, with 300 distinct primitives in the dataset, models trained on primitive variables (like ‘jump twice’) generalize to compound variables such as ‘{jump twice} twice’, though in a limited way. More interestingly, approaches like syntactic attention and meta seq2seq learning, which solved the SCAN dataset, did not show this behavior even when trained with 300 distinct primitives.

Systematic generalization and the future

We are still at an early stage of understanding the systematicity of human cognition and of exploring systematic generalization in deep learning models, both by evaluating for it and by finding the inductive biases that enable it.

For now, it is clear that systematicity will be a concern in many domains and tasks, such as language understanding, abstract and analogical reasoning, and semantic scene analysis, if we aim to build models that interact with the world as efficiently as humans do.

Finally, check out the ICLR 2020 workshop (Bridging AI and Cognitive Science) where our work is published; it is a great place to explore all the amazing work at the intersection of cognitive science and AI.

Feel free to discuss any work related to the systematicity of human cognition and systematic generalization. Follow me here for more updates and recent work at the intersection of cognitive science and deep learning.
