Analyzing and interpreting neural networks for NLP

Workshop on analyzing and interpreting neural networks for NLP

BlackboxNLP 2022

There will be a fifth edition of BlackboxNLP! It will be co-located with EMNLP 2022.



Our workshop program is now online.

Invited speakers

Lena Voita

Elena (Lena) Voita is a Research Scientist joining Facebook AI Research. She is mostly interested in understanding what and how neural models learn. Her analysis work so far includes looking at model components, adapting attribution methods to NLP models, black-box analysis of model outputs, and an information-theoretic view of analysis (e.g., probing). Previously, she was a PhD student at the University of Edinburgh supervised by Ivan Titov and Rico Sennrich, was awarded the Facebook PhD Fellowship, and worked as a Research Scientist at Yandex Research side by side with the Yandex Translate team. She enjoys writing blog posts and teaching; a public version of (a part of) her NLP course is available at NLP Course For You.

The Two Viewpoints on the NMT Training Process

In this talk, I illustrate how the same process (in this case, the NMT training process) can be viewed from different perspectives: from the inside of the model and from the outside, i.e. in a black-box manner. In the first view, we look at the model’s inner workings and try to understand how NMT balances two different types of context, the source and the prefix of the target sentence. In the second view, we look at model outputs (i.e. generated translations) at different steps during training and evaluate how the model acquires different competences. We find that NMT training consists of stages in which the model focuses on competences mirroring three core SMT components: target-side language modeling, lexical translation, and reordering. Most importantly, the two views show the same process, and we will see how this process is reflected in these two types of analysis.

Catherine Olsson

Catherine Olsson is a research engineer at Anthropic, and the lead author on the recent mechanistic interpretability paper “In-context Learning and Induction Heads”. She has previously worked in technical research roles at Google Brain and OpenAI, and as a grantmaker at Open Philanthropy Project funding academic research in ML robustness.

David Bau

David Bau is Assistant Professor at the Northeastern University Khoury College of Computer Science. He received his PhD from MIT and AB from Harvard. He is known for his network dissection studies of individual neurons in deep networks and has published research on the interpretable structure of large models in PNAS, CVPR, NeurIPS, and SIGGRAPH. Prof. Bau is also coauthor of the textbook, Numerical Linear Algebra.

Direct Model Editing

Can we understand large deep networks well enough to reprogram them by changing their parameters directly? In this talk I will discuss Direct Model Editing: how to modify the weights of a large model directly by understanding its structure. We will consider examples in computer vision and NLP: how to probe and rewrite computations within an image synthesis model to alter compositional rules that govern rendering of realistic images, and how the ROME method can edit specific factual memories within a large language model, directly tracing and modifying parameters that store associations within GPT. I will talk about how causal mediation analysis can serve as a key to unlock the secrets of a huge model; the specificity-generalization trade-off when evaluating knowledge changes in a large model; and how recent results in our MEMIT work suggest that direct editing in huge models may scale orders of magnitude better than traditional opaque fine-tuning.

Important dates

All deadlines are 11:59pm UTC-12 (“anywhere on earth”).

Onsite Poster info (Updated Oct 23)

Please read the Poster and Talk Accessibility, Quality, and Inclusivity guidelines.

Workshop description

Many recent performance improvements in NLP have come at the cost of our understanding of the systems. How do we assess what representations and computations models learn? How do we formalize desirable properties of interpretable models, and measure the extent to which existing models achieve them? How can we build models that better encode these properties? What can new or existing tools tell us about systems’ inductive biases?

The goal of this workshop is to bring together researchers focused on interpreting and explaining NLP models by taking inspiration from machine learning, psychology, linguistics, and neuroscience. We hope the workshop will serve as an interdisciplinary meetup that allows for cross-collaboration.

The topics of the workshop include, but are not limited to:

Feel free to reach out to the organizers at the email below if you are not sure whether a specific topic is well-suited for submission.

Call for Papers

All submissions should use the ACL templates and formatting requirements specified by ACL Rolling Review, and should be fully anonymized. Submissions of both types can be made through OpenReview.

Submission Types

Accepted submissions will be presented at the workshop: most as posters, some as oral presentations (determined by the program committee).

Dual Submissions and Preprints

Dual submissions are not allowed for the archival track. Papers posted to preprint servers such as arXiv can be submitted without any restrictions on when they were posted.

Camera-ready information

Authors of accepted archival papers should upload the final version of their paper to the submission system by the camera-ready deadline. Authors may use one extra page to address reviewer comments, for a total of nine pages + references. Broader Impacts/Ethics and Limitations sections are optional and can be included on a 10th page.


Please contact the organizers at blackboxnlp@googlegroups.com for any questions.

Previous workshops


BlackboxNLP 2022 is sponsored by:

Google


You can reach the organizers by e-mail at blackboxnlp@googlegroups.com.

Jasmijn Bastings

Jasmijn Bastings is a researcher at Google in Amsterdam. She got her PhD from the University of Amsterdam on Interpretable and Linguistically-informed Deep Learning for NLP. Jasmijn’s current research focuses on interpretable NLP models and predictions, and she authored two BlackboxNLP papers (2018, 2020) on generalisation and saliency methods, as well as an ACL paper (2019) on interpretable neural predictions using differentiable binary variables.

Yonatan Belinkov

Yonatan Belinkov is an assistant professor at the Technion. He has previously been a Postdoctoral Fellow at Harvard and MIT. His recent research focuses on interpretability and robustness of neural network models of language. His research has been published at leading NLP and ML venues. His PhD dissertation at MIT analyzed internal language representations in deep learning models. He has been awarded the Harvard Mind, Brain, and Behavior Postdoctoral Fellowship and the Azrieli Early Career Faculty Fellowship. He co-organised BlackboxNLP in 2019, 2020, and 2021, as well as the 1st and 2nd machine translation robustness tasks at WMT.

Yanai Elazar

Yanai Elazar is a postdoctoral researcher at AI2 & UW. His research focus is interpretability and analysis methods for NLP. His research showed how demographic and linguistic phenomena are encoded in models’ representations, and how more abstract capabilities, such as commonsense and reasoning, are manifested in and used by models.

Dieuwke Hupkes

Dieuwke Hupkes is a research scientist at Facebook AI Research. The main focus of her research is understanding how neural networks generalise, focusing specifically on how they can understand and learn grammar, structure and compositionality. Developing methods to interpret and interact with neural networks has therefore been an important area of focus in her research. She authored many articles directly relevant to the workshop and has co-organised the previous three editions of BlackboxNLP.

Naomi Saphra

Naomi Saphra is a postdoc at New York University. Their research is on understanding the training dynamics of language models, from the standpoint of linguistic structure acquisition. Their relevant work has been published at NAACL and EMNLP, and they have served on the organizing committee for the Workshop on Representation Learning for NLP (RepL4NLP).

Sarah Wiegreffe

Sarah Wiegreffe is a postdoctoral researcher at the Allen Institute for AI (AI2). Her research focuses on improved modeling and analysis of explanations from neural models along the axes of faithfulness and human acceptability, with a recent focus on free-text explanations. Her research on interpretability has been published at leading ML and NLP conferences. She served as a publicity chair for NAACL 2021 and frequently serves on conference program committees.

Anti-Harassment Policy

BlackboxNLP 2022 adheres to the ACL Anti-Harassment Policy.