Workshop on analyzing and interpreting neural networks for NLP

BlackboxNLP 2023

The sixth edition of BlackboxNLP will be co-located with EMNLP in Singapore on December 7, 2023.

News

Schedule

9:00 - 9:10 Opening remarks

9:10 - 10:00 Invited talk by Zhijing Jin

10:00 - 10:30 2 oral presentations:

  1. Systematic Generalization by Finetuning? Analyzing Pretrained Language Models Using Constituency Tests
    Aishik Chakraborty, Jackie CK Cheung and Timothy J. O’Donnell
  2. Emergent Linear Representations in World Models of Self-Supervised Sequence Models
    Neel Nanda, Andrew Lee and Martin Wattenberg

10:30 - 11:00 Break ☕

11:00 - 12:30 In-person & virtual poster sessions, in the East Foyer on level B2 of the Resorts World Sentosa Convention Center and on gather.town.

12:30 - 14:00 Lunch 🥪

14:00 - 15:30 6 oral presentations:

  1. Knowledge-grounded Natural Language Recommendation Explanation
    Anthony Colas, Jun Araki, Zhengyu Zhou, Bingqing Wang and Zhe Feng
  2. On Quick Kisses and How to Make Them Count: A Study on Event Construal in Light Verb Constructions with BERT
    Chenxin Liu and Emmanuele Chersoni
  3. Rigorously Assessing Natural Language Explanations of Neurons
    Jing Huang, Atticus Geiger, Karel D’Oosterlinck, Zhengxuan Wu and Christopher Potts
  4. Memory Injections: Correcting Multi-Hop Reasoning Failures During Inference in Transformer-Based Language Models
    Mansi Sakarvadia, Aswathy Ajith, Arham Khan, Daniel Grzenda, Nathaniel Hudson, André Bauer, Kyle Chard and Ian Foster
  5. NPIs Aren’t Exactly Easy: Variation in Licensing across Large Language Models
    Deanna DeCarlo, William Palmer, Michael Wilson and Bob Frank
  6. Identifying and Adapting Transformer-Components Responsible for Gender Bias in an English Language Model
    Abhijith Chintam, Rahel Beloch, Willem Zuidema, Michael Hanna and Oskar van der Wal

15:30 - 16:00 Break ☕

16:00 - 16:50 Invited talk by Antoine Bosselut

16:50 - 17:00 Closing remarks and awards

17:00 - 18:00 Panel discussion on Mechanistic Interpretability

Invited Speakers

Zhijing Jin

Zhijing Jin is a Ph.D. candidate at the Max Planck Institute & ETH. Her research focuses on socially responsible NLP via causal and moral principles. Specifically, she works on expanding the impact of NLP by promoting NLP for social good, and on developing CausalNLP to improve the robustness, fairness, and interpretability of NLP models, as well as to analyze the causes of social problems.

Causal NLP: A Path towards Opening the Black Box of NLP

Recent advancements in large language models (LLMs) have demonstrated impressive performance in many tasks. However, the opaque nature of these NLP models often obscures the reasons behind their successes and unexpected failures. To interpret LLMs, early studies have identified correlations between embeddings and linguistic properties. In this talk, I will provide an overview of how causal analysis is being utilized in interpretability research for LLMs, highlighting its unique contributions to the field, as well as the open challenges. I will introduce the role of causality in two branches of interpretability: for behavioral studies of LLMs, I propose a causal framework that reformulates the robustness problem as a difference in the causal decision-making processes of humans versus those of the model; to understand the inner workings of LLMs, I will discuss how causal intervention and causal mediation analysis aid in unraveling the mechanisms of reasoning in LLMs, and how different mechanisms interact. I believe Causal NLP opens a unique pathway towards more transparent, reliable, and responsible AI systems.
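
For readers less familiar with this line of work, here is a minimal sketch of the flavour of a causal (interchange) intervention on a model's internals: it patches one hidden activation from a "clean" run into a "corrupted" run and measures how much of the output difference that single unit mediates. The two-layer model, its weights, and the probe inputs below are invented for illustration and are not taken from the talk.

    # Toy causal (interchange) intervention on one hidden activation.
    # The two-layer "model", its weights, and the inputs are illustrative only.
    import numpy as np

    rng = np.random.default_rng(0)
    W1 = rng.normal(size=(4, 3))   # input -> hidden
    W2 = rng.normal(size=(1, 4))   # hidden -> output

    def forward(x, patch=None):
        """Run the toy model; optionally overwrite one hidden unit's activation."""
        h = np.tanh(W1 @ x)
        if patch is not None:
            idx, value = patch
            h = h.copy()
            h[idx] = value
        return (W2 @ h).item()

    x_clean = np.array([1.0, 0.0, 1.0])    # "clean" input
    x_corrupt = np.array([0.0, 1.0, 0.0])  # "corrupted" input

    h_clean = np.tanh(W1 @ x_clean)        # cache clean hidden activations
    total_effect = forward(x_clean) - forward(x_corrupt)

    # Mediated (indirect) effect of each hidden unit: rerun the corrupted input
    # while patching in that unit's clean activation.
    for i in range(4):
        patched = forward(x_corrupt, patch=(i, h_clean[i]))
        print(f"unit {i}: mediated effect = {patched - forward(x_corrupt):+.3f} "
              f"of total {total_effect:+.3f}")

In practice the same patching idea is applied to the activations of a trained transformer (for example via forward hooks) rather than to a toy linear model.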

Antoine Bosselut

Antoine Bosselut is an assistant professor in the School of Computer and Communication Sciences at the École Polytechnique Fédérale de Lausanne (EPFL). He was a postdoctoral scholar at Stanford University and a Young Investigator at the Allen Institute for AI (AI2). He completed his PhD at the University of Washington and was a student researcher at Microsoft Research. His research interests are in building systems that mix knowledge and language representations to solve problems in NLP, specializing in commonsense representation and reasoning.

From Mechanistic Interpretability to Mechanistic Reasoning

Pretrained language models (LMs) encode implicit representations of knowledge in their parameters. Despite this observation, our best methods for interpreting these representations yield few actionable insights on how to manipulate this parameter space for downstream benefit. In this talk, I will present work on methods that simulate machine reasoning by localizing and modifying parametric knowledge representations. First, I will present a method for discovering knowledge-critical subnetworks within pretrained language models, and show that these sparse computational subgraphs are responsible for the model’s ability to encode specific pieces of knowledge. Then, I will present a new reasoning algorithm, RECKONING, a bi-level optimisation procedure that dynamically encodes and reasons over new knowledge at test-time using the model’s existing learned knowledge representations as a scratchpad. Finally, I will discuss next steps and challenges in using internal model mechanisms for reasoning.
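
In the same spirit, here is a hedged toy sketch of the subnetwork idea: zero out a sparse set of weights and check that one targeted behaviour disappears while unrelated behaviour survives. The linear model, the mask, and the probe inputs are illustrative assumptions, not the method presented in the talk.

    # Toy sketch of testing a "knowledge-critical" subnetwork via weight masking.
    # The model, mask, and probe inputs are illustrative assumptions.
    import numpy as np

    rng = np.random.default_rng(1)
    W = rng.normal(size=(2, 3))          # one weight matrix standing in for an LM

    def predict(x, mask=None):
        """Linear toy model; optionally multiply the weights by a binary mask."""
        weights = W if mask is None else W * mask
        return weights @ x

    x_fact = np.array([1.0, 0.0, 0.0])   # probe for one specific "fact"
    x_other = np.array([0.0, 1.0, 1.0])  # probe for unrelated behaviour

    # Ablate only the first column of weights: the ones the fact probe relies on.
    mask = np.ones_like(W)
    mask[:, 0] = 0.0

    print("fact probe,  full model :", predict(x_fact))
    print("fact probe,  masked     :", predict(x_fact, mask))    # drops to zero
    print("other probe, full model :", predict(x_other))
    print("other probe, masked     :", predict(x_other, mask))   # unchanged

In a realistic setting the mask would cover a pretrained LM's parameters and would be searched for rather than chosen by hand.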

Panel Discussion on “Mechanistic Interpretability”

Panelists:

Important dates

All deadlines are 11:59pm UTC-12 (“anywhere on earth”).

Workshop description

Many recent performance improvements in NLP have come at the cost of our understanding of how these systems work. How do we assess what representations and computations models learn? How do we formalize desirable properties of interpretable models, and measure the extent to which existing models achieve them? How can we build models that better encode these properties? What can new or existing tools tell us about these systems’ inductive biases?

The goal of this workshop is to bring together researchers focused on interpreting and explaining NLP models by taking inspiration from fields such as machine learning, psychology, linguistics, and neuroscience. We hope the workshop will serve as an interdisciplinary meetup that allows for cross-collaboration.

The topics of the workshop include, but are not limited to:

Feel free to reach out to the organizers at the email below if you are not sure whether a specific topic is well-suited for submission.

Call for Papers

We will accept submissions through Softconf at: https://www.softconf.com/emnlp2023/blackboxnlp2023/. All submissions should use the EMNLP 2023 template and formatting requirements specified by ACL. Archival papers must be fully anonymized. Submissions of both types can be made through Softconf.

Submission Types

Accepted submissions for both tracks will be presented at the workshop: most as posters, some as oral presentations (determined by the program committee).

Dual Submissions and Preprints

Dual submissions are allowed for the archival track, but please check the dual submissions policy for the other venue that you are dual-submitting to. Papers posted to preprint servers such as arXiv can be submitted without any restrictions on when they were posted.

Camera-ready information

Authors of accepted archival papers should upload the final version of their paper to the submission system by the camera-ready deadline. Authors may use one extra page to address reviewer comments, for a total of nine pages + references. Broader Impacts/Ethics and Limitations sections are optional and can be included on a 10th page.

Contact

Please contact the organizers at blackboxnlp@googlegroups.com for any questions.

Previous workshops

Sponsors

Organizers

You can reach the organizers by e-mail to blackboxnlp@googlegroups.com.

Yonatan Belinkov

Yonatan Belinkov is an assistant professor at the Technion. He has previously been a Postdoctoral Fellow at Harvard and MIT. His recent research focuses on interpretability and robustness of neural network models of language. His research has been published at leading NLP and ML venues. His PhD dissertation at MIT analyzed internal language representations in deep learning models. He has been awarded the Harvard Mind, Brain, and Behavior Postdoctoral Fellowship and the Azrieli Early Career Faculty Fellowship. He co-organised BlackboxNLP in 2019, 2020, and 2021, as well as the 1st and 2nd machine translation robustness tasks at WMT.

Najoung Kim

Najoung Kim is an Assistant Professor at the Department of Linguistics at Boston University. She is currently visiting Google Research part-time. She is interested in studying meaning in both human and machine learners, especially ways in which they generalize to novel inputs and ways in which they treat implicit meaning. Her research has been published in various NLP venues including ACL and EMNLP. She was a co-organizer of the Inverse Scaling Competition, and a senior area chair for ACL 2023.

Sophie Hao

Sophie Hao is a Faculty Fellow at the Center for Data Science at New York University. She recently completed her PhD in Linguistics and Computer Science at Yale University, where her dissertation work focused on applications of feature attribution methods to NLP. More generally, Sophie is interested in theories of interpretation and explanation and how such theories can guide our usage and evaluation of analysis methods for black-box models. She is a frequent contributor to BlackboxNLP, having presented at its first three editions.

Arya McCarthy

Arya McCarthy is a Research Scientist at Scaled Cognition. He previously was a Ph.D. student in the Center for Language and Speech Processing at the Johns Hopkins University. He is supported by an Amazon Fellowship, a Frederick Jelinek Fellowship, and the International Olympic Committee. He investigates pan-lingual weak supervision for building core NLP tools and datasets in new languages; recent work has investigated limitations of large language models on simple transduction tasks. His research has been published in ACL, EMNLP, CoNLL, COLING, ICLR, and ICASSP.

Jaap Jumelet

Jaap Jumelet is a PhD candidate at the Institute for Logic, Language and Computation at the University of Amsterdam. His research focuses on gaining an understanding of how neural models are able to build up hierarchical representations of their input, by leveraging hypotheses from (psycho-)linguistics. His research has been published at leading NLP venues, including TACL, ACL, and CoNLL. His first ever paper was presented at the first BlackboxNLP workshop in 2018, and he has since presented work at each subsequent edition of the workshop.

Hosein Mohebbi

Hosein Mohebbi is a PhD candidate at the Department of Cognitive Science and Artificial Intelligence at Tilburg University. He is part of the InDeep consortium, working on analyzing and interpreting deep neural language and speech models. During his Master’s (2019-2021), he mainly focused on interpretability and accelerating inference of pre-trained language models. His research has been published in NLP venues such as ACL, EACL, EMNLP and BlackboxNLP.

Anti-Harassment Policy

BlackboxNLP 2023 adheres to the ACL Anti-Harassment Policy.