
Workshop on analyzing and interpreting neural networks for NLP

BlackboxNLP 2024

The seventh edition of BlackboxNLP will be co-located with EMNLP in Miami on November 15, 2024.

News

Programme

9:00 - 9:10 Opening remarks

9:10 - 10:00 Invited talk by Jack Merullo

10:00 - 10:30 Oral presentations:

  1. Routing in Sparsely-gated Language Models responds to Context
    Stefan Arnold, Marian Fietta, and Dilara Yesilbas
  2. Log Probabilities Are a Reliable Estimate of Semantic Plausibility in Base and Instruction-Tuned Language Models
    Carina Kauf, Emmanuele Chersoni, Alessandro Lenci, Evelina Fedorenko, and Anna A Ivanova

10:30 - 11:00 Break ☕

11:00 - 12:30 In-person & virtual poster session 1

12:30 - 14:00 Lunch 🥪

14:00 - 15:00 Invited talk by Himabindu Lakkaraju

15:00 - 15:30 Oral presentations:

  1. Gemma Scope: Open Sparse Autoencoders Everywhere All At Once on Gemma 2
    Tom Lieberum, Senthooran Rajamanoharan, Arthur Conmy, Lewis Smith, Nicolas Sonnerat, Vikrant Varma, Janos Kramar, Anca Dragan, Rohin Shah, and Neel Nanda
  2. Mechanistic?
    Naomi Saphra and Sarah Wiegreffe

15:30 - 16:00 Break ☕

16:00 - 16:30 In-person poster session 2

16:30 - 16:40 Closing remarks and awards

17:00 - 18:00 Panel discussion on “Interpretability”

Invited Speakers

Jack Merullo

Himabindu Lakkaraju

Panel Discussion on “Interpretability”

Panelists:

Important dates

All deadlines are 11:59 PM UTC-12:00 (“Anywhere on Earth”).

Workshop description

Many recent performance improvements in NLP have come at the cost of our understanding of how these systems work. How do we assess what representations and computations models learn? How do we formalize desirable properties of interpretable models, and measure the extent to which existing models achieve them? How can we build models that better encode these properties? What can new or existing tools tell us about these systems’ inductive biases?

The goal of this workshop is to bring together researchers focused on interpreting and explaining NLP models by taking inspiration from fields such as machine learning, psychology, linguistics, and neuroscience. We hope the workshop will serve as an interdisciplinary meetup that fosters collaboration across these fields.

The topics of the workshop include, but are not limited to:

Feel free to reach out to the organizers at the email below if you are not sure whether a specific topic is well-suited for submission.

Call for Papers

Submissions can be made through OpenReview (submission link TBA). All submissions should use the ACL template and follow the formatting requirements specified by ACL. Archival papers must be fully anonymized.

Submission Types

Submissions should follow the official EMNLP 2024 style guidelines. Accepted submissions in both tracks will be presented at the workshop: most as posters, some as oral presentations (as determined by the program committee).

Dual Submissions and Preprints

Dual submissions are allowed for the archival track, but please check the dual-submission policy of the other venue you are submitting to. Papers posted to preprint servers such as arXiv can be submitted without any restrictions on when they were posted.

Camera-ready information

Authors of accepted archival papers should upload the final version of their paper to the submission system by the camera-ready deadline. Authors may use one extra page to address reviewer comments, for a total of nine pages + references. Broader Impacts/Ethics and Limitations sections are optional and can be included on a 10th page.

Contact

Please contact the organizers at blackboxnlp@googlegroups.com for any questions.

Previous workshops

Sponsors

Organizers

You can reach the organizers by e-mail at blackboxnlp@googlegroups.com.

Yonatan Belinkov

Yonatan Belinkov is an assistant professor at the Technion. He was previously a Postdoctoral Fellow at Harvard and MIT. His recent research focuses on the interpretability and robustness of neural network models of language, and has been published at leading NLP and ML venues. His PhD dissertation at MIT analyzed internal language representations in deep learning models. He has been awarded the Harvard Mind, Brain, and Behavior Postdoctoral Fellowship and the Azrieli Early Career Faculty Fellowship. He co-organized BlackboxNLP in 2019, 2020, and 2021, as well as the 1st and 2nd machine translation robustness tasks at WMT.

Najoung Kim

Najoung Kim is an Assistant Professor in the Department of Linguistics at Boston University. She is currently visiting Google Research part-time. She is interested in studying meaning in both human and machine learners, especially the ways in which they generalize to novel inputs and treat implicit meaning. Her research has been published at various NLP venues including ACL and EMNLP. She was a co-organizer of the Inverse Scaling Competition and a senior area chair for ACL 2023.

Jaap Jumelet

Jaap Jumelet is a PhD candidate at the Institute for Logic, Language and Computation at the University of Amsterdam. His research focuses on understanding how neural models build up hierarchical representations of their input, leveraging hypotheses from (psycho-)linguistics. His research has been published at leading NLP venues, including TACL, ACL, and CoNLL. His first paper was presented at the first BlackboxNLP workshop in 2018, and he has presented work at every edition since.

Hosein Mohebbi

Hosein Mohebbi is a PhD candidate in the Department of Cognitive Science and Artificial Intelligence at Tilburg University, Netherlands. He is part of the InDeep consortium project, researching the interpretability of deep neural models for text and speech. His research has been published at leading NLP venues such as ACL, EACL, and EMNLP, where he also regularly serves as a reviewer. He received an Outstanding Paper Award at EMNLP 2023. His contributions to the CL community include co-organizing the previous edition of BlackboxNLP and presenting a tutorial at EACL 2024.

Aaron Mueller

Aaron Mueller is a postdoctoral fellow at Northeastern University and the Technion. He obtained his PhD from Johns Hopkins University in 2023. His work takes inspiration from psycholinguistics and causal interpretability to evaluate and improve the robustness and mechanistic reasoning of NLP systems, and has been published at leading NLP venues including ACL, EMNLP, and NAACL. He has received the Zuckerman Postdoctoral Fellowship and, as a co-organizer of the BabyLM Challenge, coverage in the New York Times.

Hanjie Chen

Hanjie Chen is an incoming Assistant Professor in the Department of Computer Science at Rice University. She currently works as a Postdoctoral Fellow in the Center for Language and Speech Processing at Johns Hopkins University. She obtained her Ph.D. in Computer Science in May 2023 at the University of Virginia. Hanjie is broadly interested in Trustworthy AI, Natural Language Processing, and Interpretable Machine Learning. Specifically, her research focuses on the interpretability and analysis of neural language models. She has published papers at leading AI/NLP venues, including ACL, AAAI, EMNLP, and NAACL. She has been honored with the Outstanding Doctoral Student Award, John A. Stankovic Graduate Research Award, Carlos and Esther Farrar Fellowship, and Graduate Teaching Awards at UVA. She also won the Best Poster Award at the ACM Capital Region Celebration of Women in Computing.

Anti-Harassment Policy

BlackboxNLP 2024 adheres to the ACL Anti-Harassment Policy.