Call for Papers
Overview
BlackboxNLP 2026 invites the submission of archival and non-archival papers featuring original and unpublished research on interpreting and explaining NLP models, drawing inspiration from fields such as machine learning, psychology, linguistics, and neuroscience. We hope the workshop will serve as an interdisciplinary meetup that fosters collaboration across these fields.
The topics of the workshop include, but are not limited to:
- Adapting and applying analysis techniques from other disciplines, such as neuroscience, to analyze high-dimensional vector representations in artificial neural networks.
- Examining model performance on simplified or formal languages.
- Proposing architectural modifications to increase models’ interpretability.
- Testing if interpretable information can be decoded from internal representations.
- Open-source tools for analysis, visualization, or explanation to democratize access to interpretability techniques in NLP.
- Meta-evaluation of analysis methods to assess their validity.
- Understanding how and when language models rely on context information.
- Analyzing the linguistic properties captured by contextualized word representations.
- Scaling up analysis methods for large language models (LLMs).
- Mechanistic interpretability and reverse-engineering approaches to understanding particular properties of neural models.
- Evaluation of techniques for steering LLM output behavior.
- Uncovering the reasoning processes of LLMs.
- Looking under the hood of memorization in LLMs.
- Insights into LLM failures.
- Translating interpretability insights into practical solutions to address key challenges in NLP.
- Opinion pieces about the state of interpretability in NLP.
Special Track: Reproducibility and Reliability in Interpretability Analyses
Recent work has raised critical concerns about the significance and reproducibility of widely reported interpretability results, suggesting that popular analysis methods can yield plausible-looking explanations even when applied to randomly initialized neural networks. Relatedly, “interpretability illusions” observed in analyses of models like BERT and GPT-2 suggest that some interesting phenomena might be limited in scope to specific datasets or models. While such reproduction and validation work is typically challenging to publish, we believe it is of great service to the interpretability research community. To promote high standards for the quality of new studies in this field, we introduce a special track inviting submissions of max. 5 pages focused on reproducing established interpretability results and complementing them with, e.g., rigorous statistical evaluations to assess the magnitude of reported effects, or additional experiments on previously untested models and datasets. We particularly encourage submissions that apply appropriate controls (e.g., random baselines), report effect sizes, and test generalization across datasets and model configurations.
Paper Submission Information
We will accept both direct submissions and ARR commitments through OpenReview. All submissions should use the *ACL template and follow the official EMNLP style and formatting guidelines. Archival papers must be fully anonymized.
Submission Types
- Archival papers of up to 8 pages + references. These papers report on completed, original, and unpublished research; papers shorter than this maximum are also welcome. An optional appendix may appear after the references in the same PDF file. Accepted papers are expected to be presented at the workshop and will be published in the workshop proceedings in the ACL Anthology, meaning they cannot be published elsewhere. They should report on obtained results rather than intended work. These papers will undergo double-blind peer review and should thus be anonymized.
- Non-archival extended abstracts of 2 pages + references. These may report on work in progress or be cross-submissions that have already appeared (or are scheduled to appear) at another venue. These submissions are non-archival and will not be included in the proceedings. Selection will not be based on double-blind review, so submissions of this type need not be anonymized.
- Special track papers of up to 5 pages + references. These papers focus on reproducing established interpretability results; see the Special Track section above for details. These papers will undergo double-blind peer review and should thus be anonymized.
Accepted submissions for all tracks will be presented at the workshop: most as posters, some as oral presentations (determined by the program committee).
Important Dates
- July 17th, 2026 - Direct paper submission deadline.
- August 28th, 2026 - ARR pre-reviewed paper commitment deadline.
- September 8th, 2026 - Notification of acceptance.
- September 20th, 2026 - Camera-ready deadline.
- October 28th or 29th, 2026 - Workshop date.
Dual Submissions and Preprints
Dual submissions are allowed for the archival track, but please check the dual submission policy of the other venue you are submitting to. Papers posted to preprint servers such as arXiv may be submitted regardless of when they were posted.
Camera-Ready Information
Authors of accepted archival papers should upload the final version of their paper to the submission system by the camera-ready deadline. Authors may use one extra page to address reviewer comments, for a total of nine pages (for archival papers) or six pages (for special track papers), plus unlimited space for references and appendices. Broader Impacts/Ethics and Limitations sections are optional and may be included on an additional page.
Contact
Please contact the organizers at blackboxnlp@googlegroups.com for any questions.
Anti-Harassment Policy
BlackboxNLP 2026 adheres to the ACL Anti-Harassment Policy.