Publications

Selected Publications

aug 2025

A Multi-Dialectal Dataset for German Dialect ASR and Dialect-to-Standard Speech Translation

Although Germany has a diverse landscape of dialects, they are underrepresented in current automatic speech recognition (ASR) research. To enable studies of how robust models are towards dialectal variation, we present Betthupferl, a new benchmark for transcription into dialect and standard for three dialect groups in Southeast Germany.
aug 2024

“My Answer is C”: First-Token Probabilities Do Not Match Text Answers in Instruction-Tuned Language Models

This paper investigates to what extent the first token probabilities of large language models match their final answers to multiple-choice questions.
may 2023

A Survey of Corpora for Germanic Low-Resource Languages and Dialects

This paper provides an overview of more than 80 corpora to support NLP research in resource-poor and non-standardized languages of the Germanic language family.

All Publications

nov 2025

RAcQUEt: Unveiling the Dangers of Overlooked Referential Ambiguity in Visual LLMs

Testoni, Alberto and Plank, Barbara and Fernández, Raquel

Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing
nov 2025

Disentangling Subjectivity and Uncertainty for Hate Speech Annotation and Modeling using Gaze

Alacam, Özge and Hoeken, Sanne and Säuberli, Andreas and Gröner, Hannes and Frassinelli, Diego and Zarrieß, Sina and Plank, Barbara

Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing
nov 2025

The Validation Gap: A Mechanistic Analysis of How Language Models Compute Arithmetic but Fail to Validate It

Bertolazzi, Leonardo and Mondorf, Philipp and Plank, Barbara and Bernardi, Raffaella

Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing
nov 2025

Threading the Needle: Reweaving Chain-of-Thought Reasoning to Explain Human Label Variation

Chen, Beiduo and Liu, Yang Janet and Korhonen, Anna and Plank, Barbara

Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing
nov 2025

LiTEx: A Linguistic Taxonomy of Explanations for Understanding Within-Label Variation in Natural Language Inference

Hong, Pingjun and Chen, Beiduo and Peng, Siyao and de Marneffe, Marie-Catherine and Plank, Barbara

Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing
nov 2025

Relevant for the Right Reasons? Investigating Lexical Biases in Zero-Shot and Instruction-Tuned Rerankers

Mao, Yuchen and Plank, Barbara and Litschko, Robert

Proceedings of the 5th Workshop on Multilingual Representation Learning (MRL 2025)
nov 2025

LeWiDi-2025 at NLPerspectives: Third Edition of the Learning with Disagreements Shared Task

Leonardelli, Elisa and Casola, Silvia and Peng, Siyao and Rizzi, Giulia and Basile, Valerio and Fersini, Elisabetta and Frassinelli, Diego and Jang, Hyewon and Pavlovic, Maja and Plank, Barbara and Poesio, Massimo

Proceedings of the 4th Workshop on Perspectivist Approaches to NLP
nov 2025

Tracing Multilingual Factual Knowledge Acquisition in Pretraining

Liu, Yihong and Wang, Mingyang and Kargaran, Amir Hossein and Körner, Felicia and Nie, Ercong and Plank, Barbara and Yvon, François and Schuetze, Hinrich

Findings of the Association for Computational Linguistics: EMNLP 2025
nov 2025

Reason to Rote: Rethinking Memorization in Reasoning

Du, Yupei and Mondorf, Philipp and Casola, Silvia and Yao, Yuekun and Litschko, Robert and Plank, Barbara

Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing
nov 2025

M-ABSA: A Multilingual Dataset for Aspect-Based Sentiment Analysis

Wu, ChengYan and Ma, Bolei and Liu, Yihong and Zhang, Zheyu and Deng, Ningyuan and Li, Yanshu and Chen, Baolan and Zhang, Yi and Xue, Yun and Plank, Barbara

Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing
nov 2025

Aligning NLP Models with Target Population Perspectives using PAIR: Population-Aligned Instance Replication

Eckman, Stephanie and Ma, Bolei and Kern, Christoph and Chew, Rob and Plank, Barbara and Kreuter, Frauke

Proceedings of the 4th Workshop on Perspectivist Approaches to NLP
nov 2025

BlackboxNLP-2025 MIB Shared Task: Exploring Ensemble Strategies for Circuit Localization Methods

Mondorf, Philipp and Wang, Mingyang and Gerstner, Sebastian and Hakimi, Ahmad Dawar and Liu, Yihong and Veloso, Leonor and Zhou, Shijia and Schuetze, Hinrich and Plank, Barbara

Proceedings of the 8th BlackboxNLP Workshop: Analyzing and Interpreting Neural Networks for NLP
nov 2025

Make Every Letter Count: Building Dialect Variation Dictionaries from Monolingual Corpora

Litschko, Robert and Blaschke, Verena and Burkhardt, Diana and Plank, Barbara and Frassinelli, Diego

Findings of the Association for Computational Linguistics: EMNLP 2025
aug 2025

A Multi-Dialectal Dataset for German Dialect ASR and Dialect-to-Standard Speech Translation

Blaschke, Verena and Winkler, Miriam and Förster, Constantin and Wenger-Glemser, Gabriele and Plank, Barbara

Interspeech 2025
jul 2025

Algorithmic Fidelity of Large Language Models in Generating Synthetic German Public Opinions: A Case Study

Ma, Bolei and Yoztyurk, Berk and Haensch, Anna-Carolina and Wang, Xinpeng and Herklotz, Markus and Kreuter, Frauke and Plank, Barbara and Aßenmacher, Matthias

Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
jul 2025

Pragmatics in the Era of Large Language Models: A Survey on Datasets, Evaluation, Opportunities and Challenges

Ma, Bolei and Li, Yuting and Zhou, Wei and Gong, Ziwei and Liu, Yang Janet and Jasinskaja, Katja and Friedrich, Annemarie and Hirschberg, Julia and Kreuter, Frauke and Plank, Barbara

Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
jul 2025

A Rose by Any Other Name: LLM-Generated Explanations Are Good Proxies for Human Explanations to Collect Label Distributions on NLI

Chen, Beiduo and Peng, Siyao and Korhonen, Anna and Plank, Barbara

Findings of the Association for Computational Linguistics: ACL 2025
jul 2025

Circuit Compositions: Exploring Modular Structures in Transformer-Based Language Models

Mondorf, Philipp and Wold, Sondre and Plank, Barbara

Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
jul 2025

Probing LLMs for Multilingual Discourse Generalization Through a Unified Label Set

Eichin, Florian and Liu, Yang Janet and Plank, Barbara and Hedderich, Michael A.

Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
jul 2025

What’s the Difference? Supporting Users in Identifying the Effects of Prompt and Model Changes Through Token Patterns

Hedderich, Michael A. and Wang, Anyi and Zhao, Raoyuan and Eichin, Florian and Fischer, Jonas and Plank, Barbara

Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
jul 2025

LLMs instead of Human Judges? A Large Scale Empirical Study across 20 NLP Evaluation Tasks

Bavaresco, Anna and Bernardi, Raffaella and Bertolazzi, Leonardo and Elliott, Desmond and Fernández, Raquel and Gatt, Albert and Ghaleb, Esam and Giulianelli, Mario and Hanna, Michael and Koller, Alexander and Martins, Andre and Mondorf, Philipp and Neplenbroek, Vera and Pezzelle, Sandro and Plank, Barbara and Schlangen, David and Suglia, Alessandro and Surikuchi, Aditya K and Takmaz, Ece and Testoni, Alberto

Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers)
jul 2025

Do LLMs Give Psychometrically Plausible Responses in Educational Assessments?

Säuberli, Andreas and Frassinelli, Diego and Plank, Barbara

Proceedings of the 20th Workshop on Innovative Use of NLP for Building Educational Applications (BEA 2025)
apr 2025

Dialetto, ma Quanto Dialetto? Transcribing and Evaluating Dialects on a Continuum

Shim, Ryan Soh-Eun and Plank, Barbara

Findings of the Association for Computational Linguistics: NAACL 2025
apr 2025

Surgical, Cheap, and Flexible: Mitigating False Refusal in Language Models via Single Vector Ablation

Wang, Xinpeng and Hu, Chengzhi and Röttger, Paul and Plank, Barbara

The Thirteenth International Conference on Learning Representations
jan 2025

Cross-Dialect Information Retrieval: Information Access in Low-Resource and High-Variance Languages

Litschko, Robert and Kraus, Oliver and Blaschke, Verena and Plank, Barbara

Proceedings of the 31st International Conference on Computational Linguistics
jan 2025

Evaluating Pixel Language Models on Non-Standardized Languages

Muñoz-Ortiz, Alberto and Blaschke, Verena and Plank, Barbara

Proceedings of the 31st International Conference on Computational Linguistics
jan 2025

KARRIEREWEGE: A large scale Career Path Prediction Dataset

Senger, Elena and Campbell, Yuri and van der Goot, Rob and Plank, Barbara

Proceedings of the 31st International Conference on Computational Linguistics: Industry Track
jan 2025

Add Noise, Tasks, or Layers? MaiNLP at the VarDial 2025 Shared Task on Norwegian Dialectal Slot and Intent Detection

Blaschke, Verena and Körner, Felicia and Plank, Barbara

Proceedings of the 12th Workshop on NLP for Similar Languages, Varieties and Dialects
jan 2025

Improving Dialectal Slot and Intent Detection with Auxiliary Tasks: A Multi-Dialectal Bavarian Case Study

Krückl, Xaver Maria and Blaschke, Verena and Plank, Barbara

Proceedings of the 12th Workshop on NLP for Similar Languages, Varieties and Dialects
nov 2024

Liar, Liar, Logical Mire: A Benchmark for Suppositional Reasoning in Large Language Models

Mondorf, Philipp and Plank, Barbara

Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing
nov 2024

The Potential and Challenges of Evaluating Attitudes, Opinions, and Values in Large Language Models

Ma, Bolei and Wang, Xinpeng and Hu, Tiancheng and Haensch, Anna-Carolina and Hedderich, Michael A. and Plank, Barbara and Kreuter, Frauke

Findings of the Association for Computational Linguistics: EMNLP 2024
nov 2024

To Know or Not To Know? Analyzing Self-Consistency of Large Language Models under Ambiguity

Sedova, Anastasiia and Litschko, Robert and Frassinelli, Diego and Roth, Benjamin and Plank, Barbara

Findings of the Association for Computational Linguistics: EMNLP 2024
nov 2024

“Seeing the Big through the Small”: Can LLMs Approximate Human Judgment Distributions on NLI from a Few Explanations?

Chen, Beiduo and Wang, Xinpeng and Peng, Siyao and Litschko, Robert and Korhonen, Anna and Plank, Barbara

Findings of the Association for Computational Linguistics: EMNLP 2024
nov 2024

GDTB: Genre Diverse Data for English Shallow Discourse Parsing across Modalities, Text Types, and Domains

Liu, Yang Janet and Aoyama, Tatsuya and Scivetti, Wesley and Zhu, Yilun and Behzad, Shabnam and Levine, Lauren Elizabeth and Lin, Jessica and Tiwari, Devika and Zeldes, Amir

Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing
oct 2024

Beyond Accuracy: Evaluating the Reasoning Behavior of Large Language Models - A Survey

Mondorf, Philipp and Plank, Barbara

First Conference on Language Modeling
oct 2024

Look at the Text: Instruction-Tuned Language Models are More Robust Multiple Choice Selectors than You Think

Wang, Xinpeng and Hu, Chengzhi and Ma, Bolei and Röttger, Paul and Plank, Barbara

First Conference on Language Modeling
aug 2024

VariErr NLI: Separating Annotation Error from Human Label Variation

Weber-Genzel, Leon and Peng, Siyao and de Marneffe, Marie-Catherine and Plank, Barbara

Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
aug 2024

Comparing Inferential Strategies of Humans and Large Language Models in Deductive Reasoning

Mondorf, Philipp and Plank, Barbara

Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
aug 2024

What Do Dialect Speakers Want? A Survey of Attitudes Towards Language Technology for German Dialects

Blaschke, Verena and Purschke, Christoph and Schuetze, Hinrich and Plank, Barbara

Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers)
aug 2024

Through the Lens of Split Vote: Exploring Disagreement, Difficulty and Calibration in Legal Case Outcome Classification

Xu, Shanshan and T.y.s.s, Santosh and Ichim, Oana and Plank, Barbara and Grabmair, Matthias

Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
aug 2024

“My Answer is C”: First-Token Probabilities Do Not Match Text Answers in Instruction-Tuned Language Models

Wang, Xinpeng and Ma, Bolei and Hu, Chengzhi and Weber-Genzel, Leon and Röttger, Paul and Kreuter, Frauke and Hovy, Dirk and Plank, Barbara

Findings of the Association for Computational Linguistics: ACL 2024
jun 2024

What’s wrong with your model? A Quantitative Analysis of Relation Classification

Bassignana, Elisa and van der Goot, Rob and Plank, Barbara

Proceedings of the 13th Joint Conference on Lexical and Computational Semantics (*SEM 2024)
jun 2024

MaiNLP at SemEval-2024 Task 1: Analyzing Source Language Selection in Cross-Lingual Textual Relatedness

Zhou, Shijia and Shan, Huangyan and Plank, Barbara and Litschko, Robert

Proceedings of the 18th International Workshop on Semantic Evaluation (SemEval-2024)
may 2024

Slot and Intent Detection Resources for Bavarian and Lithuanian: Assessing Translations vs Natural Queries to Digital Assistants

Winkler, Miriam and Juozapaityte, Virginija and van der Goot, Rob and Plank, Barbara

Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)
may 2024

Sebastian, Basti, Wastl?! Recognizing Named Entities in Bavarian Dialectal Data

Peng, Siyao and Sun, Zihang and Shan, Huangyan and Kolm, Marie and Blaschke, Verena and Artemova, Ekaterina and Plank, Barbara

Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)
may 2024

MaiBaam: A Multi-Dialectal Bavarian Universal Dependency Treebank

Blaschke, Verena and Kovačić, Barbara and Peng, Siyao and Schütze, Hinrich and Plank, Barbara

Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)
may 2024

IndirectQA: Understanding Indirect Answers to Implicit Polar Questions in French and Spanish

Müller, Christin and Plank, Barbara

Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)
may 2024

How to Encode Domain Information in Relation Classification

Bassignana, Elisa and Gascou, Viggo Unmack and Laustsen, Frida Nøhr and Kristensen, Gustav and Petersen, Marie Haahr and van der Goot, Rob and Plank, Barbara

Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)
mar 2024

EEVEE: An Easy Annotation Tool for Natural Language Processing

Sorensen, Axel and Peng, Siyao and Plank, Barbara and van der Goot, Rob

Proceedings of The 18th Linguistic Annotation Workshop (LAW-XVIII)
mar 2024

More Labels or Cases? Assessing Label Variation in Natural Language Inference

Gruber, Cornelia and Hechinger, Katharina and Assenmacher, Matthias and Kauermann, Göran and Plank, Barbara

Proceedings of the Third Workshop on Understanding Implicit and Underspecified Language
mar 2024

Rethinking Skill Extraction in the Job Market Domain using Large Language Models

Nguyen, Khanh and Zhang, Mike and Montariol, Syrielle and Bosselut, Antoine

Proceedings of the First Workshop on Natural Language Processing for Human Resources (NLP4HR 2024)
mar 2024

Deep Learning-based Computational Job Market Analysis: A Survey on Skill Extraction and Classification from Job Postings

Senger, Elena and Zhang, Mike and van der Goot, Rob and Plank, Barbara

Proceedings of the First Workshop on Natural Language Processing for Human Resources (NLP4HR 2024)
mar 2024

Different Tastes of Entities: Investigating Human Label Variation in Named Entity Annotations

Peng, Siyao and Sun, Zihang and Loftus, Sebastian and Plank, Barbara

Proceedings of the Third Workshop on Understanding Implicit and Underspecified Language
mar 2024

Entity Linking in the Job Market Domain

Zhang, Mike and van der Goot, Rob and Plank, Barbara

Findings of the Association for Computational Linguistics: EACL 2024
mar 2024

Interpreting Predictive Probabilities: Model Confidence or Human Label Variation?

Baan, Joris and Fernández, Raquel and Plank, Barbara and Aziz, Wilker

Proceedings of the 18th Conference of the European Chapter of the Association for Computational Linguistics (Volume 2: Short Papers)
mar 2024

Exploring the Robustness of Task-oriented Dialogue Systems for Colloquial German Varieties

Artemova, Ekaterina and Blaschke, Verena and Plank, Barbara

Proceedings of the 18th Conference of the European Chapter of the Association for Computational Linguistics (Volume 1: Long Papers)
mar 2024

Donkii: Characterizing and Detecting Errors in Instruction-Tuning Datasets

Weber, Leon and Litschko, Robert and Artemova, Ekaterina and Plank, Barbara

Proceedings of The 18th Linguistic Annotation Workshop (LAW-XVIII)
mar 2024

JobSkape: A Framework for Generating Synthetic Job Postings to Enhance Skill Matching

Magron, Antoine and Dai, Anna and Zhang, Mike and Montariol, Syrielle and Bosselut, Antoine

Proceedings of the First Workshop on Natural Language Processing for Human Resources (NLP4HR 2024)
mar 2024

NNOSE: Nearest Neighbor Occupational Skill Extraction

Zhang, Mike and van der Goot, Rob and Kan, Min-Yen and Plank, Barbara

Proceedings of the 18th Conference of the European Chapter of the Association for Computational Linguistics (Volume 1: Long Papers)
dec 2023

Subspace Chronicles: How Linguistic Information Emerges, Shifts and Interacts during Language Model Training

Müller-Eberstein, Max and van der Goot, Rob and Plank, Barbara and Titov, Ivan

Findings of the Association for Computational Linguistics: EMNLP 2023
dec 2023

What Comes Next? Evaluating Uncertainty in Neural Text Generators Against Human Production Variability

Giulianelli, Mario and Baan, Joris and Aziz, Wilker and Fernández, Raquel and Plank, Barbara

Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing (EMNLP)
dec 2023

ACTOR: Active Learning with Annotator-specific Classification Heads to Embrace Human Label Variation

Wang, Xinpeng and Plank, Barbara

Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing
dec 2023

Establishing Trustworthiness: Rethinking Tasks and Model Evaluation

Litschko, Robert and Müller-Eberstein, Max and van der Goot, Rob and Weber-Genzel, Leon and Plank, Barbara

Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing
dec 2023

From Dissonance to Insights: Dissecting Disagreements in Rationale Construction for Case Outcome Classification

Xu, Shanshan and T.y.s.s, Santosh and Ichim, Oana and Risini, Isabella and Plank, Barbara and Grabmair, Matthias

Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing
jul 2023

Boosting Zero-shot Cross-lingual Retrieval by Training on Artificially Code-Switched Data

Litschko, Robert and Artemova, Ekaterina and Plank, Barbara

Findings of the Association for Computational Linguistics: ACL 2023
jul 2023

SemEval-2023 Task 11: Learning with Disagreements (LeWiDi)

Leonardelli, Elisa and Abercrombie, Gavin and Almanea, Dina and Basile, Valerio and Fornaciari, Tommaso and Plank, Barbara and Rieser, Verena and Uma, Alexandra and Poesio, Massimo

Proceedings of the 17th International Workshop on Semantic Evaluation (SemEval-2023)
jul 2023

ActiveAED: A Human in the Loop Improves Annotation Error Detection

Weber, Leon and Plank, Barbara

Findings of the Association for Computational Linguistics: ACL 2023
jul 2023

Silver Syntax Pre-training for Cross-Domain Relation Extraction

Bassignana, Elisa and Ginter, Filip and Pyysalo, Sampo and van der Goot, Rob and Plank, Barbara

Findings of the Association for Computational Linguistics: ACL 2023
jul 2023

How to Distill your BERT: An Empirical Study on the Impact of Weight Initialisation and Distillation Objectives

Wang, Xinpeng and Weissweiler, Leonie and Schütze, Hinrich and Plank, Barbara

Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers)
jul 2023

ESCOXLM-R: Multilingual Taxonomy-driven Pre-training for the Job Market Domain

Zhang, Mike and van der Goot, Rob and Plank, Barbara

Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
may 2023

Low-resource Bilingual Dialect Lexicon Induction with Large Language Models

Artemova, Ekaterina and Plank, Barbara

Proceedings of the 24th Nordic Conference on Computational Linguistics (NoDaLiDa)
may 2023

A Survey of Corpora for Germanic Low-Resource Languages and Dialects

Blaschke, Verena and Schuetze, Hinrich and Plank, Barbara

Proceedings of the 24th Nordic Conference on Computational Linguistics (NoDaLiDa)
may 2023

Multi-CrossRE: A Multi-Lingual Multi-Domain Dataset for Relation Extraction

Bassignana, Elisa and Ginter, Filip and Pyysalo, Sampo and van der Goot, Rob and Plank, Barbara

Proceedings of the 24th Nordic Conference on Computational Linguistics (NoDaLiDa)
may 2023

Does Manipulating Tokenization Aid Cross-Lingual Transfer? A Study on POS Tagging for Non-Standardized Languages

Blaschke, Verena and Schütze, Hinrich and Plank, Barbara

Tenth Workshop on NLP for Similar Languages, Varieties and Dialects (VarDial 2023)
may 2023

Findings of the VarDial Evaluation Campaign 2023

Aepli, Noëmi and Çöltekin, Çağrı and van der Goot, Rob and Jauhiainen, Tommi and Kazzaz, Mourhaf and Ljubešić, Nikola and North, Kai and Plank, Barbara and Scherrer, Yves and Zampieri, Marcos

Tenth Workshop on NLP for Similar Languages, Varieties and Dialects (VarDial 2023)
dec 2022

Experimental Standards for Deep Learning in Natural Language Processing Research

Ulmer, Dennis and Bassignana, Elisa and Müller-Eberstein, Max and Varab, Daniel and Zhang, Mike and van der Goot, Rob and Hardmeier, Christian and Plank, Barbara

Findings of the Association for Computational Linguistics: EMNLP 2022
dec 2022

Spectral Probing

Müller-Eberstein, Max and van der Goot, Rob and Plank, Barbara

Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing
dec 2022

The “Problem” of Human Label Variation: On Ground Truth in Data, Modeling and Evaluation

Plank, Barbara

Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing
dec 2022

Stop Measuring Calibration When Humans Disagree

Baan, Joris and Aziz, Wilker and Plank, Barbara and Fernández, Raquel

Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing
dec 2022

Evidence > Intuition: Transferability Estimation for Encoder Selection

Bassignana, Elisa and Müller-Eberstein, Max and Zhang, Mike and Plank, Barbara

Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing
dec 2022

CrossRE: A Cross-Domain Dataset for Relation Extraction

Bassignana, Elisa and Plank, Barbara

Findings of the Association for Computational Linguistics: EMNLP 2022

Preprints

2026

Decoupling the Effect of Chain-of-Thought Reasoning: A Human Label Variation Perspective

Chen, Beiduo and Hu, Tiancheng and Zhang, Caiqi and Litschko, Robert and Korhonen, Anna and Plank, Barbara

arXiv preprint arXiv:2601.03154
2025

Standard-to-Dialect Transfer Trends Differ across Text and Speech: A Case Study on Intent and Topic Classification in German Dialects

Blaschke, Verena and Winkler, Miriam and Plank, Barbara

arXiv preprint arXiv:2510.07890
2025

Agree, Disagree, Explain: Decomposing Human Label Variation in NLI through the Lens of Explanations

Hong, Pingjun and Chen, Beiduo and Peng, Siyao and de Marneffe, Marie-Catherine and Roth, Benjamin and Plank, Barbara

arXiv preprint arXiv:2510.164580
2025

Too Open for Opinion? Embracing Open-Endedness in Large Language Models for Social Simulation

Ma, Bolei and Cao, Yong and Sen, Indira and Haensch, Anna-Carolina and Kreuter, Frauke and Plank, Barbara and Hershcovich, Daniel

arXiv preprint arXiv:2510.13884
2025

If Probable, Then Acceptable? Understanding Conditional Acceptability Judgments in Large Language Models

Orth, Jasmin and Mondorf, Philipp and Plank, Barbara

arXiv preprint arXiv:2510.08388
2025

Is It Thinking or Cheating? Detecting Implicit Reward Hacking by Measuring Reasoning Effort

Wang, Xinpeng and Joshi, Nitish and Plank, Barbara and Angell, Rico and He, He

arXiv preprint arXiv:2510.01367
2025

ExPLAIND: Unifying Model, Data, and Training Attribution to Study Model Behavior

Eichin, Florian and Du, Yupei and Mondorf, Philipp and Matveev, Maria and Plank, Barbara and Hedderich, Michael A.

arXiv preprint arXiv:2505.20076
2025

Languages in Multilingual Speech Foundation Models Align Both Phonetically and Semantically

Shim, Ryan Soh-Eun and Cristofaro, Domenico De and Hu, Chengzhi Martin and Vietti, Alessandro and Plank, Barbara

arXiv preprint arXiv:2505.19606
2025

Compositional-ARC: Assessing Systematic Generalization in Abstract Spatial Reasoning

Mondorf, Philipp and Zhou, Shijia and Riedler, Monica and Plank, Barbara

arXiv preprint arXiv:2504.01445
2025

Think Before Refusal: Triggering Safety Reflection in LLMs to Mitigate False Refusal Behavior

Si, Shengyun and Wang, Xinpeng and Zhai, Guangyao and Navab, Nassir and Plank, Barbara

arXiv preprint arXiv:2503.17882
2024

Understanding When Tree of Thoughts Succeeds: Larger Models Excel in Generation, Not Discrimination

Chen, Qiqi and Wang, Xinpeng and Mondorf, Philipp and Hedderich, Michael A. and Plank, Barbara

arXiv preprint arXiv:2410.17820
2023

Uncertainty in Natural Language Generation: From Theory to Applications

Baan, Joris and Daheim, Nico and Ilia, Evgenia and Ulmer, Dennis and Li, Haau-Sing and Fernández, Raquel and Plank, Barbara and Sennrich, Rico and Zerva, Chrysoula and Aziz, Wilker

arXiv preprint arXiv:2307.15703
