aug 2025
Although Germany has a diverse landscape of dialects, they are underrepresented in current automatic speech recognition (ASR) research. To enable studies of how robust models are to dialectal variation, we present Betthupferl, a new benchmark for transcription into dialect and Standard German for three dialect groups in Southeast Germany.
aug 2024
This paper investigates to what extent the first token probabilities of large language models match their final answers to multiple-choice questions.
may 2023
This paper provides an overview of more than 80 corpora to support NLP research in resource-poor and non-standardized languages of the Germanic language family.
nov 2025
RAcQUEt: Unveiling the Dangers of Overlooked Referential Ambiguity in Visual LLMs
Testoni, Alberto and Plank, Barbara and Fernández, Raquel
nov 2025
Disentangling Subjectivity and Uncertainty for Hate Speech Annotation and Modeling using Gaze
Alacam, Özge and Hoeken, Sanne and Säuberli, Andreas and Gröner, Hannes and Frassinelli, Diego and Zarrieß, Sina and Plank, Barbara
nov 2025
The Validation Gap: A Mechanistic Analysis of How Language Models Compute Arithmetic but Fail to Validate It
Bertolazzi, Leonardo and Mondorf, Philipp and Plank, Barbara and Bernardi, Raffaella
nov 2025
Threading the Needle: Reweaving Chain-of-Thought Reasoning to Explain Human Label Variation
Chen, Beiduo and Liu, Yang Janet and Korhonen, Anna and Plank, Barbara
nov 2025
LiTEx: A Linguistic Taxonomy of Explanations for Understanding Within-Label Variation in Natural Language Inference
Hong, Pingjun and Chen, Beiduo and Peng, Siyao and de Marneffe, Marie-Catherine and Plank, Barbara
nov 2025
Relevant for the Right Reasons? Investigating Lexical Biases in Zero-Shot and Instruction-Tuned Rerankers
Mao, Yuchen and Plank, Barbara and Litschko, Robert
nov 2025
LeWiDi-2025 at NLPerspectives: Third Edition of the Learning with Disagreements Shared Task
Leonardelli, Elisa and Casola, Silvia and Peng, Siyao and Rizzi, Giulia and Basile, Valerio and Fersini, Elisabetta and Frassinelli, Diego and Jang, Hyewon and Pavlovic, Maja and Plank, Barbara and Poesio, Massimo
nov 2025
Tracing Multilingual Factual Knowledge Acquisition in Pretraining
Liu, Yihong and Wang, Mingyang and Kargaran, Amir Hossein and Körner, Felicia and Nie, Ercong and Plank, Barbara and Yvon, François and Schütze, Hinrich
nov 2025
Reason to Rote: Rethinking Memorization in Reasoning
Du, Yupei and Mondorf, Philipp and Casola, Silvia and Yao, Yuekun and Litschko, Robert and Plank, Barbara
nov 2025
M-ABSA: A Multilingual Dataset for Aspect-Based Sentiment Analysis
Wu, ChengYan and Ma, Bolei and Liu, Yihong and Zhang, Zheyu and Deng, Ningyuan and Li, Yanshu and Chen, Baolan and Zhang, Yi and Xue, Yun and Plank, Barbara
nov 2025
Aligning NLP Models with Target Population Perspectives using PAIR: Population-Aligned Instance Replication
Eckman, Stephanie and Ma, Bolei and Kern, Christoph and Chew, Rob and Plank, Barbara and Kreuter, Frauke
nov 2025
BlackboxNLP-2025 MIB Shared Task: Exploring Ensemble Strategies for Circuit Localization Methods
Mondorf, Philipp and Wang, Mingyang and Gerstner, Sebastian and Hakimi, Ahmad Dawar and Liu, Yihong and Veloso, Leonor and Zhou, Shijia and Schütze, Hinrich and Plank, Barbara
nov 2025
Make Every Letter Count: Building Dialect Variation Dictionaries from Monolingual Corpora
Litschko, Robert and Blaschke, Verena and Burkhardt, Diana and Plank, Barbara and Frassinelli, Diego
aug 2025
A Multi-Dialectal Dataset for German Dialect ASR and Dialect-to-Standard Speech Translation
Blaschke, Verena and Winkler, Miriam and Förster, Constantin and Wenger-Glemser, Gabriele and Plank, Barbara
jul 2025
Algorithmic Fidelity of Large Language Models in Generating Synthetic German Public Opinions: A Case Study
Ma, Bolei and Yoztyurk, Berk and Haensch, Anna-Carolina and Wang, Xinpeng and Herklotz, Markus and Kreuter, Frauke and Plank, Barbara and Aßenmacher, Matthias
jul 2025
Pragmatics in the Era of Large Language Models: A Survey on Datasets, Evaluation, Opportunities and Challenges
Ma, Bolei and Li, Yuting and Zhou, Wei and Gong, Ziwei and Liu, Yang Janet and Jasinskaja, Katja and Friedrich, Annemarie and Hirschberg, Julia and Kreuter, Frauke and Plank, Barbara
jul 2025
A Rose by Any Other Name: LLM-Generated Explanations Are Good Proxies for Human Explanations to Collect Label Distributions on NLI
Chen, Beiduo and Peng, Siyao and Korhonen, Anna and Plank, Barbara
jul 2025
Circuit Compositions: Exploring Modular Structures in Transformer-Based Language Models
Mondorf, Philipp and Wold, Sondre and Plank, Barbara
jul 2025
Probing LLMs for Multilingual Discourse Generalization Through a Unified Label Set
Eichin, Florian and Liu, Yang Janet and Plank, Barbara and Hedderich, Michael A.
jul 2025
What’s the Difference? Supporting Users in Identifying the Effects of Prompt and Model Changes Through Token Patterns
Hedderich, Michael A. and Wang, Anyi and Zhao, Raoyuan and Eichin, Florian and Fischer, Jonas and Plank, Barbara
jul 2025
LLMs instead of Human Judges? A Large Scale Empirical Study across 20 NLP Evaluation Tasks
Bavaresco, Anna and Bernardi, Raffaella and Bertolazzi, Leonardo and Elliott, Desmond and Fernández, Raquel and Gatt, Albert and Ghaleb, Esam and Giulianelli, Mario and Hanna, Michael and Koller, Alexander and Martins, Andre and Mondorf, Philipp and Neplenbroek, Vera and Pezzelle, Sandro and Plank, Barbara and Schlangen, David and Suglia, Alessandro and Surikuchi, Aditya K and Takmaz, Ece and Testoni, Alberto
jul 2025
Do LLMs Give Psychometrically Plausible Responses in Educational Assessments?
Säuberli, Andreas and Frassinelli, Diego and Plank, Barbara
apr 2025
Dialetto, ma Quanto Dialetto? Transcribing and Evaluating Dialects on a Continuum
Shim, Ryan Soh-Eun and Plank, Barbara
apr 2025
Surgical, Cheap, and Flexible: Mitigating False Refusal in Language Models via Single Vector Ablation
Wang, Xinpeng and Hu, Chengzhi and Röttger, Paul and Plank, Barbara
jan 2025
Cross-Dialect Information Retrieval: Information Access in Low-Resource and High-Variance Languages
Litschko, Robert and Kraus, Oliver and Blaschke, Verena and Plank, Barbara
jan 2025
Evaluating Pixel Language Models on Non-Standardized Languages
Muñoz-Ortiz, Alberto and Blaschke, Verena and Plank, Barbara
jan 2025
KARRIEREWEGE: A large scale Career Path Prediction Dataset
Senger, Elena and Campbell, Yuri and van der Goot, Rob and Plank, Barbara
jan 2025
Add Noise, Tasks, or Layers? MaiNLP at the VarDial 2025 Shared Task on Norwegian Dialectal Slot and Intent Detection
Blaschke, Verena and Körner, Felicia and Plank, Barbara
jan 2025
Improving Dialectal Slot and Intent Detection with Auxiliary Tasks: A Multi-Dialectal Bavarian Case Study
Krückl, Xaver Maria and Blaschke, Verena and Plank, Barbara
nov 2024
Liar, Liar, Logical Mire: A Benchmark for Suppositional Reasoning in Large Language Models
Mondorf, Philipp and Plank, Barbara
nov 2024
The Potential and Challenges of Evaluating Attitudes, Opinions, and Values in Large Language Models
Ma, Bolei and Wang, Xinpeng and Hu, Tiancheng and Haensch, Anna-Carolina and Hedderich, Michael A. and Plank, Barbara and Kreuter, Frauke
nov 2024
To Know or Not To Know? Analyzing Self-Consistency of Large Language Models under Ambiguity
Sedova, Anastasiia and Litschko, Robert and Frassinelli, Diego and Roth, Benjamin and Plank, Barbara
nov 2024
“Seeing the Big through the Small”: Can LLMs Approximate Human Judgment Distributions on NLI from a Few Explanations?
Chen, Beiduo and Wang, Xinpeng and Peng, Siyao and Litschko, Robert and Korhonen, Anna and Plank, Barbara
nov 2024
GDTB: Genre Diverse Data for English Shallow Discourse Parsing across Modalities, Text Types, and Domains
Liu, Yang Janet and Aoyama, Tatsuya and Scivetti, Wesley and Zhu, Yilun and Behzad, Shabnam and Levine, Lauren Elizabeth and Lin, Jessica and Tiwari, Devika and Zeldes, Amir
oct 2024
Beyond Accuracy: Evaluating the Reasoning Behavior of Large Language Models - A Survey
Mondorf, Philipp and Plank, Barbara
oct 2024
Look at the Text: Instruction-Tuned Language Models are More Robust Multiple Choice Selectors than You Think
Wang, Xinpeng and Hu, Chengzhi and Ma, Bolei and Röttger, Paul and Plank, Barbara
aug 2024
VariErr NLI: Separating Annotation Error from Human Label Variation
Weber-Genzel, Leon and Peng, Siyao and de Marneffe, Marie-Catherine and Plank, Barbara
aug 2024
Comparing Inferential Strategies of Humans and Large Language Models in Deductive Reasoning
Mondorf, Philipp and Plank, Barbara
aug 2024
What Do Dialect Speakers Want? A Survey of Attitudes Towards Language Technology for German Dialects
Blaschke, Verena and Purschke, Christoph and Schütze, Hinrich and Plank, Barbara
aug 2024
Through the Lens of Split Vote: Exploring Disagreement, Difficulty and Calibration in Legal Case Outcome Classification
Xu, Shanshan and T.y.s.s, Santosh and Ichim, Oana and Plank, Barbara and Grabmair, Matthias
aug 2024
“My Answer is C”: First-Token Probabilities Do Not Match Text Answers in Instruction-Tuned Language Models
Wang, Xinpeng and Ma, Bolei and Hu, Chengzhi and Weber-Genzel, Leon and Röttger, Paul and Kreuter, Frauke and Hovy, Dirk and Plank, Barbara
jun 2024
What’s wrong with your model? A Quantitative Analysis of Relation Classification
Bassignana, Elisa and van der Goot, Rob and Plank, Barbara
jun 2024
MaiNLP at SemEval-2024 Task 1: Analyzing Source Language Selection in Cross-Lingual Textual Relatedness
Zhou, Shijia and Shan, Huangyan and Plank, Barbara and Litschko, Robert
may 2024
Slot and Intent Detection Resources for Bavarian and Lithuanian: Assessing Translations vs Natural Queries to Digital Assistants
Winkler, Miriam and Juozapaityte, Virginija and van der Goot, Rob and Plank, Barbara
may 2024
Sebastian, Basti, Wastl?! Recognizing Named Entities in Bavarian Dialectal Data
Peng, Siyao and Sun, Zihang and Shan, Huangyan and Kolm, Marie and Blaschke, Verena and Artemova, Ekaterina and Plank, Barbara
may 2024
MaiBaam: A Multi-Dialectal Bavarian Universal Dependency Treebank
Blaschke, Verena and Kovačić, Barbara and Peng, Siyao and Schütze, Hinrich and Plank, Barbara
may 2024
IndirectQA: Understanding Indirect Answers to Implicit Polar Questions in French and Spanish
Müller, Christin and Plank, Barbara
may 2024
How to Encode Domain Information in Relation Classification
Bassignana, Elisa and Gascou, Viggo Unmack and Laustsen, Frida Nøhr and Kristensen, Gustav and Petersen, Marie Haahr and van der Goot, Rob and Plank, Barbara
mar 2024
EEVEE: An Easy Annotation Tool for Natural Language Processing
Sorensen, Axel and Peng, Siyao and Plank, Barbara and van der Goot, Rob
mar 2024
More Labels or Cases? Assessing Label Variation in Natural Language Inference
Gruber, Cornelia and Hechinger, Katharina and Aßenmacher, Matthias and Kauermann, Göran and Plank, Barbara
mar 2024
Rethinking Skill Extraction in the Job Market Domain using Large Language Models
Nguyen, Khanh and Zhang, Mike and Montariol, Syrielle and Bosselut, Antoine
mar 2024
Deep Learning-based Computational Job Market Analysis: A Survey on Skill Extraction and Classification from Job Postings
Senger, Elena and Zhang, Mike and van der Goot, Rob and Plank, Barbara
mar 2024
Different Tastes of Entities: Investigating Human Label Variation in Named Entity Annotations
Peng, Siyao and Sun, Zihang and Loftus, Sebastian and Plank, Barbara
mar 2024
Entity Linking in the Job Market Domain
Zhang, Mike and van der Goot, Rob and Plank, Barbara
mar 2024
Interpreting Predictive Probabilities: Model Confidence or Human Label Variation?
Baan, Joris and Fernández, Raquel and Plank, Barbara and Aziz, Wilker
mar 2024
Exploring the Robustness of Task-oriented Dialogue Systems for Colloquial German Varieties
Artemova, Ekaterina and Blaschke, Verena and Plank, Barbara
mar 2024
Donkii: Characterizing and Detecting Errors in Instruction-Tuning Datasets
Weber, Leon and Litschko, Robert and Artemova, Ekaterina and Plank, Barbara
mar 2024
JobSkape: A Framework for Generating Synthetic Job Postings to Enhance Skill Matching
Magron, Antoine and Dai, Anna and Zhang, Mike and Montariol, Syrielle and Bosselut, Antoine
mar 2024
NNOSE: Nearest Neighbor Occupational Skill Extraction
Zhang, Mike and van der Goot, Rob and Kan, Min-Yen and Plank, Barbara
dec 2023
Subspace Chronicles: How Linguistic Information Emerges, Shifts and Interacts during Language Model Training
Müller-Eberstein, Max and van der Goot, Rob and Plank, Barbara and Titov, Ivan
dec 2023
What Comes Next? Evaluating Uncertainty in Neural Text Generators Against Human Production Variability
Giulianelli, Mario and Baan, Joris and Aziz, Wilker and Fernández, Raquel and Plank, Barbara
dec 2023
ACTOR: Active Learning with Annotator-specific Classification Heads to Embrace Human Label Variation
Wang, Xinpeng and Plank, Barbara
dec 2023
Establishing Trustworthiness: Rethinking Tasks and Model Evaluation
Litschko, Robert and Müller-Eberstein, Max and van der Goot, Rob and Weber-Genzel, Leon and Plank, Barbara
dec 2023
From Dissonance to Insights: Dissecting Disagreements in Rationale Construction for Case Outcome Classification
Xu, Shanshan and T.y.s.s, Santosh and Ichim, Oana and Risini, Isabella and Plank, Barbara and Grabmair, Matthias
jul 2023
Boosting Zero-shot Cross-lingual Retrieval by Training on Artificially Code-Switched Data
Litschko, Robert and Artemova, Ekaterina and Plank, Barbara
jul 2023
SemEval-2023 Task 11: Learning with Disagreements (LeWiDi)
Leonardelli, Elisa and Abercrombie, Gavin and Almanea, Dina and Basile, Valerio and Fornaciari, Tommaso and Plank, Barbara and Rieser, Verena and Uma, Alexandra and Poesio, Massimo
jul 2023
ActiveAED: A Human in the Loop Improves Annotation Error Detection
Weber, Leon and Plank, Barbara
jul 2023
Silver Syntax Pre-training for Cross-Domain Relation Extraction
Bassignana, Elisa and Ginter, Filip and Pyysalo, Sampo and van der Goot, Rob and Plank, Barbara
jul 2023
How to Distill your BERT: An Empirical Study on the Impact of Weight Initialisation and Distillation Objectives
Wang, Xinpeng and Weissweiler, Leonie and Schütze, Hinrich and Plank, Barbara
jul 2023
ESCOXLM-R: Multilingual Taxonomy-driven Pre-training for the Job Market Domain
Zhang, Mike and van der Goot, Rob and Plank, Barbara
may 2023
Low-resource Bilingual Dialect Lexicon Induction with Large Language Models
Artemova, Ekaterina and Plank, Barbara
may 2023
A Survey of Corpora for Germanic Low-Resource Languages and Dialects
Blaschke, Verena and Schütze, Hinrich and Plank, Barbara
may 2023
Multi-CrossRE A Multi-Lingual Multi-Domain Dataset for Relation Extraction
Bassignana, Elisa and Ginter, Filip and Pyysalo, Sampo and van der Goot, Rob and Plank, Barbara
may 2023
Does Manipulating Tokenization Aid Cross-Lingual Transfer? A Study on POS Tagging for Non-Standardized Languages
Blaschke, Verena and Schütze, Hinrich and Plank, Barbara
may 2023
Findings of the VarDial Evaluation Campaign 2023
Aepli, Noëmi and Çöltekin, Çağrı and van der Goot, Rob and Jauhiainen, Tommi and Kazzaz, Mourhaf and Ljubešić, Nikola and North, Kai and Plank, Barbara and Scherrer, Yves and Zampieri, Marcos
dec 2022
Experimental Standards for Deep Learning in Natural Language Processing Research
Ulmer, Dennis and Bassignana, Elisa and Müller-Eberstein, Max and Varab, Daniel and Zhang, Mike and van der Goot, Rob and Hardmeier, Christian and Plank, Barbara
dec 2022
Spectral Probing
Müller-Eberstein, Max and van der Goot, Rob and Plank, Barbara
dec 2022
Stop Measuring Calibration When Humans Disagree
Baan, Joris and Aziz, Wilker and Plank, Barbara and Fernández, Raquel
dec 2022
Evidence > Intuition: Transferability Estimation for Encoder Selection
Bassignana, Elisa and Müller-Eberstein, Max and Zhang, Mike and Plank, Barbara
dec 2022
CrossRE: A Cross-Domain Dataset for Relation Extraction
Bassignana, Elisa and Plank, Barbara
2026
Decoupling the Effect of Chain-of-Thought Reasoning: A Human Label Variation Perspective
Chen, Beiduo and Hu, Tiancheng and Zhang, Caiqi and Litschko, Robert and Korhonen, Anna and Plank, Barbara
2025
Standard-to-Dialect Transfer Trends Differ across Text and Speech: A Case Study on Intent and Topic Classification in German Dialects
Blaschke, Verena and Winkler, Miriam and Plank, Barbara
2025
Agree, Disagree, Explain: Decomposing Human Label Variation in NLI through the Lens of Explanations
Hong, Pingjun and Chen, Beiduo and Peng, Siyao and de Marneffe, Marie-Catherine and Roth, Benjamin and Plank, Barbara
2025
Too Open for Opinion? Embracing Open-Endedness in Large Language Models for Social Simulation
Ma, Bolei and Cao, Yong and Sen, Indira and Haensch, Anna-Carolina and Kreuter, Frauke and Plank, Barbara and Hershcovich, Daniel
2025
If Probable, Then Acceptable? Understanding Conditional Acceptability Judgments in Large Language Models
Orth, Jasmin and Mondorf, Philipp and Plank, Barbara
2025
Is It Thinking or Cheating? Detecting Implicit Reward Hacking by Measuring Reasoning Effort
Wang, Xinpeng and Joshi, Nitish and Plank, Barbara and Angell, Rico and He, He
2025
ExPLAIND: Unifying Model, Data, and Training Attribution to Study Model Behavior
Eichin, Florian and Du, Yupei and Mondorf, Philipp and Matveev, Maria and Plank, Barbara and Hedderich, Michael A.
2025
Languages in Multilingual Speech Foundation Models Align Both Phonetically and Semantically
Shim, Ryan Soh-Eun and De Cristofaro, Domenico and Hu, Chengzhi Martin and Vietti, Alessandro and Plank, Barbara
2025
Compositional-ARC: Assessing Systematic Generalization in Abstract Spatial Reasoning
Mondorf, Philipp and Zhou, Shijia and Riedler, Monica and Plank, Barbara
2025
Think Before Refusal: Triggering Safety Reflection in LLMs to Mitigate False Refusal Behavior
Si, Shengyun and Wang, Xinpeng and Zhai, Guangyao and Navab, Nassir and Plank, Barbara
2024
Understanding When Tree of Thoughts Succeeds: Larger Models Excel in Generation, Not Discrimination
Chen, Qiqi and Wang, Xinpeng and Mondorf, Philipp and Hedderich, Michael A. and Plank, Barbara
2023
Uncertainty in Natural Language Generation: From Theory to Applications
Baan, Joris and Daheim, Nico and Ilia, Evgenia and Ulmer, Dennis and Li, Haau-Sing and Fernández, Raquel and Plank, Barbara and Sennrich, Rico and Zerva, Chrysoula and Aziz, Wilker