Publications

2025

Þórunn Arnardóttir, Elías Bjartur Einarsson, Garðar Ingvarsson Juto, Þorvaldur Páll Helgason, Hafsteinn Einarsson. 2025. WikiQA-IS: Assisted Benchmark Generation and Automated Evaluation of Icelandic Cultural Knowledge in LLMs. Proceedings of the Third Workshop on Resources and Representations for Under-Resourced Languages and Domains (RESOURCEFUL-2025).

Barbara Scalvini, Iben Nyholm Debess, Annika Simonsen, Hafsteinn Einarsson. 2025. Rethinking Low-Resource MT: The Surprising Effectiveness of Fine-Tuned Multilingual Models in the LLM Age. Proceedings of the Joint 25th Nordic Conference on Computational Linguistics and 11th Baltic Conference on Human Language Technologies (NoDaLiDa/Baltic-HLT 2025).

Barbara Scalvini, Annika Simonsen, Iben Nyholm Debess, Hafsteinn Einarsson. 2025. Prompt Engineering Enhances Faroese MT, but Only Humans Can Tell. Proceedings of the Joint 25th Nordic Conference on Computational Linguistics and 11th Baltic Conference on Human Language Technologies (NoDaLiDa/Baltic-HLT 2025).

Steinunn Rut Friðriksdóttir, Dan Saattrup Nielsen, Hafsteinn Einarsson. 2025. Hotter and Colder: A New Approach to Annotating Sentiment, Emotions, and Bias in Icelandic Blog Comments. Proceedings of the Joint 25th Nordic Conference on Computational Linguistics and 11th Baltic Conference on Human Language Technologies (NoDaLiDa/Baltic-HLT 2025)

Romina Oji, Jenny Kunz. 2025. How to Tune a Multilingual Encoder Model for Germanic Languages: A Study of PEFT, Full Fine-Tuning, and Language Adapters. Proceedings of the Joint 25th Nordic Conference on Computational Linguistics and 11th Baltic Conference on Human Language Technologies (NoDaLiDa/Baltic-HLT 2025).

Jenny Kunz. 2025. Train More Parameters But Mind Their Placement: Insights into Language Adaptation with PEFT. Proceedings of the Joint 25th Nordic Conference on Computational Linguistics and 11th Baltic Conference on Human Language Technologies (NoDaLiDa/Baltic-HLT 2025).

Dan Saattrup Nielsen, Kenneth Enevoldsen, Peter Schneider-Kamp. Encoder vs Decoder: Comparative Analysis of Encoder and Decoder Language Models on Multilingual NLU Tasks. Proceedings of the Joint 25th Nordic Conference on Computational Linguistics and 11th Baltic Conference on Human Language Technologies (NoDaLiDa/Baltic-HLT 2025).

Annika Simonsen, Dan Saattrup Nielsen, and Hafsteinn Einarsson. 2025. FoQA: A Faroese Question-Answering Dataset. In Proceedings of the Third Workshop on Resources and Representations for Under-Resourced Languages and Domains (RESOURCEFUL-2025), pages 48–57, Tallinn, Estonia. University of Tartu Library, Estonia.

2024

Steinunn Rut Friðriksdóttir, Annika Simonsen, Atli Snær Ásmundsson, Guðrún Lilja Friðjónsdóttir, Anton Karl Ingason, Vésteinn Snæbjarnarson, Hafsteinn Einarsson. 2024. Ice and Fire: Dataset on Sentiment, Emotions, Toxicity, Sarcasm, Hate speech, Sympathy and More in Icelandic Blog Comments. Proceedings of the Fourth Workshop on Threat, Aggression & Cyberbullying @ LREC-COLING-2024.

Annika Simonsen, Hafsteinn Einarsson. 2024. A Human Perspective on GPT-4 Translations: Analysing Faroese to English News and Blog Text Translations. Proceedings of the 25th Annual Conference of the European Association for Machine Translation (Volume 1).

Roos M. Bakker, Daan L. Di Scala. 2024. From Text to Knowledge Graph: Comparing Relation Extraction Methods in a Practical Context. in First International Workshop on Generative Neuro-Symbolic AI, co-located with ESWC 2024, Hersonissos, Crete, Greece, May 2024.

Roos M. Bakker, Daan L. Di Scala and Maaike H. T. de Boer. 2024. Ontology Learning from Text: an Analysis on LLM Performance. In Proceedings of the 3rd NLP4KGC International Workshop on Natural Language Processing for Knowledge Graph Creation, co-located with Semantics 2024, Amsterdam, Netherlands, Sep. 2024.

Jiangtao Wang, Jan Ebert, Oleg Filatov, Stefan Kesselheim. 2024. Memory and Bandwidth are All You Need for Fully Sharded Data Parallel. Accepted to the Workshop on Advancing Neural Network Training at International Conference on Machine Learning (WANT@ICML 2024).

Ehsan Doostmohammadi, Oskar Holmström, Marco Kuhlmann. 2024. How Reliable Are Automatic Evaluation Methods for Instruction-Tuned LLMs? In Findings of the Association for Computational Linguistics: EMNLP 2024, pages 6321–6336, Miami, Florida, USA. Association for Computational Linguistics.

Alexander Arno Weber, Klaudia Thellmann, Jan Ebert, Nicolas Flores-Herr, Jens Lehmann, Michael Fromm, and Mehdi Ali. 2024. Investigating Multilingual Instruction-Tuning: Do Polyglot Models Demand for Multilingual Instructions?. In Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing, pages 20829–20855, Miami, Florida, USA. Association for Computational Linguistics.

Oskar Holmström, Jenny Kunz. 2024. The impact of language adapters in cross-lingual transfer for NLU. In Proceedings of the 1st Workshop on Modular and Open Multilingual NLP (MOOMIN 2024), pages 24–43, St Julians, Malta. Association for Computational Linguistics.

Annika Simonsen, Hafsteinn Einarsson, Iben Nyholm Debess. 2024. Good or Bad News? Exploring GPT-4 for Sentiment Analysis for Faroese on a Public News Corpora. In Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024), pages 7814–7824, Torino, Italia. ELRA and ICCL.

Mehdi Ali, Michael Fromm, Klaudia Thellmann, Richard Rutmann, Max Lübbering, Johannes Leveling, Katrin Klug, Jan Ebert, Niclas Doll, Jasper Schulze Buschhoff, Charvi Jain, Alexander Arno Weber, Lena Jurkschat, Hammam Abdelwahab, Chelsea John, Pedro Ortiz Suarez, Malte Ostendorff, Samuel Weinbach, Rafet Sifa, Stefan Kesselheim, Nicolas Flores-Herr. 2024. Tokenizer Choice for LLM Training: Negligible or Crucial? In Findings of the Association for Computational Linguistics: NAACL 2024, pages 3907 – 3924, Mexico City, Mexico. Association for Computational Linguistics

Shangrui Nie, Michael Fromm, Charles Welch, Rebekka Görge, Akbar Karimi, Joan Plepi, Nazia Afsan Mowmita, Nicolas Flores-Herr, Mehdi Ali, Lucie Flek. 2024. Do Multilingual Large Language Models Mitigate Stereotype Bias? Proceedings of the 2nd Workshop on Cross-Cultural Considerations in NLP, pages 65-83, Bangkok Thailand, Association for Computational Linguistics