Second Annual Consortium Meeting of TrustLLM


Join us for an inspiring and engaging event where experts, researchers and enthusiasts come together to explore advancements in large language models (LLMs) for Germanic languages. Over the course of three days, you’ll have the opportunity to connect, collaborate and contribute to shaping the future of LLMs.

Please note that this event is exclusively open to members of the TrustLLM consortium and their invited guests. If you are not part of the consortium but are interested in participating, please get in touch with us to discuss your involvement.

Stay tuned for the announcement of our keynote speakers, and get ready for a mix of insightful presentations, interactive workshops, and unique Icelandic experiences. We look forward to your active participation!

Programme

Please note that the programme is subject to change. We will update the schedule as needed.

Day 0: Tuesday, 10 June 2025

Day 1: Wednesday, 11 June 2025 – The bigger picture: Inspiration and collaboration

Morning session (open to the public): An event focusing on the bigger picture of LLMs in the EU. Will we make it or break it?

Venue: Veröld Auditorium

08:30: Coffee and registration

09:00 – 09:30: Welcome and introduction

09:30 – 11:15: Keynote presentations

  • 09:30 – 10:15: “The Icelandic approach – preserving and revitalizing linguistic and cultural diversity in AI”
    Speaker: Óttar Kolbeinsson Proppé, Project Manager at Almannarómur (Icelandic Centre for Language Technology)
  • 10:15 – 10:30: Coffee break
  • 10:30 – 11:15: “The Alliance for Language Technologies (ALT-EDIC) – Missions and potential contributions to trust in LLMs”
    Speaker: Edouard Geoffrois, Director of ALT-EDIC (The Alliance for Language Technologies)

11:15 – 12:00: Panel discussion: “Challenges and opportunities in underrepresented language AI”
Featuring keynote speakers and project leaders, such as:

  • Þorvaldur Páll Helgason, CTO at Miðeind (a leading software company in the field of language technology and artificial intelligence for Icelandic)
  • Steinþór Steingrímsson, research assistant professor at The Árni Magnússon Institute for Icelandic Studies
  • Morris Riedel, Head of IHPC (National Competence Center for HPC & AI in Iceland)

Afternoon: Icelandic adventure

12:00 – 13:00: Participants organise their own lunch.
Nearby options: Háma (university canteen), Studentakjallarinn (university pub), Plantan Bistro (vegan café at the Nordic House), and Eiriksdóttir (restaurant on campus)

13:00 – 22:00: Day trip to Reykjadalur
The bus leaves from Hallgrímskirkja at 13:30. The trip includes a hike and a social dinner at The Greenhouse food hall in Hveragerði (participants pay for themselves). The bus will head back to Reykjavík at around 21:15. For a comfortable and enjoyable day trip, please bring appropriate clothing and essentials. For detailed information on what to bring, please visit our Practical Information page.

Day 2: Thursday, 12 June 2025 – Deep dive into research and development

Venue: Háskólatorg, H101

Morning session

09:00 – 10:30: Presentations by PhD students and researchers

  • 09:00 – 09:20: “Aligning Germanic LLMs”
    Annika Simonsen (University of Iceland)
    In this talk, I will discuss the alignment of Germanic LLMs, focusing on two key aspects: the importance of data and the importance of evaluation. I will argue that we cannot do alignment properly without robust evaluation to confirm that it is working. While we have some data and have conducted an initial evaluation, a stronger base model will help significantly. I will explore how alignment is fundamentally about bringing out capabilities that the model already has, and how we can bake more knowledge into it through more fine-tuning data and stronger pre-training.
  • 09:20 – 09:40: “Is Synthetic Alignment Data for the Germanic Languages Viable?”
    Mathias Stenlund (University of Iceland)
    This talk highlights the current state of WP5’s efforts to examine the viability of generating fully synthetic instruction-fine-tuning datasets for the Germanic languages of TrustLLM. Building on recent innovations in synthetic data generation, we reimplement a lightweight dataset generation pipeline that leverages existing instruction-fine-tuned LLMs with permissive licenses. Initial assessments of generated datasets in Swedish suggest that the approach has potential, with room for improvement in overall quality and task diversity. We further discuss planned quality enhancement strategies and task diversification approaches.
  • 09:40 – 10:00: “Transforming Pretrained LLMs into AI Assistants”
    Hoda Fakharzadehjahromy (Linköping University)
    Pretrained Large Language Models (LLMs) contain extensive knowledge, yet they often fail at tasks requiring structured reasoning, such as mathematics. This is because pretraining alone does not align model outputs with task goals or human intent. Even when the information exists in the model, an unaligned LLM may not produce the correct answer. This work focuses on enhancing the mathematical ability of the TrustLLM model using Group Relative Policy Optimization (GRPO), a reinforcement learning method that improves alignment without a value network. GRPO uses standardized, group-based reward signals to guide the model toward more reliable reasoning. Results on GSM8K show that GRPO significantly improves performance, offering a practical step toward trustworthy AI assistants.
  • 10:00 – 10:20: “Data Collection and Evaluation for Faroese”
    Iben Nyholm Debess (University of the Faroe Islands)
    In this presentation, I will introduce ongoing work on Faroese at the Centre for Language Technology, University of the Faroe Islands. Focusing on projects that are especially relevant for TrustLLM, I will present our text collection project, a PhD project on evaluation and benchmarking, and our efforts to align and fine-tune small models for Faroese, as well as to support the inclusion of Faroese in larger models and solutions. We strive to build and expand corpora and datasets that ensure coverage and representativeness for Faroese.
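
For readers unfamiliar with the "standardized, group-based reward signals" mentioned in the GRPO abstract above, the following is a minimal sketch of the general idea, not the speaker's implementation. Several answers are sampled for the same prompt, and each reward is standardized against the mean and standard deviation of its own group, which is what removes the need for a separate value network. The function name, the `eps` constant, the choice of population standard deviation, and the example rewards are all assumptions made for illustration.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-8):
    """Standardize rewards within one group of sampled answers.

    Each advantage is (reward - group mean) / (group std + eps).
    The eps term (an assumption) avoids division by zero when all
    rewards in the group are identical.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population std; an assumption for this sketch
    return [(r - mu) / (sigma + eps) for r in rewards]


# Hypothetical rewards for four sampled answers to one maths problem:
# 1.0 if the final answer is correct, 0.0 otherwise.
rewards = [1.0, 0.0, 0.0, 1.0]
advantages = group_relative_advantages(rewards)
```

Correct answers end up with positive advantages and incorrect ones with negative advantages, relative to the other samples for the same prompt. If every answer in a group receives the same reward, all advantages are zero and the group contributes no learning signal.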

10:30 – 11:00: Coffee break

11:00 – 12:00: Panel discussions by work package leaders focusing on the second half of the project

Afternoon session

12:00 – 13:00: Buffet-style lunch outside the conference room

13:00 – 14:30: General assembly
Setting priorities for the next phase of the project.
Led by: Tohid Ardeshiri and Fredrik Heintz

14:30 – 15:00: Coffee break

15:00 – 16:00: Interactive workshop (WP8): “Bridging the gap between academia and industry”
Collaborative session on practical applications of TrustLLM research

16:00 – 17:30: Guided walking tour of Reykjavík with a stop at a social event

19:00 – 21:00: Conference dinner at Kopar

Day 3: Friday, 13 June 2025 – Charting the path forward

Venue: Háskólatorg, H101

Morning session

09:00 – 10:30: Presentations by PhD students and researchers

  • 09:00 – 09:20: “Language Adapters”
    Romina Oji (Linköping University)
    Adapting multilingual language models effectively requires striking a balance between efficiency, language coverage, and task demands. The performance of models like mDeBERTa varies with language resource levels, with parameter-efficient fine-tuning methods, such as LoRA and Pfeiffer, showing strong results for high-resource languages but inconsistent outcomes for lower-resource ones. The choice of tuning method also depends on the nature of the task. Adding adapters trained on unstructured text does not lead to clear improvements, emphasizing the importance of targeted and context-aware adaptation strategies.
  • 09:20 – 09:40: “Diffusion-Based Approaches to Pixel Language Modeling”
    Ingo Ziegler (University of Copenhagen)
    Current language models face limitations from autoregressive generation and vocabulary constraints, particularly in multilingual settings. While discrete diffusion models offer non-autoregressive alternatives, they still struggle with predefined vocabularies that become challenging to model effectively. Pixel language modeling presents a promising solution by learning text representations from visual renderings of text, eliminating tokenization and vocabulary dependencies. This talk introduces an approach that combines diffusion models with pixel-based text representations, treating text generation as image synthesis and bypassing both sequential dependencies and vocabulary bottlenecks inherent in existing paradigms.
  • 09:40 – 10:00: “Exploring the Attention Mechanism Design Space”
    Marcus Lång (Linköping University)
    Transformers with scaled dot product attention generally perform well in sequence modelling. We consider minimalistic examples where this is not the case and use them as probe problems for finding attention mechanisms that can handle the cases where scaled dot product attention fails.
  • 10:00 – 10:20: “Foundations of tokenization and multilingual fairness”
    Garðar Ingvarsson and Haukur Barri Símonarson (Miðeind)
    Standard tokenization methods like BPE create systematic unfairness across languages, but even measuring this disparity is surprisingly tricky. “Equal amounts” of text in different languages aren’t actually equal, making comparison using bits per token or bits per byte meaningless. We propose better ways to make fair comparisons and show that even with proper measurement, current tokenization fundamentally disadvantages certain language types, especially morphologically rich ones. We examine where tokenization unfairness comes from and what tradeoffs exist in fixing it.
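
As background for the attention talk above, here is a minimal pure-Python sketch of standard scaled dot product attention, softmax(QK^T / sqrt(d_k)) V, which that talk takes as its baseline. It is for orientation only and is not taken from the speaker's work. The function names and the toy inputs are assumptions made for illustration.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def scaled_dot_product_attention(queries, keys, values):
    """Compute softmax(Q K^T / sqrt(d_k)) V for lists of vectors."""
    d_k = len(keys[0])
    outputs = []
    for q in queries:
        # Similarity of this query to every key, scaled by sqrt(d_k).
        scores = [
            sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d_k)
            for k in keys
        ]
        weights = softmax(scores)
        # Weighted average of the value vectors.
        out = [
            sum(w * v[j] for w, v in zip(weights, values))
            for j in range(len(values[0]))
        ]
        outputs.append(out)
    return outputs


# Toy input (hypothetical): two identical keys, so the single query
# attends to both values equally and the output is their average.
queries = [[1.0, 0.0]]
keys = [[1.0, 1.0], [1.0, 1.0]]
values = [[0.0, 2.0], [4.0, 6.0]]
result = scaled_dot_product_attention(queries, keys, values)
```

Because the softmax weights are non-negative and sum to one, every output is a convex combination of the value vectors.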

10:30 – 11:00: Coffee break

11:00 – 12:00: Focused discussion groups

  • Data management and privacy in LLM development
    Danila Petrelli, Senior Data Manager at AI Sweden (National center for applied artificial intelligence)
  • Scaling TrustLLM: From research to real-world applications
    Saskia Lensink, consultant and business manager at TNO (an independent research organisation in the Netherlands)
  • Ethical AI: Ensuring fairness and inclusivity in Germanic language models
    Hafsteinn Einarsson, Associate Professor at University of Iceland

Afternoon session

12:00 – 13:00: Lunch at Studentakjallarinn

13:00 – 14:00: Plenary session: “The road ahead for TrustLLM”
Led by Fredrik Heintz

  • Roadmap for the second half of the TrustLLM project
  • Collaboration opportunities with other projects
  • Recap of key takeaways
  • Closing remarks and action items
  • Assigning responsibilities and next steps

Optional afternoon programme on Friday 13 June

14:00 – 15:30: Optional workshop sessions and meetings in smaller groups

  • Hands-on tutorials on TrustLLM tools and resources
  • One-on-one meetings with project leaders and collaborators
  • WP9 Communication, Dissemination and Stakeholder Management
    Led by Mariel Svensson (Room: Charles – Gróska, Tölvunarfræði)

15:30 – 17:00: PhD networking with pizza and pub quiz (“So You Think You Know AI?”)
Led by Annika Simonsen

19:00 onwards: Farewell dinner (optional, self-paid at food hall)

Activities in Iceland

Make the most of your free time during the conference. From must-see attractions to delicious food and drink experiences, there’s plenty to explore. Check out our curated list of activities to enhance your stay.

Explore Iceland’s must-see attractions and culinary delights

Practical information