Advancing Germanic language models with high-quality alignment data and experiments
Meet Annika Simonsen, a PhD student from the University of Iceland. In a recent interview, Annika shared insights into her role in the TrustLLM project, where she is working on alignment of Germanic language models.

Annika Simonsen, PhD student at the University of Iceland.
What are you currently working on in the TrustLLM project?
My research focuses on building high-quality alignment training and evaluation data for seven Germanic languages, while conducting alignment experiments using various combinations of open-source Germanic data. Currently, I’m investigating how synthetic data affects alignment for Germanic languages – particularly whether it can be effective or whether human-written data remains essential. This question is especially critical for low-resource languages with limited alignment data available. An additional challenge for critically low-resource languages is that synthetic data generated by open-source models often contains grammatical errors. This work ultimately strives to ensure that speakers of all Germanic languages have access to high-quality language models that understand their linguistic and cultural contexts.
Which challenges do you find most interesting to solve through the project?
As a linguist, I find the data challenges most fascinating. At TrustLLM, we’re committed to using and creating open-source resources, but there’s a significant gap in open-source alignment data for non-English Germanic languages – particularly for lower-resource languages like Swedish, Danish, Norwegian, and Icelandic, and critically low-resource ones like Faroese. While human-written data better reflects linguistic and cultural nuances of each language, collecting it is incredibly time-consuming. One interesting challenge we’re tackling is designing data collection methods that are engaging for participants, so contributing feels enjoyable rather than burdensome.
What new insights have you gained from being part of this research project?
TrustLLM brings together people with diverse expertise, which has taught me about various aspects of building LLMs – from training data collection to model architecture and evaluation techniques. Building an LLM requires many specialized skills, and we’re fortunate to have a great team at TrustLLM who actively share their knowledge and processes. Everyone plays a vital role, and together we function like a well-oiled machine.
What has been the most inspiring experience during your work on the project so far?
Working with people who genuinely care about democratizing AI has been truly inspiring. I particularly enjoy our discussions about our values and the importance of building LLMs ethically. If I had to highlight one specific inspiring experience, it would be co-organizing the Alignment and Evaluation (NB-REAL 2025) workshop at the Joint 25th Nordic Conference on Computational Linguistics and 11th Baltic Conference on Human Language Technologies in Tallinn this past March. I was honored to present TrustLLM’s alignment work and found it inspiring to connect with others doing similar work on Nordic and Baltic languages. The workshop was a success, and we’re already planning next year’s event.
What are you most looking forward to achieving in the coming months?
I’m excited about my upcoming Erasmus+ Traineeship at the Alexandra Institute in Copenhagen from May through July. This opportunity will allow me to work closely with our Evaluation work package, specifically evaluating our alignment models and integrating them into the EuroEval (formerly ScandEval) leaderboard framework developed at the Alexandra Institute. Since TrustLLM involves 11 institutional partners across six countries, we primarily collaborate through online meetings, so this research visit will be a fantastic opportunity to work with colleagues in person. I am also looking forward to welcoming the consortium to Iceland in June for our annual Consortium Meeting.
How do you think the TrustLLM project could impact the AI field?
Most significantly, we’re releasing nearly all our research and resources as open source, which will benefit everyone who speaks or works with Germanic languages. This democratizes access to high-quality language models for these languages and helps reduce the current English-centric bias in AI development. By focusing on trustworthiness and transparency in our approach, we’re also demonstrating how responsible AI development can be done collaboratively across multiple institutions and countries.