AI Sweden: Driving data excellence in TrustLLM

In a recent interview, AI Sweden discussed their pivotal role in the TrustLLM project, emphasising their focus on data governance, data collection and curation to support the development of European Large Language Models (LLMs). Here are some key insights from the conversation.

Magnus Sahlgren, Head of Research, NLU at AI Sweden.

Focus on data governance and collection

Magnus Sahlgren, Head of Research, NLU at AI Sweden, explains that AI Sweden’s role in TrustLLM centres on data governance, data collection and curation to support the development of European LLMs. On the governance side, they work on defining best practices for sourcing and handling data. This includes metadata collection, risk-based evaluation of data sources, and aligning governance frameworks across partners.

Ensuring high-quality training data

On the data side, AI Sweden contributes to data collection and curation, implementing multi-stage filtering with heuristic, model-based, and deduplication methods to ensure high-quality and diverse training data. “Going forward, we’ll continue strengthening these processes, making sure both governance and data quality keep up with the demands of building competitive, responsible LLMs in Europe,” Magnus explains.
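To make the idea of multi-stage filtering concrete, here is a minimal Python sketch of the three kinds of stages mentioned above: heuristic, model-based, and deduplication. It is an illustration only, not AI Sweden's or TrustLLM's actual pipeline. All function names and thresholds are hypothetical, and `toy_quality_score` is a simple stand-in for what would in practice be a trained quality classifier.

```python
import hashlib
from typing import Callable, Iterable, List


def passes_heuristics(text: str, min_words: int = 5, min_alpha_ratio: float = 0.6) -> bool:
    """Heuristic stage: drop very short or mostly non-alphabetic documents.

    The thresholds are illustrative assumptions, not values from the project.
    """
    if len(text.split()) < min_words:
        return False
    alpha = sum(ch.isalpha() or ch.isspace() for ch in text)
    return alpha / len(text) >= min_alpha_ratio


def toy_quality_score(text: str) -> float:
    """Hypothetical stand-in for a model-based scorer: share of distinct words."""
    words = text.lower().split()
    return len(set(words)) / len(words) if words else 0.0


def fingerprint(text: str) -> str:
    """Hash of case- and whitespace-normalised text, used for exact deduplication."""
    normalised = " ".join(text.lower().split())
    return hashlib.sha256(normalised.encode("utf-8")).hexdigest()


def filter_corpus(
    docs: Iterable[str],
    score: Callable[[str], float] = toy_quality_score,
    min_score: float = 0.5,
) -> List[str]:
    """Run the three stages in order and return the surviving documents."""
    seen = set()
    kept = []
    for doc in docs:
        if not passes_heuristics(doc):   # stage 1: heuristic filter
            continue
        if score(doc) < min_score:       # stage 2: model-based filter (stubbed)
            continue
        key = fingerprint(doc)           # stage 3: exact deduplication
        if key in seen:
            continue
        seen.add(key)
        kept.append(doc)
    return kept
```

Ordering the stages from cheapest to most expensive is a common design choice, since the heuristic checks remove obvious noise before any model is called. Real pipelines typically also add near-duplicate detection, which this sketch leaves out.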

Addressing key challenges

When asked about the most interesting challenges to solve through the project, Magnus says, “There are many interesting and important questions regarding training data for LLMs that we will need to untangle during this project. The question about multilinguality and support for lesser-resourced languages is also very timely and important.”

Reflecting on the journey

Reflecting on the journey so far, Magnus points to the people involved. “To see the dedication and commitment of the project participants when it comes to tackling very challenging research and development questions regarding data, compliance and model training has been the most important experience during our work on the project so far,” he concludes.

AI Sweden’s involvement in TrustLLM highlights their commitment to improving data governance and quality. By addressing key challenges and fostering collaboration, they are contributing to the development of responsible LLMs in Europe.
