TrustLLM celebrates a year of innovation

Dec 17, 2024 | News


As we wrap up the first year of the TrustLLM project, it’s incredible to see how far we’ve come. Our journey has been marked by significant milestones, from securing substantial HPC resources to advancing our data curation efforts.

In an interview, Professor Fredrik Heintz shared his thoughts on what makes the TrustLLM project unique:

“We are not only a traditional research project focused on conducting new and fascinating research, but we also aim to train, evaluate, and test models in both benchmarks and real-world applications. The real value of our project lies in how we connect and integrate these different parts.”

One of our key achievements this year was securing 77,500 node hours on the MareNostrum cluster, which allowed us to train our baseline model. This was a major step forward, despite the challenges we faced with resource management and training efficiency.

Data curation has been another critical focus. Leveraging data from various projects and developing an initial data management plan have taught us valuable lessons about the complexities of acquiring and curating high-quality, trustworthy data.

Looking ahead, our goal remains to curate increasing amounts of high-quality data, secure more HPC capacity, and integrate our research into the models we train. We are committed to documenting our lessons learned and sharing our insights with the broader community.

As we celebrate this milestone, we want to thank everyone involved for their hard work and dedication. Here’s to another year of innovation and progress!

How can we make large language models more factually reliable? Can better data, external tools, and structured knowledge help reduce hallucinations? This TrustLLM webinar, held April 14, 10-11 (CET), will focus on improving the factual trustworthiness of LLMs.

As large language models are increasingly used in real-world and high-stakes settings, their tendency to produce fluent but incorrect information remains a major challenge. This webinar presents three main contributions from ongoing work in TrustLLM Work Package 3, which tackles factual reliability through data curation, tool learning, and structured knowledge extraction. Together, these perspectives show how better data, external tool integration, and structured knowledge representations can jointly strengthen the factual reliability and trustworthiness of large language models.

The first topic introduces JQL (Judging Quality across Languages). JQL is a scalable method for curating high‑quality multilingual datasets by distilling LLM‑based annotations into lightweight models built on cross‑lingual embeddings. This approach demonstrates how systematic data curation across languages can directly improve the factual grounding of LLMs.
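
To make the distillation idea concrete, here is a minimal toy sketch of the JQL pattern: expensive LLM quality judgments on a small seed set are distilled into a cheap model over pre-computed embeddings, which then filters a large corpus without further LLM calls. The data, labels, and nearest-centroid "model" below are illustrative stand-ins, not the actual JQL pipeline.

```python
# Toy sketch of the JQL idea: distil LLM quality judgments into a
# lightweight model over (pre-computed) cross-lingual embeddings.
# All vectors, labels, and the nearest-centroid classifier are stand-ins.

def centroid(vectors):
    dim = len(vectors[0])
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]

def distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5

# Step 1: an LLM judge labels a small seed set (embedding -> quality label).
seed = [
    ([0.9, 0.8, 0.1], "high"),   # e.g. a well-edited encyclopedic passage
    ([0.8, 0.9, 0.2], "high"),
    ([0.1, 0.2, 0.9], "low"),    # e.g. boilerplate or spam
    ([0.2, 0.1, 0.8], "low"),
]

# Step 2: distil the labels into a lightweight nearest-centroid classifier.
centroids = {
    label: centroid([v for v, l in seed if l == label])
    for label in {"high", "low"}
}

def judge_quality(embedding):
    """Cheap proxy for the LLM judge: pick the closest label centroid."""
    return min(centroids, key=lambda lbl: distance(embedding, centroids[lbl]))

# Step 3: the cheap judge filters a large corpus without further LLM calls.
corpus = {"doc_a": [0.85, 0.8, 0.15], "doc_b": [0.15, 0.2, 0.85]}
kept = [doc for doc, emb in corpus.items() if judge_quality(emb) == "high"]
print(kept)  # -> ['doc_a']
```

Because the distilled judge works on embeddings rather than raw text, the same annotator can score documents in any language the embedding model covers, which is the point of the cross-lingual setup.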
The second contribution explores how structured tool use can help anchor model outputs in real‑world information. Tool learning enables LLMs to interact with external systems—such as retrievers or specialized tools—allowing them to verify facts and reason over up‑to‑date sources rather than relying solely on internal representations.
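
The control flow of such a tool-calling loop can be sketched as follows; the "model", tool registry, and fact store are illustrative stand-ins for an actual LLM and retriever, not part of the TrustLLM system.

```python
# Minimal sketch of tool learning for factuality: instead of answering from
# parametric memory, the model emits a tool call, the runtime executes it
# against an external source, and the answer is grounded in the tool result.

FACT_STORE = {  # stand-in for a retriever / search index with current data
    "capital of australia": "Canberra",
}

def retrieve(query: str) -> str:
    """Stand-in retrieval tool: look up a fact in the external store."""
    return FACT_STORE.get(query.lower(), "no result")

TOOLS = {"retrieve": retrieve}

def fake_model(question: str) -> dict:
    """Stand-in for an LLM that decides to call a tool for factual queries."""
    return {"tool": "retrieve", "arguments": question}

def answer(question: str) -> str:
    call = fake_model(question)
    result = TOOLS[call["tool"]](call["arguments"])  # execute the tool call
    if result == "no result":
        return "I could not verify this."  # abstain instead of guessing
    return result                          # answer grounded in evidence

print(answer("capital of Australia"))  # -> Canberra
```

The key design point is the final branch: when the external source returns nothing, the system abstains rather than falling back on a possibly hallucinated parametric answer.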
Finally, we explore knowledge graph construction and ontology learning as a way to enhance factual consistency. By comparing single‑step and multi‑step reasoning strategies, this work investigates how LLMs can more reliably extract structured knowledge from text, supporting downstream reasoning and verification tasks.
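
To illustrate the difference between the two strategies, here is a toy sketch where simple pattern rules stand in for what are LLM prompting steps in the actual work: the single-step variant extracts a full triple in one pass, while the multi-step variant first finds entity mentions and then decides which relation links each pair.

```python
# Toy sketch of single-step vs multi-step knowledge extraction.
# In the real setting each step is an LLM prompt; simple string rules
# stand in here so the control flow of the two strategies is visible.

KNOWN_RELATIONS = {"discovered"}

def single_step(text):
    """One pass: extract a (subject, relation, object) triple directly."""
    words = text.rstrip(".").split()
    for i, w in enumerate(words):
        if w in KNOWN_RELATIONS:
            return [(" ".join(words[:i]), w, " ".join(words[i + 1:]))]
    return []

def multi_step(text):
    """Step 1: extract entity mentions. Step 2: link entity pairs."""
    words = text.rstrip(".").split()
    entities, current = [], []
    for w in words:                      # step 1: spans around relation words
        if w in KNOWN_RELATIONS:
            if current:
                entities.append(" ".join(current))
            current = []
        else:
            current.append(w)
    if current:
        entities.append(" ".join(current))
    triples = []
    for i, subj in enumerate(entities):  # step 2: test each ordered pair
        for obj in entities[i + 1:]:
            between = text.split(subj, 1)[1].split(obj, 1)[0].strip()
            if between in KNOWN_RELATIONS:
                triples.append((subj, between, obj))
    return triples

sentence = "Marie Curie discovered polonium."
print(single_step(sentence))  # -> [('Marie Curie', 'discovered', 'polonium')]
```

On this trivial sentence both strategies agree; the interesting comparisons arise on longer text, where decomposing extraction into entity and relation steps can trade extra calls for more reliable structure.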

Please note that this webinar will be recorded!
