TL;DR

Portugal announced a €5.5 million investment in AMÁLIA, a large open-source language model focused on European Portuguese. The project aims to improve AI understanding of Portuguese, but key details about data and model availability remain unclear.

Portugal’s government announced a €5.5 million investment in AMÁLIA, a large-scale open-source language model designed specifically for European Portuguese, marking a significant step in language-specific AI development for the country.

AMÁLIA is a collaborative project involving top Portuguese research institutions, including NOVA, IST, IT, and FCT. It builds on the pre-training of EuroLLM, with modifications that shift the focus toward European Portuguese data. The training data includes Arquivo.pt, which contributes approximately 5.8 billion tokens, or around 5.5% of the total training tokens. During supervised fine-tuning and preference training, the team also used synthetic Portuguese data, further increasing the language’s share of the training mix.

Although the project is described as fully open source, the currently available resources do not include model weights, datasets, or training logs, which limits external validation and use. The team has developed four benchmarks specific to European Portuguese, including ALBA, to evaluate the model. AMÁLIA outperforms some state-of-the-art models such as Qwen3-8B on most benchmarks but still lags behind on certain benchmarks such as ALBA, raising questions about the impact of the Portuguese-specific training data.

Why It Matters

AMÁLIA is a targeted effort to improve AI capabilities in European Portuguese, which has far fewer dedicated large language models than English or Chinese. The investment and research could foster more localized AI applications, improve natural language understanding for Portuguese speakers, and set a precedent for other less-resourced languages. However, the limited training data and the current lack of open model weights raise questions about the model’s immediate accessibility and usefulness to the broader community.

Background

Portugal has historically lagged behind larger countries in AI development, partly due to limited language-specific data. Recent initiatives like EuroLLM and now AMÁLIA aim to address this gap. The project follows a broader trend of developing language-specific models, similar to Italy’s Minerva, but faces unique challenges due to the smaller volume of Portuguese data available for training large models. The emphasis on open-source principles aligns with global efforts to democratize AI, but actual resource sharing remains limited at this stage.

“AMÁLIA aims to treat European Portuguese as a first-class citizen in AI language models.”

— Research team member

“Despite the investment, the lack of open model weights and datasets raises questions about the immediate utility of AMÁLIA.”

— Hacker News observer

What Remains Unclear

It remains unclear when or if the model weights, datasets, and training logs will be publicly released. The current status suggests ongoing development, and the full impact of the Portuguese-specific data on performance is still under assessment. Additionally, how the model will be integrated into practical applications or further research is yet to be determined.

What’s Next

The next steps include the potential release of model weights and datasets, further benchmarking, and community engagement to evaluate real-world applications. Monitoring updates from the research team and government will be crucial to understand the project’s progress and accessibility.

Key Questions

Will the AMÁLIA model weights be publicly available?

It is not yet clear when or if the model weights will be released, as current resources do not include them. The project appears to be in development, and future updates are expected.

How does AMÁLIA compare to other Portuguese language models?

AMÁLIA outperforms some models such as Qwen3-8B on most benchmarks but still lags behind on certain benchmarks such as ALBA. Whether that gap stems from the amount and quality of Portuguese data used during training remains an open question.

What data was used to train AMÁLIA?

The model was trained on approximately 107 billion tokens, with about 5.8 billion tokens from Arquivo.pt, representing roughly 5.5% of the total. Synthetic Portuguese data was also used during fine-tuning.
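As a quick sanity check on those figures, the share can be recomputed from the two token counts quoted above. This is only a sketch: both inputs are the approximate values reported in this article, and the function and variable names are illustrative, not taken from the project.

```python
def token_share(subset_tokens: float, total_tokens: float) -> float:
    """Return subset_tokens as a percentage of total_tokens."""
    if total_tokens <= 0:
        raise ValueError("total_tokens must be positive")
    return 100.0 * subset_tokens / total_tokens


# Approximate figures quoted in the article, in billions of tokens.
ARQUIVO_PT_TOKENS_B = 5.8
TOTAL_TOKENS_B = 107.0

share = token_share(ARQUIVO_PT_TOKENS_B, TOTAL_TOKENS_B)
print(f"Arquivo.pt share: {share:.1f}%")  # prints "Arquivo.pt share: 5.4%"
```

The result, about 5.4%, is in line with the "roughly 5.5%" quoted, given that both token counts are themselves rounded.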

Why is open sourcing important for models like AMÁLIA?

Open sourcing allows researchers and developers worldwide to validate, improve, and adapt the model, fostering innovation and ensuring transparency. Currently, the lack of open weights limits these benefits.
