TL;DR
Portugal announced a €5.5 million investment in AMÁLIA, a large open-source language model focused on European Portuguese. The project aims to improve AI understanding of Portuguese, but key details about data and model availability remain unclear.
The Portuguese government has committed €5.5 million to AMÁLIA, a large-scale open-source language model built specifically for European Portuguese, a notable step in language-specific AI development for the country.
AMÁLIA is a collaboration between leading Portuguese research institutions, including NOVA, IST, IT, and FCT. It builds on the pre-training of EuroLLM, with adjustments to give greater weight to European Portuguese data. The training corpus includes Arquivo.pt, which contributes approximately 5.8 billion tokens, around 5.5% of the total. During supervised fine-tuning and preference training, the team added synthetic Portuguese data to further increase the language’s representation.
Although the project is described as fully open source, the currently available resources include neither model weights, nor datasets, nor training logs, which limits external validation and use. The team has developed four benchmarks specific to European Portuguese, including ALBA, to evaluate the model’s performance. Results are promising: the model outperforms state-of-the-art peers such as Qwen3-8B on most benchmarks, but it still lags on certain tasks, notably ALBA, raising questions about how much the Portuguese-specific training data actually helps.
Why It Matters
This development matters because it represents a targeted effort to enhance AI capabilities in European Portuguese, a language with limited large-scale language models compared to English or Chinese. The investment and research could foster more localized AI applications, improve natural language understanding for Portuguese speakers, and set a precedent for other small languages. However, limited data and the current lack of open model weights raise questions about the model’s immediate accessibility and utility for the broader community.

Background
Portugal has historically lagged behind larger countries in AI development, partly due to limited language-specific data. Recent initiatives like EuroLLM and now AMÁLIA aim to address this gap. The project follows a broader trend of developing language-specific models, similar to Italy’s Minerva, but faces unique challenges due to the smaller volume of Portuguese data available for training large models. The emphasis on open-source principles aligns with global efforts to democratize AI, but actual resource sharing remains limited at this stage.
“AMÁLIA aims to treat European Portuguese as a first-class citizen in AI language models.”
— Research team member
“Despite the investment, the lack of open model weights and datasets raises questions about the immediate utility of AMÁLIA.”
— Hacker News observer

What Remains Unclear
It remains unclear when or if the model weights, datasets, and training logs will be publicly released. The current status suggests ongoing development, and the full impact of the Portuguese-specific data on performance is still under assessment. Additionally, how the model will be integrated into practical applications or further research is yet to be determined.

What’s Next
The next steps include the potential release of model weights and datasets, further benchmarking, and community engagement to evaluate real-world applications. Monitoring updates from the research team and government will be crucial to understand the project’s progress and accessibility.

Key Questions
Will the AMÁLIA model weights be publicly available?
It is not yet clear when or if the model weights will be released, as current resources do not include them. The project appears to be in development, and future updates are expected.
How does AMÁLIA compare to other Portuguese language models?
AMÁLIA outperforms comparable models such as Qwen3-8B on most benchmarks, but it still lags on certain tasks such as ALBA, likely reflecting limits in the amount and quality of Portuguese data available for training.
What data was used to train AMÁLIA?
The model was trained on approximately 107 billion tokens, with about 5.8 billion tokens from Arquivo.pt, representing roughly 5.5% of the total. Synthetic Portuguese data was also used during fine-tuning.
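As a rough sanity check on the reported figures, the Arquivo.pt share can be recomputed directly from the two token counts given in the article (both numbers are taken from the source; the percentage is the claim being checked):

```python
# Sanity check on the reported AMÁLIA training-data figures.
total_tokens = 107e9     # total pre-training tokens reported
arquivo_tokens = 5.8e9   # tokens reportedly contributed by Arquivo.pt

share = arquivo_tokens / total_tokens
print(f"Arquivo.pt share: {share:.1%}")  # prints "Arquivo.pt share: 5.4%"
```

The computed 5.4% is consistent with the article’s "roughly 5.5%" figure.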
Why is open sourcing important for models like AMÁLIA?
Open sourcing allows researchers and developers worldwide to validate, improve, and adapt the model, fostering innovation and ensuring transparency. Currently, the lack of open weights limits these benefits.