TL;DR
Portugal announced a €5.5 million investment in AMÁLIA, a large open-source language model focused on European Portuguese. The project aims to improve AI understanding of Portuguese, but key details about data and model availability remain unclear.
The Portuguese government has committed €5.5 million to AMÁLIA, a large-scale open-source language model built specifically for European Portuguese, a notable step in language-specific AI development for the country.
AMÁLIA is a collaboration between leading Portuguese research institutions, including NOVA, IST, IT, and FCT. It builds on the pre-training of EuroLLM, with adjustments to give greater weight to European Portuguese data. The training corpus includes Arquivo.pt, which contributes approximately 5.8 billion tokens, around 5.5% of the total. During supervised fine-tuning and preference training, the team added synthetic Portuguese data to further increase the language’s representation.
Although the project is described as fully open source, the currently available resources include neither model weights, nor datasets, nor training logs, which limits external validation and use. The team has developed four benchmarks specific to European Portuguese, including ALBA, to evaluate the model’s performance. Results are promising: the model outperforms state-of-the-art peers such as Qwen3-8B on most benchmarks, but it still lags on certain tasks, notably ALBA, raising questions about how much the Portuguese-specific training data actually helps.
Why It Matters
This development matters because it represents a targeted effort to enhance AI capabilities in European Portuguese, a language with limited large-scale language models compared to English or Chinese. The investment and research could foster more localized AI applications, improve natural language understanding for Portuguese speakers, and set a precedent for other small languages. However, limited data and the current lack of open model weights raise questions about the model’s immediate accessibility and utility for the broader community.

Background
Portugal has historically lagged behind larger countries in AI development, partly due to limited language-specific data. Recent initiatives like EuroLLM and now AMÁLIA aim to address this gap. The project follows a broader trend of developing language-specific models, similar to Italy’s Minerva, but faces unique challenges due to the smaller volume of Portuguese data available for training large models. The emphasis on open-source principles aligns with global efforts to democratize AI, but actual resource sharing remains limited at this stage.
“AMÁLIA aims to treat European Portuguese as a first-class citizen in AI language models.”
— Research team member
“Despite the investment, the lack of open model weights and datasets raises questions about the immediate utility of AMÁLIA.”
— Hacker News observer

What Remains Unclear
It remains unclear when or if the model weights, datasets, and training logs will be publicly released. The current status suggests ongoing development, and the full impact of the Portuguese-specific data on performance is still under assessment. Additionally, how the model will be integrated into practical applications or further research is yet to be determined.

What’s Next
The next steps include the potential release of model weights and datasets, further benchmarking, and community engagement to evaluate real-world applications. Monitoring updates from the research team and government will be crucial to understand the project’s progress and accessibility.

Key Questions
Will the AMÁLIA model weights be publicly available?
It is not yet clear when or if the model weights will be released, as current resources do not include them. The project appears to be in development, and future updates are expected.
How does AMÁLIA compare to other Portuguese language models?
AMÁLIA outperforms comparable models such as Qwen3-8B on most benchmarks, but it still lags on certain tasks such as ALBA, likely reflecting limits in the amount and quality of Portuguese data available for training.
What data was used to train AMÁLIA?
The model was trained on approximately 107 billion tokens, with about 5.8 billion tokens from Arquivo.pt, representing roughly 5.5% of the total. Synthetic Portuguese data was also used during fine-tuning.
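As a rough sanity check on the reported figures, the Arquivo.pt share can be recomputed directly from the two token counts given in the article (both numbers are taken from the source; the percentage is the claim being checked):

```python
# Sanity check on the reported AMÁLIA training-data figures.
total_tokens = 107e9     # total pre-training tokens reported
arquivo_tokens = 5.8e9   # tokens reportedly contributed by Arquivo.pt

share = arquivo_tokens / total_tokens
print(f"Arquivo.pt share: {share:.1%}")  # prints "Arquivo.pt share: 5.4%"
```

The computed 5.4% is consistent with the article’s "roughly 5.5%" figure.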
Why is open sourcing important for models like AMÁLIA?
Open sourcing allows researchers and developers worldwide to validate, improve, and adapt the model, fostering innovation and ensuring transparency. Currently, the lack of open weights limits these benefits.