TL;DR

Portugal announced a €5.5 million investment in AMÁLIA, a large open-source language model focused on European Portuguese. The project aims to improve AI understanding of Portuguese, but key details about data and model availability remain unclear.

Portugal’s government announced a €5.5 million investment in AMÁLIA, a large-scale open-source language model designed specifically for European Portuguese, marking a significant step in language-specific AI development for the country.

AMÁLIA is a collaborative project involving top Portuguese research institutions, including NOVA, IST, IT, and FCT. It builds upon the pre-training of EuroLLM, with modifications to improve focus on European Portuguese data. The model is trained using datasets that include Arquivo.pt, which contributes approximately 5.8 billion tokens, representing around 5.5% of the total training tokens. During supervised fine-tuning and preference training, the team used synthetic Portuguese data, increasing the language’s representation in the model’s training process.

Although the project is described as fully open source, the currently available resources do not include model weights, datasets, or training logs, which limits external validation and use. The team has developed four benchmarks specific to European Portuguese, including ALBA, to evaluate the model's performance. The model has shown promising results, outperforming strong baselines such as Qwen3-8B on most benchmarks, but it still lags on certain tasks, including ALBA, raising questions about the impact of the Portuguese-specific training data.

Why It Matters

This development matters because it represents a targeted effort to enhance AI capabilities in European Portuguese, a language with far fewer large-scale language models than English or Chinese. The investment and research could foster more localized AI applications, improve natural language understanding for Portuguese speakers, and set a precedent for other lower-resource languages. However, the limited data and the current lack of open model weights raise questions about the model's immediate accessibility and utility for the broader community.

Background

Portugal has historically lagged behind larger countries in AI development, partly due to limited language-specific data. Recent initiatives like EuroLLM and now AMÁLIA aim to address this gap. The project follows a broader trend of developing language-specific models, similar to Italy’s Minerva, but faces unique challenges due to the smaller volume of Portuguese data available for training large models. The emphasis on open-source principles aligns with global efforts to democratize AI, but actual resource sharing remains limited at this stage.

“AMÁLIA aims to treat European Portuguese as a first-class citizen in AI language models.”

— Research team member

“Despite the investment, the lack of open model weights and datasets raises questions about the immediate utility of AMÁLIA.”

— Hacker News observer

What Remains Unclear

It remains unclear when or if the model weights, datasets, and training logs will be publicly released. The current status suggests ongoing development, and the full impact of the Portuguese-specific data on performance is still under assessment. Additionally, how the model will be integrated into practical applications or further research is yet to be determined.

What’s Next

The next steps include the potential release of model weights and datasets, further benchmarking, and community engagement to evaluate real-world applications. Monitoring updates from the research team and government will be crucial to understand the project’s progress and accessibility.

Key Questions

Will the AMÁLIA model weights be publicly available?

It is not yet clear when or if the model weights will be released, as current resources do not include them. The project appears to be in development, and future updates are expected.

How does AMÁLIA compare to other Portuguese language models?

AMÁLIA outperforms some comparable models, such as Qwen3-8B, on most benchmarks but still lags on certain tasks such as ALBA, likely due to the amount and quality of Portuguese data available during training.

What data was used to train AMÁLIA?

The model was trained on approximately 107 billion tokens, with about 5.8 billion tokens from Arquivo.pt, representing roughly 5.5% of the total. Synthetic Portuguese data was also used during fine-tuning.
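The token figures reported above can be cross-checked with a quick calculation. This is a minimal sketch using only the numbers stated in this article (approximately 107 billion total tokens and 5.8 billion from Arquivo.pt), not data from the project itself:

```python
# Sanity-check the reported share of Arquivo.pt tokens in AMÁLIA's
# training mix, using the approximate figures cited in this article.

total_tokens = 107e9     # ~107 billion training tokens (reported)
arquivo_tokens = 5.8e9   # ~5.8 billion tokens from Arquivo.pt (reported)

share = arquivo_tokens / total_tokens
print(f"Arquivo.pt share: {share:.1%}")
```

This yields about 5.4%, close to the "roughly 5.5%" cited; the small gap is consistent with both figures being rounded approximations.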

Why is open sourcing important for models like AMÁLIA?

Open sourcing allows researchers and developers worldwide to validate, improve, and adapt the model, fostering innovation and ensuring transparency. Currently, the lack of open weights limits these benefits.
