TL;DR
The article explores the emerging idea that AI alignment is not about aligning AI to humans but about aligning with AI through mutual interaction. This shift questions current safety paradigms and highlights the need for inclusive design processes.
Recent philosophical and practical critiques of AI alignment argue that the traditional approach, which treats humans as the fixed target that AI must be aligned to, is flawed. These critics propose aligning with AI systems through mutual interaction instead, recognizing that in the design process humans and AI shape each other.
Recent publications, including the Anthropic Alignment Science blog, describe training methods that rely on complex loops of self-reporting and evaluation by other models. Critics argue that these methods are rooted in a ‘configuration’ philosophy, which treats humans as static targets and AI as a system to be configured according to predefined values, often leaving actual human experience out of the loop.
Eliezer Yudkowsky and other safety advocates have called for drastic measures to prevent uncontrolled AI development, emphasizing safety at the expense of broader inclusion. At the other extreme, tech investors such as Marc Andreessen advocate acceleration, framing disruption as progress and dismissing critics' concerns as resentment or hostility to ambition.
The core issue is that current alignment practices rely on proxies, such as automated evaluators and statistical measures, that leave out the actual humans affected by AI systems. This disconnect produces a safety paradigm that measures what can be quantified rather than what genuinely reflects human values and needs.
Why It Matters
This shift in perspective matters because it questions the fundamental assumptions underlying AI safety efforts. Moving from a model where humans are fixed targets to one where humans and AI co-evolve could lead to more effective, inclusive, and adaptive alignment strategies. It also highlights the risk of current methods entrenching a disconnect between AI systems and the people they impact, potentially undermining trust and safety in the long term.

Background
The debate over AI alignment has intensified over the past few years, with sharply divergent views on safety and progress. Traditional approaches, rooted in a ‘configuration’ philosophy, evaluate AI behavior through proxies and automated checks. Recent writings challenge this, emphasizing that the interaction between humans and AI is mutual and dynamic rather than static. This reflects a broader philosophical shift in AI research away from control and toward collaboration.
“If we go ahead on this everyone will die, including children who did not choose this and did not do anything wrong.”
— Eliezer Yudkowsky
“The training data is generated by prompting another model with a system prompt encoding the target behavior and filtering outputs for behavioral adherence using an LLM judge.”
— Anthropic Alignment Science blog
“Suffering from ressentiment, a witches’ brew of resentment, bitterness, and rage that is causing them to hold mistaken values.”
— Marc Andreessen
What Remains Unclear
It remains unclear how effectively mutual shaping can be implemented at scale and whether new paradigms will replace existing configuration-based approaches. The philosophical and practical challenges of integrating human experience directly into AI alignment processes are still being worked out, and ongoing research is needed to validate these ideas.

What’s Next
Researchers and policymakers are expected to explore and test new frameworks that emphasize mutual interaction and co-evolution between humans and AI. Future developments may include more inclusive evaluation methods, participatory design processes, and real-world experiments to assess whether ‘aligning with’ AI systems is a viable alternative to aligning them to fixed human targets.

Key Questions
What does it mean to ‘align with’ an AI instead of ‘aligning’ it?
It means shifting from trying to make AI systems conform to fixed human values to fostering a mutual relationship where both humans and AI influence and shape each other through ongoing interaction.
Why is this shift important for AI safety?
Because current methods often exclude actual human experience, relying instead on proxies and automated evaluations. Mutual shaping aims to create systems that are more adaptable, trustworthy, and aligned with real human needs.
Are current AI training methods compatible with this new approach?
Not fully. Existing methods are built on the configuration philosophy, which treats human values as fixed inputs and may therefore be limited. Transitioning to mutual shaping requires new techniques that bring ongoing human feedback and interaction directly into the development process.
What challenges might arise in implementing mutual alignment?
Challenges include designing scalable, participatory processes that genuinely include human perspectives, and developing evaluation metrics that capture mutual influence rather than relying on automated proxies.