14× Faster Embeddings: How We Rebuilt The ONNX Path In Manticore

TL;DR

Manticore has rebuilt its ONNX processing path, resulting in a 14× speed boost for generating embeddings. This development improves efficiency for AI workloads and highlights ongoing optimization efforts.

Manticore has overhauled its ONNX processing pipeline and reports a 14-fold increase in embedding generation speed. The company has confirmed the figure and expects it to significantly improve performance for large-scale AI applications built on Manticore’s platform.

The update involved a complete redesign of the ONNX path within Manticore’s architecture. According to Manticore, the new implementation reduces latency and increases throughput for embedding computations, which the company says will benefit search, recommendation systems, and other AI-driven services that rely heavily on embedding generation. The overhaul reportedly included optimizing data flow, removing bottlenecks, and using hardware more efficiently. Manticore has not disclosed performance figures beyond the 14× speed increase, but describes the change as a core part of its ongoing performance work.

At a glance

When: announced March 2024

The development: Manticore has announced a significant overhaul of its ONNX integration, achieving a 14× increase in embedding generation speed.

Impact on AI Performance and Scalability

This development matters because it directly addresses the computational bottlenecks faced by AI applications that rely on embedding generation. A 14× speed increase means that large datasets can be processed more quickly, reducing latency and operational costs. For users, this translates into faster search results, more responsive recommendation engines, and improved scalability for AI services. It also demonstrates Manticore’s commitment to optimizing its platform to stay competitive in the rapidly evolving AI hardware and software landscape. Industry analysts suggest that such performance improvements could set new standards for open-source AI tools and accelerate adoption in enterprise settings.
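To put the 14× figure in perspective, the arithmetic can be sketched in a few lines of Python. The baseline throughput, corpus size, and the share of pipeline time spent on embeddings are hypothetical numbers chosen for illustration; Manticore has not published them. The second function applies Amdahl's law, which is a general result and not something taken from the announcement: if embedding generation is only part of the end-to-end work, the overall gain is smaller than 14×.

```python
# Back-of-the-envelope arithmetic for a 14x embedding speedup.
# All workload numbers below are hypothetical, not published by Manticore.

def embedding_time_seconds(num_docs: int, docs_per_second: float) -> float:
    """Time to embed num_docs at a steady throughput."""
    if docs_per_second <= 0:
        raise ValueError("docs_per_second must be positive")
    return num_docs / docs_per_second


def overall_speedup(embedding_fraction: float, embedding_speedup: float) -> float:
    """Amdahl's law: end-to-end speedup when only the embedding step gets faster.

    embedding_fraction is the share of total time spent on embeddings
    before the change (0.0 to 1.0).
    """
    if not 0.0 <= embedding_fraction <= 1.0:
        raise ValueError("embedding_fraction must be between 0 and 1")
    if embedding_speedup <= 0:
        raise ValueError("embedding_speedup must be positive")
    return 1.0 / ((1.0 - embedding_fraction) + embedding_fraction / embedding_speedup)


NUM_DOCS = 1_000_000              # hypothetical corpus size
BASELINE_DOCS_PER_SECOND = 100.0  # hypothetical pre-update throughput
SPEEDUP = 14.0                    # the figure reported in the article

before = embedding_time_seconds(NUM_DOCS, BASELINE_DOCS_PER_SECOND)
after = embedding_time_seconds(NUM_DOCS, BASELINE_DOCS_PER_SECOND * SPEEDUP)

if __name__ == "__main__":
    print(f"before: {before / 3600:.2f} h, after: {after / 60:.1f} min")
    # If embeddings were half of the total pipeline time (an assumption),
    # the end-to-end gain is well below 14x.
    print(f"end-to-end speedup at 50% share: {overall_speedup(0.5, SPEEDUP):.2f}x")
```

The closer the workload is to pure embedding generation, such as bulk indexing of a new corpus, the closer the end-to-end gain gets to the headline number.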

Background on Manticore and ONNX Optimization Efforts

Manticore is an open-source search and data analysis platform that integrates machine learning models, including embeddings, to enhance data retrieval and processing. Prior to this update, Manticore relied on ONNX (Open Neural Network Exchange) for model interoperability, but its performance was limited by the efficiency of its processing pipeline. Over the past year, there have been multiple industry-wide efforts to optimize ONNX-based workflows, driven by the growing demand for real-time AI inference. Manticore’s recent update aligns with these trends, aiming to improve throughput and reduce latency in embedding computations, which are critical for many AI applications. The company has previously announced incremental improvements, but the latest overhaul marks a significant leap forward.

“The redesign of our ONNX path has allowed us to achieve a 14× increase in embedding speed, marking a major milestone for our platform.”
— Manticore Engineering Team

Technical Details and Broader Compatibility Still Unclear

While Manticore has confirmed the 14× speed increase, detailed technical explanations of how the optimization was achieved remain limited. It is also unclear whether this improvement applies universally across all models and hardware configurations or is specific to certain setups. Additionally, the long-term stability and compatibility of the new ONNX pipeline are still to be evaluated through broader testing.
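Until official benchmarks are published, readers who want to check the claim on their own models and hardware could use a small timing harness along the following lines. This is a minimal sketch using only the Python standard library: `measure_throughput`, `placeholder_embed`, and `speedup` are hypothetical names invented here, and the placeholder function stands in for whatever embedding call is being tested. It does not call any Manticore or ONNX API.

```python
import time
from statistics import median
from typing import Callable, List, Sequence

# Any callable that maps a batch of texts to a list of vectors.
EmbedFn = Callable[[Sequence[str]], List[List[float]]]


def measure_throughput(
    embed_fn: EmbedFn,
    texts: Sequence[str],
    batch_size: int = 32,
    repeats: int = 3,
) -> float:
    """Return the median texts-per-second over `repeats` full passes."""
    if not texts:
        raise ValueError("texts must not be empty")
    if batch_size < 1 or repeats < 1:
        raise ValueError("batch_size and repeats must be at least 1")
    rates = []
    for _ in range(repeats):
        start = time.perf_counter()
        for i in range(0, len(texts), batch_size):
            embed_fn(texts[i:i + batch_size])
        elapsed = time.perf_counter() - start
        # Guard against a zero reading on very coarse timers.
        rates.append(len(texts) / max(elapsed, 1e-9))
    return median(rates)


def placeholder_embed(batch: Sequence[str]) -> List[List[float]]:
    """Stand-in for a real embedding call; NOT a model, just cheap arithmetic."""
    return [[float(len(t)), float(sum(map(ord, t)) % 97)] for t in batch]


def speedup(old_rate: float, new_rate: float) -> float:
    """Ratio of new throughput to old throughput."""
    if old_rate <= 0:
        raise ValueError("old_rate must be positive")
    return new_rate / old_rate


if __name__ == "__main__":
    sample = [f"document number {i}" for i in range(1000)]
    rate = measure_throughput(placeholder_embed, sample)
    print(f"{rate:,.0f} texts/s with the placeholder function")
```

Running the same harness against the old and new code paths, with the same texts, model, batch size, and hardware, and passing the two rates to `speedup`, gives a figure that can be compared with the reported 14×. The median over several passes is used so that a single slow run, for example a cold cache, does not dominate the result.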

Next Steps Include Broader Testing and Community Adoption

Manticore plans to release detailed technical documentation and benchmarks in the coming weeks to validate and demonstrate the performance gains. The company also intends to gather feedback from early adopters and integrate further optimizations. Industry observers expect that this breakthrough could encourage other open-source projects to pursue similar enhancements, fostering a wave of performance improvements across AI tools.

Key Questions

What specific changes did Manticore make to achieve the speed increase?

Manticore has redesigned its ONNX processing pipeline, optimizing data flow, reducing bottlenecks, and leveraging hardware more efficiently. However, detailed technical specifics have not yet been publicly disclosed.

Does this speed improvement apply to all models and hardware?

It is not yet clear if the 14× speed boost applies universally across different models and hardware configurations. Further testing and detailed documentation are expected to clarify this.

How will this impact users of Manticore?

Users can expect faster embedding generation, leading to lower latency and operational costs, especially in large-scale AI applications like search and recommendation systems.

When will detailed benchmarks and technical explanations be available?

Manticore has announced plans to release further technical documentation and benchmarks in the coming weeks, which will provide more clarity on the performance gains.

Could this lead to broader industry changes?

Yes, if the performance improvements are validated and adopted widely, it could set new standards for open-source AI tools and influence future optimization efforts across the industry.

Source: hn

Author

Deep Intellica Team
