TL;DR
Scientists have built a programmable probabilistic computer comprising one million p-bits by connecting multiple FPGAs. This system performs Gibbs sampling at over a trillion flips per second, surpassing previous single-chip limits. The development offers new possibilities for large-scale stochastic computing.
Researchers have developed a scalable, programmable probabilistic computer with one million p-bits by networking FPGAs, exceeding the size limits of earlier single-chip designs. The architecture performs Gibbs sampling at over a trillion flips per second while keeping coupling weights in local memory, a significant advance in hardware for stochastic computation and optimization.
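To make the idea of Gibbs sampling over p-bits concrete, here is a minimal software sketch of one sweep. It is not the FPGA design described in the work. It assumes the commonly cited p-bit update rule m = sgn(tanh(beta * I) - r), with r uniform in (-1, 1), and uses bipolar states (+1/-1) in place of 0/1. The names pbit_update and gibbs_sweep and the per-p-bit weight dictionaries are hypothetical, chosen only to mirror the idea of coupling weights held in local memory.

```python
import math
import random


def pbit_update(local_field, beta, rng):
    """Return a new bipolar state (+1 or -1) for one p-bit.

    Assumed rule: m = sgn(tanh(beta * I) - r), r uniform in (-1, 1),
    so P(m = +1) = (1 + tanh(beta * I)) / 2.
    """
    r = rng.uniform(-1.0, 1.0)
    return 1 if math.tanh(beta * local_field) > r else -1


def gibbs_sweep(states, weights, biases, beta, rng):
    """Update every p-bit once, in order, from its neighbors' current states.

    weights[i] is a dict {j: J_ij} holding only the couplings p-bit i
    needs (an illustrative stand-in for locally stored weights).
    """
    for i in range(len(states)):
        field = biases[i] + sum(J * states[j] for j, J in weights[i].items())
        states[i] = pbit_update(field, beta, rng)
    return states


# Usage: two p-bits with a positive coupling, sampled at low temperature.
rng = random.Random(0)
states = gibbs_sweep([1, -1], [{1: 1.0}, {0: 1.0}], [0.0, 0.0], beta=50.0, rng=rng)
print(states)  # the two p-bits end up aligned
```

At small beta the same p-bits flip almost independently; raising beta makes the couplings dominate, which is the knob an annealing schedule would turn.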
The system was built by connecting multiple FPGAs into a single Ising machine, allowing it to handle larger problem sizes than any single device could. The FPGAs exchange only 1-bit boundary states during operation, which raises the question of how often those boundary states must be refreshed for sampling to remain accurate.
Experiments using three-dimensional Edwards-Anderson spin glasses demonstrated that the machine’s performance depends on a single timing ratio, eta, which compares communication frequency to local p-bit update frequency. When eta exceeds a topology-dependent threshold, the distributed machine’s results match those of a monolithic GPU-based system. Below this threshold, residual energy decays more slowly, indicating a tradeoff between throughput and accuracy.
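The effect of stale boundary states can be explored with a toy model. The sketch below is an illustration under stated assumptions, not the experiment from the work: it uses a one-dimensional open chain with random +1/-1 couplings instead of a three-dimensional lattice, splits it into two partitions, and lets each partition see only a cached 1-bit copy of the other side's boundary spin. Here eta is taken to be the number of boundary refreshes per local sweep (1 / refresh_every), which is an assumed normalization of the ratio described above. The function name run_partitioned_chain and all parameters are hypothetical.

```python
import math
import random


def run_partitioned_chain(n, sweeps, refresh_every, beta, seed=0):
    """Gibbs-sample an open +/-1 chain split into two partitions.

    Returns (residual_energy, eta). The open chain is unfrustrated, so
    its ground-state energy is -(n - 1) and the residual is >= 0.
    """
    rng = random.Random(seed)
    J = [rng.choice((-1, 1)) for _ in range(n - 1)]  # J[i] couples i and i+1
    s = [rng.choice((-1, 1)) for _ in range(n)]
    cut = n // 2  # partition A owns [0, cut), partition B owns [cut, n)

    # Cached 1-bit copies of the neighbor partition's boundary spin.
    ghost_for_a = s[cut]
    ghost_for_b = s[cut - 1]

    for t in range(sweeps):
        # Boundary exchange happens only every `refresh_every` sweeps,
        # at the start of the sweep.
        if t % refresh_every == 0:
            ghost_for_a, ghost_for_b = s[cut], s[cut - 1]
        for i in range(n):
            field = 0.0
            if i > 0:
                left = ghost_for_b if i == cut else s[i - 1]
                field += J[i - 1] * left
            if i < n - 1:
                right = ghost_for_a if i == cut - 1 else s[i + 1]
                field += J[i] * right
            s[i] = 1 if math.tanh(beta * field) > rng.uniform(-1.0, 1.0) else -1

    energy = -sum(J[i] * s[i] * s[i + 1] for i in range(n - 1))
    residual = energy + (n - 1)
    return residual, 1.0 / refresh_every


# Usage: compare frequent and infrequent boundary refreshes.
for refresh_every in (1, 50):
    residual, eta = run_partitioned_chain(64, 200, refresh_every, beta=2.0)
    print(f"eta={eta:.2f} residual energy={residual}")
```

A single cut in a one-dimensional chain touches only one bond, so any difference between the two settings will be small here; the sketch is meant to show where the cached boundary state enters the update, not to reproduce the reported threshold.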
A theoretical model supports these findings, suggesting that the observed tradeoff is a universal feature of partitioned stochastic systems. The platform has been tested on various problems, including spin glasses, Max-Cut, and Boolean satisfiability, illustrating its versatility and scalability.
Implications of Large-Scale Probabilistic Computing
This development demonstrates a pathway toward scalable hardware for complex sampling and optimization tasks, which are central to fields like machine learning, physics simulation, and combinatorial optimization. By enabling a programmable system with one million p-bits, researchers can now explore larger problem spaces with higher speed and efficiency.
The architecture’s ability to operate with local memory and minimal boundary communication suggests a new paradigm for distributed stochastic computing, potentially influencing future hardware designs for AI and scientific computing. The identified tradeoff between speed and accuracy provides a practical guideline for system scaling and performance tuning.

Advances in Probabilistic Hardware and Scaling Limits
Prior to this work, probabilistic computers built from p-bits were confined to single-chip implementations, limiting their capacity and speed. These systems have been proposed as hardware accelerators for sampling and optimization problems, but scaling beyond a few thousand p-bits was challenging due to memory bandwidth and communication constraints.
The recent effort to network FPGAs into a distributed Ising machine extends the size and capability of probabilistic hardware significantly. The concept of partitioning the problem and exchanging only boundary states had been theoretically proposed, but this is the first practical demonstration of a system with one million p-bits. The research builds on earlier studies of spin glasses and stochastic sampling, now pushing the boundary toward larger, more versatile hardware platforms.
“This architecture opens doors to large-scale probabilistic computing that was previously impossible due to hardware limitations.”
— an anonymous researcher

Unresolved Questions About System Performance
It is still unclear how the system performs on a broader range of real-world problems beyond the tested spin glasses, Max-Cut, and Boolean satisfiability. The long-term stability, energy efficiency, and potential for further scaling remain to be evaluated. Additionally, the precise impact of communication latency and boundary refresh rates on accuracy across different topologies requires further study.

Future Directions for Probabilistic Hardware Scaling
Researchers are expected to explore larger networks and more complex problem instances, as well as optimize boundary communication protocols. Further experiments will likely focus on real-world applications in machine learning and physics simulation. Development of dedicated hardware implementations and integration with existing computing infrastructure are also anticipated to advance the practical deployment of this technology.

Key Questions
What is a p-bit?
A p-bit is a probabilistic bit that fluctuates randomly between 0 and 1 with a tunable probability, enabling hardware-based sampling for probabilistic algorithms.
Why is scaling to one million p-bits significant?
Scaling to one million p-bits allows for handling larger, more complex problems in sampling and optimization, surpassing previous hardware capacity limits.
How does boundary communication affect system accuracy?
The frequency of boundary state exchanges, measured by the timing ratio eta, influences whether the distributed system’s results match those of a monolithic system. Higher exchange rates improve accuracy but may reduce throughput.
What applications could benefit from this development?
Applications include physics simulations, combinatorial optimization, machine learning, and other areas requiring large-scale probabilistic sampling.
Are there limitations to this approach?
Yes, challenges remain in scaling further, managing communication latency, and ensuring stability across diverse problem types. Further research is needed to address these issues.
Source: Hacker News