Quick Overview
This paper proposes a method called Subcircuit Volumetric Benchmarking (SVB), designed to address the difficulty of assessing whether today's quantum computers can execute future large-scale, utility-grade quantum algorithms. Because current hardware is limited in scale and noise level, these large "target" algorithms cannot be run directly, making technological progress hard to measure. The core idea of SVB is to "snip" many small subcircuit fragments of varying widths (qubit counts) and depths (gate layers) out of a compiled, arbitrarily large target algorithm circuit. These manageable fragments are then run on real quantum hardware, and their execution quality (specifically, their process fidelity) is measured efficiently. By analyzing how fragment performance varies with size, the method not only exposes a device's performance bottlenecks but also extrapolates the expected fidelity of the entire target circuit, ultimately yielding a concise "capability coefficient" that quantifies how far the current system is from successfully executing the target algorithm. The method is scalable and provides a stable, targeted yardstick for tracking progress toward quantum utility.
English Research Briefing
Research Briefing: Benchmarking quantum computers with any quantum algorithm
1. The Core Contribution
This paper introduces Subcircuit Volumetric Benchmarking (SVB), a novel and scalable method for assessing a quantum computer’s performance on any target quantum algorithm, even those far too large to run on current hardware. The central thesis is that by systematically “snipping” small, executable subcircuits from a utility-scale target circuit and measuring their process fidelity, one can realistically predict the performance on the full circuit and track progress toward quantum utility. The primary conclusion, demonstrated on IBM Q systems, is that this method is not only practical but also reveals crucial performance limitations missed by simpler benchmarks. Specifically, it shows that optimistic fidelity predictions based on small-scale (e.g., 2-qubit) tests are misleading, with realistic performance on wider circuits being orders of magnitude worse due to the severe impact of crosstalk and other correlated errors. SVB distills this complex performance into a single, intuitive capability coefficient that quantifies how close a system is to successfully executing a given large-scale application.
2. Research Problem & Context
The paper addresses a critical gap in quantum computing: the lack of a scalable and application-relevant benchmarking framework for tracking progress toward utility-scale computation. The state of the art faces a dilemma. On one hand, component-level benchmarks like Randomized Benchmarking (RB) measure individual gate fidelities but fail to capture context-dependent errors (e.g., crosstalk, coherent error interference) and thus cannot reliably predict the performance of a full algorithm. On the other hand, existing application-based benchmarks, such as those discussed in the “Quantum Simulation of Molecular Dynamics Processes” study, are limited to running small, classically-tractable problem instances. These “toy” problems may not be representative of the challenges posed by utility-scale algorithms, and this approach is fundamentally unscalable, as it cannot be applied to algorithms that require more qubits than are available or whose ideal outcomes cannot be efficiently simulated classically. SVB is designed to bridge this gap by providing a method that is both application-specific, testing performance on snippets of a real, large-scale algorithm, and scalable, as it does not require running or simulating the full target circuit. It provides a stable yardstick to measure hardware improvements against a fixed, ambitious computational challenge, a capability missing from current benchmarking suites.
3. Core Concepts Explained
The paper’s argument rests on two central concepts: the SVB protocol itself and the metrics used to summarize its results.
Concept 1: Subcircuit Volumetric Benchmarking (SVB)
- Precise Definition: SVB is a protocol that begins with a large, fully compiled target circuit, \(c\). It then proceeds by (1) sampling a set of smaller subcircuits, or “snippets,” \(\{c_{w_i, d_i, j}\}\) of various widths \(w_i\) and depths \(d_i\) from \(c\); (2) experimentally estimating a quality metric, typically the process fidelity, for each snippet on a given quantum computer; and (3) analyzing the fidelity as a function of snippet shape to assess the computer’s capability to execute the original target circuit \(c\).
- Intuitive Explanation: Imagine you need to assess if a new marathon runner is ready for the Boston Marathon. Instead of having them run the full, grueling race (which they might not finish), you have them run various high-intensity segments of the actual race course—for instance, a flat 5k, a hilly 10k, and so on. By measuring their performance and fatigue on these representative “snippets,” you can build a highly predictive model of how they would perform in the full marathon. In this analogy, the quantum computer is the runner, the utility-scale algorithm is the Boston Marathon, and the subcircuit snippets are the various race segments.
- Why this is critical: SVB is the core methodological innovation that solves the paper’s central problem. It provides a direct link between the performance of a near-term device and its potential to run a far-term application. This allows for consistent, long-term tracking of progress against a fixed “grand challenge” problem, something previously impossible. It moves benchmarking away from abstract, random circuits toward the actual patterns of computation required by useful algorithms.
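The snipping step described above can be sketched in a few lines. This is an illustrative simplification, not the authors' implementation: the paper samples *connected* qubit subsets on the device topology, whereas for brevity this sketch samples an arbitrary subset, and the circuit representation (a list of gate layers) is our own assumption.

```python
import random

def num_qubits(circuit):
    """Highest qubit index appearing in the circuit, plus one."""
    return 1 + max(q for layer in circuit for g in layer for q in g[1])

def snip(circuit, width, depth, rng=None):
    """Sample a (width, depth)-shaped subcircuit from `circuit`.

    `circuit` is a list of layers; each layer is a list of gates
    (name, qubit_tuple). Gates acting partly outside the chosen qubit
    subset are dropped, mirroring the paper's choice to discard
    boundary-crossing gates. Unlike real SVB, the subset here is not
    guaranteed to be connected on the device topology.
    """
    rng = rng or random.Random(0)
    start = rng.randrange(len(circuit) - depth + 1)
    qubits = set(rng.sample(range(num_qubits(circuit)), width))
    snippet = [
        [g for g in layer if set(g[1]) <= qubits]
        for layer in circuit[start:start + depth]
    ]
    return snippet, qubits
```

Repeating this sampling over a grid of widths and depths yields the snippet ensemble whose fidelities SVB analyzes.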
Concept 2: Effective Error per Quop (\(\epsilon_{w,d}\)) and Capability Coefficient
- Precise Definition: The effective error per quop, \(\epsilon_{w,d}\), is a context-dependent error rate derived from the geometric mean (\(\text{GM}\)) of process fidelities of snippets with shape \((w, d)\), given by the formula \(\epsilon_{w,d} = 1 - \text{GM}[F_{w,d}]^{1/(wd)}\). The capability coefficient is the ratio of the system’s observed quops (\(Q_C = 1/\epsilon_{w_{\max}}\), the number of operations it can perform at the widest snippet width) to the total quops required by the full target circuit (\(Q_T = w_c d_c\)).
- Intuitive Explanation: Think of \(\epsilon_{w,d}\) as an “in-traffic” fuel efficiency rating for a car, as opposed to the idealized highway rating on the sticker. It measures the actual error rate of quantum operations when they are running in a “congested” environment with \(w\) qubits operating simultaneously. The capability coefficient is the final score: “This quantum computer can successfully complete X% of the journey required by this massive computation before it runs out of ‘coherence fuel’.”
- Why this is critical: These metrics transform the raw, multi-dimensional SVB data into actionable insights. The dependence of \(\epsilon_{w,d}\) on width \(w\) directly quantifies the detrimental impact of crosstalk, a key performance limiter. The capability coefficient provides a single, powerful figure of merit that allows for straightforward comparison across different machines, compilation strategies, and time, summarizing a system’s readiness for a specific, practical task.
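The two metrics above reduce to a few lines of arithmetic. A minimal sketch, directly encoding the definitions given in this section (function names are ours, not the paper's):

```python
import math

def effective_error_per_quop(fidelities, w, d):
    """epsilon_{w,d} = 1 - GM[F_{w,d}]^(1/(w*d)).

    `fidelities` are measured process fidelities of snippets of shape (w, d);
    GM is their geometric mean.
    """
    gm = math.prod(fidelities) ** (1.0 / len(fidelities))
    return 1.0 - gm ** (1.0 / (w * d))

def capability_coefficient(eps_wmax, w_c, d_c):
    """Q_C / Q_T, with Q_C = 1/epsilon at the widest snippets
    and Q_T = w_c * d_c the quops of the full target circuit."""
    return (1.0 / eps_wmax) / (w_c * d_c)
```

For example, snippets of shape (3, 4) with fidelities around 0.9 give a per-quop error near 0.9%, and an \(\epsilon_{w_{\max}}\) of 1% against a 50-qubit, 1000-layer target yields a capability coefficient of 0.2%.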
4. Methodology & Innovation
The methodology integrates algorithm compilation, randomized sub-sampling, and efficient fidelity estimation. The process involves: (1) selecting a target application (a block-encoding subroutine for quantum chemistry) and compiling it for a specific IBM Q device to create a large target circuit \(c\); (2) implementing a “snipping” algorithm to randomly sample connected subcircuits of varying widths (\(w\)) and depths (\(d\)) from \(c\); (3) executing these snippets on the hardware and measuring their process fidelity using Mirror Circuit Fidelity Estimation (MCFE), a scalable protocol; and (4) analyzing the collected fidelities to compute the effective error rates and extrapolate the full circuit’s performance.
The fundamental innovation is the synthesis of application-specific sub-sampling with predictive, volumetric analysis. Prior work either benchmarks entire (but small) applications or characterizes isolated gate-level components. SVB is the first method to benchmark fragments of a large, meaningful application. This is a crucial distinction: it tests the hardware not on arbitrary random circuits (like RB) but on the specific gate sequences and structures that appear in a target utility-scale algorithm. The extrapolation based on the multiplicative fidelity model, \(\hat{F}_c = \text{GM}[F_{w,d}]^{w_c d_c / (wd)}\), is a key part of this innovation, providing a principled way to make a concrete, quantitative prediction about a currently-intractable computation, moving far beyond the simple heuristic of summing individual gate errors.
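The multiplicative extrapolation can be made concrete with a short sketch of the stated model, \(\hat{F}_c = \text{GM}[F_{w,d}]^{w_c d_c / (wd)}\) (the function name and example numbers are illustrative, not from the paper):

```python
import math

def predicted_full_fidelity(fidelities, w, d, w_c, d_c):
    """Multiplicative model: F_hat_c = GM[F_{w,d}]^(w_c*d_c / (w*d)).

    Raises the geometric-mean snippet fidelity to the power of
    (target quops) / (snippet quops).
    """
    gm = math.prod(fidelities) ** (1.0 / len(fidelities))
    return gm ** (w_c * d_c / (w * d))
```

For instance, snippets of shape (2, 5) with fidelity 0.99 extrapolated to a (4, 50) target, i.e. 20 times the quops, predict roughly \(0.99^{20} \approx 0.82\).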
5. Key Results & Evidence
The paper presents several critical findings, substantiated by experimental data from IBM Q systems.
- Massive performance gap for target circuits: The data clearly shows that current systems are far from being able to run the target LCU subroutines with any reasonable success probability. Figures 2 and 3 provide visual evidence, with volumetric plots showing measured fidelities (data points) decaying rapidly with width and depth, falling far short of the target circuit’s complexity (yellow stars).
- Crosstalk and correlated errors dominate performance scaling: The most significant finding is that errors are highly context-dependent. Figure 5 demonstrates this powerfully by showing that the effective error per quop (\(\epsilon_{w,d}\)) increases substantially with circuit width (\(w\)). On IBM Q Sherbrooke, for instance, this error rate increases by nearly an order of magnitude from \(w=2\) to the maximum width tested. This proves that simple error models based on isolated gate performance are inadequate.
- Narrow-circuit predictions are wildly optimistic: A direct consequence of the above is that predictions from small-scale tests are misleading. Table I quantifies this by comparing fidelity predictions from narrow snippets (\(F_0\)) versus wide snippets (\(F\)). For the H2 Tapered circuit on Sherbrooke, the optimistic prediction is \(F_0 \approx 0.40\), while the more realistic prediction is a staggering \(F \approx 10^{-9}\), highlighting the failure of simplistic extrapolation.
- SVB provides concise and quantifiable performance metrics: The methodology successfully distills complex performance data into simple metrics. Table II shows the capability coefficient for each experiment, ranging from a mere 0.03% to 5%. This single number effectively captures the performance gap. The scalability coefficient (\(Q_C/Q_0\)), which quantifies the performance drop-off with width, is shown to be as low as 4%, starkly illustrating the impact of correlated errors.
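The gap between narrow- and wide-circuit predictions reported above follows directly from compounding: a modest width-driven increase in the per-quop error rate becomes many orders of magnitude at target scale. A quick illustration with hypothetical numbers (the rates and target size below are ours, chosen only to show the effect, not taken from Table I):

```python
# Illustrative only: per-quop error rates and target size are hypothetical.
eps_narrow = 1e-3    # assumed effective error per quop at w = 2
eps_wide = 5e-3      # assumed rate at the maximum tested width
quops_target = 4000  # assumed total quops w_c * d_c of the target circuit

f_narrow = (1 - eps_narrow) ** quops_target  # optimistic prediction F_0
f_wide = (1 - eps_wide) ** quops_target      # realistic prediction F

print(f_narrow, f_wide, f_narrow / f_wide)
```

Even a fivefold increase in the per-quop rate, compounded over thousands of quops, separates the two predictions by more than six orders of magnitude, which is the qualitative pattern Table I reports.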
6. Significance & Implications
The findings have significant consequences for both the academic field and the practical development of quantum computers.
- For the Field: SVB establishes a more rigorous and meaningful paradigm for application-oriented benchmarking. It shifts the goal from vaguely demonstrating “quantumness” to quantitatively tracking progress toward a specific, utility-scale computational goal. Furthermore, it provides an invaluable experimental tool for studying the physics of complex, correlated noise models in an application-relevant context, enabling researchers to better understand and mitigate the primary obstacles to scaled-up performance.
- For Practical Applications: The method provides quantum hardware developers with clear, actionable targets. Instead of chasing generic gate fidelity improvements, they can now focus on improving the capability coefficient for specific grand-challenge problems relevant to customers in chemistry, finance, or materials science. For algorithm designers and end-users, SVB and its derived metrics offer a realistic way to estimate the true resource requirements and performance of algorithms on near-term hardware, fostering more effective hardware-software co-design and providing a transparent metric for assessing the return on investment in quantum hardware.
7. Open Problems & Critical Assessment
This section outlines future research directions, both those stated by the authors and those arising from a critical analysis of the work.
1. Author-Stated Future Work:
- To develop and evaluate alternative circuit snipping algorithms that are better suited for quantum computers with different or higher-connectivity topologies, where the current approach of selecting “connected” subsets may be suboptimal.
- To adapt the SVB framework for benchmarking fault-tolerant quantum computers. This would require new methods for snipping logical circuits and interpreting the behavior of logical-level errors, which are fundamentally different from physical errors.
- To use SVB as a comprehensive framework to enable a direct, quantitative comparison between NISQ-era and fault-tolerant (FTQC) implementations for solving the same computational problem, providing insight into when the overhead of quantum error correction becomes beneficial.
2. AI-Proposed Open Problems & Critique:
Proposed Open Problems:
- Benchmarking Dynamic Circuits: The current SVB protocol is designed for static circuits. A key open direction is to extend SVB to benchmark dynamic circuits involving mid-circuit measurement and classical feed-forward, which are essential for many advanced algorithms and error correction schemes.
- Compiler-Integrated SVB: The paper treats the compiled circuit as a fixed input. A promising avenue is to create a feedback loop where SVB data on effective error rates (\(\epsilon_{w,d}\)) is used to guide a noise-aware compiler, helping it find qubit mappings and gate schedules that minimize these application-specific error rates.
- Theoretical Guarantees for Extrapolation: The paper’s extrapolation method is based on a physically motivated conjecture that process fidelities of large, disjoint circuit blocks combine multiplicatively. A crucial open problem is to formalize the theoretical underpinnings of this conjecture. Establishing rigorous bounds or convergence guarantees for the fidelity prediction would significantly strengthen the confidence in SVB’s results.
Critical Assessment:
- Unstated Assumption in Snipping: The methodology’s choice to discard multi-qubit gates that cross the snippet boundary is a potential methodological weakness. For algorithms that rely heavily on long-range interactions, this procedure could systematically remove the most challenging parts of the circuit, particularly those most susceptible to long-range crosstalk. This could lead to an overly optimistic assessment, as the benchmark may not be probing the true performance bottlenecks.
- Dependence on Fidelity Estimation Protocol: The paper uses Mirror Circuit Fidelity Estimation (MCFE). While scalable, the accuracy and biases of the chosen fidelity estimation protocol are an implicit assumption. The final results, especially for very noisy snippets, could be sensitive to this choice. The work would be strengthened by an analysis of how the capability coefficient changes if a different protocol (e.g., direct randomized benchmarking on the snippets) were used, which might be sensitive to different aspects of the noise.
- Risk of a Single-Metric Focus: By distilling performance into a single “capability coefficient,” the framework risks oversimplifying the definition of a “good” quantum computer. While powerful for tracking progress, an exclusive focus on this metric could incentivize the optimization of hardware for process fidelity at the expense of other critical factors, such as execution latency, compilation time, or the ability to handle dynamic circuits, which are not captured by SVB.