Quantum Fisher-Preconditioned Reinforcement Learning: From Single-Qubit Control to Rayleigh-Fading Link Adaptation

1African Institute for Mathematical Sciences, 2Stanford University, 3University of Glasgow

QPPG leverages quantum Fisher information to precondition gradient updates, achieving faster convergence than classical reinforcement learning.

The animated video does not accurately represent the work; we simply found it fun and decided to include it. It was generated using Sora 😎

Abstract

Reinforcement learning (RL) in quantum control faces unique challenges due to measurement-induced stochasticity and environmental noise.

Classical policy-gradient methods often converge slowly under high variance, while value-based quantum variants may require extensive experience replay. We introduce Quantum–Preconditioned Policy Gradients (QPPG), a natural‐gradient–inspired algorithm that whitens policy updates using the full inverse quantum Fisher information (QFI) with Tikhonov regularization. QPPG bridges classical and quantum geometry by reducing to ideal quantum natural gradients as the regularizer vanishes yet remains stable under noisy QFI estimates. We benchmark QPPG against Classical REINFORCE (CPG), a block-diagonal quantum natural policy gradient (QNPG), and a Quantum Deep Q-Network (Q-DQN) in both a two-state classical environment and a custom single-qubit Gym task with depolarizing, amplitude-damping, and dephasing noise.
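The core of the preconditioning step described above can be sketched as follows. This is a minimal illustration, not our actual implementation: the function name `qppg_update` and the toy QFI matrix are hypothetical stand-ins.

```python
import numpy as np

def qppg_update(theta, grad, qfi, lr=0.1, lam=1e-3):
    """One quantum-preconditioned policy-gradient step (illustrative).

    The raw policy gradient `grad` is whitened by the Tikhonov-regularized
    inverse QFI: theta <- theta + lr * (F + lam*I)^{-1} grad.
    The regularizer `lam` keeps the linear solve stable when the
    estimated QFI is noisy or near-singular; as lam -> 0 the update
    reduces to the ideal quantum natural gradient.
    """
    dim = theta.size
    precond = np.linalg.solve(qfi + lam * np.eye(dim), grad)
    return theta + lr * precond

# Toy two-parameter example with a stand-in QFI estimate
theta = np.zeros(2)
grad = np.array([1.0, 0.5])
qfi = np.array([[4.0, 0.0], [0.0, 1.0]])
theta = qppg_update(theta, grad, qfi)
```

Note how the direction with larger Fisher curvature (the first parameter) receives a proportionally smaller step, which is the whitening effect responsible for the stable convergence reported below.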

Across five random seeds, QPPG achieves 90% of its maximum return in only (\(100 \pm 12\)) episodes—over four times faster than CPG (\(420 \pm 5\)) and nearly three times faster than Q‐DQN (\(278 \pm 35\)). Under elevated noise (15%), QPPG attains a 90% success rate, surpassing CPG (80%), QNPG (85%), and Q‐DQN (78%). These results demonstrate that full inverse‐QFI preconditioning delivers both rapid, stable learning and strong noise resilience, paving the way for scalable quantum RL in larger Hilbert spaces.

Connection to Wireless Networks

Our QPPG algorithm can be adapted for link adaptation over a noisy Rayleigh-fading wireless channel.

Rayleigh Fading Channel

We simulate a single-user Rayleigh-fading channel with four receive antennas and a pilot-estimated signal-to-noise ratio (SNR). The agent selects the modulation order and transmit power to maximize successful throughput.
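A minimal sketch of such a channel model is shown below, assuming maximal-ratio combining across the four receive antennas and a crude Shannon-style rate threshold; the helper names `rayleigh_snr` and `step` and the threshold rule are illustrative assumptions, not our exact environment.

```python
import numpy as np

rng = np.random.default_rng(0)
N_RX = 4  # receive antennas, as in our setup

def rayleigh_snr(pilot_snr_db, power, rng):
    """Post-combining SNR for one slot (hypothetical helper).

    Each antenna sees an i.i.d. complex Gaussian (Rayleigh-magnitude)
    fade; maximal-ratio combining sums the per-antenna gains |h_i|^2.
    """
    h = (rng.standard_normal(N_RX) + 1j * rng.standard_normal(N_RX)) / np.sqrt(2)
    gain = np.sum(np.abs(h) ** 2)  # MRC combining gain
    return 10 ** (pilot_snr_db / 10) * power * gain

def step(mod_bits, power, pilot_snr_db=5.0):
    """Reward = throughput (bits) if the realized SNR supports the rate."""
    snr = rayleigh_snr(pilot_snr_db, power, rng)
    required = 2 ** mod_bits - 1  # crude Shannon-style SNR threshold
    return float(mod_bits) if snr >= required else 0.0
```

The agent's action is the `(mod_bits, power)` pair; a higher modulation order pays off only when the fading realization is favorable, which is exactly the trade-off the RL agent must learn.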

Results

[Figure carousel, 3 panels: results from our experiments.]

Related Works

Much excellent prior work informed this project.

Quantum Natural Gradient leveraged QFI to accelerate convergence in noiseless bandit settings.

Simple Statistical Gradient-following Algorithms for Connectionist Reinforcement Learning estimated the gradient of expected return via Monte Carlo sampling.

BibTeX

@article{oluwaseyi2025,
  author    = {Giwa, Oluwaseyi and Mohsin, Muhammad Ahmed and Jamshed, Muhammad Ali},
  title     = {Quantum Fisher-Preconditioned Reinforcement Learning: From Single-Qubit Control to Rayleigh-Fading Link Adaptation},
  journal   = {IEEE Communications Letters},
  year      = {2025},
}