Meta-Reinforcement Learning for Fast and Data-Efficient Spectrum Allocation in Dynamic Wireless Networks

1African Institute for Mathematical Sciences, 2Stanford University, 3University of Oklahoma, 4University of Glasgow

Meta-reinforcement learning (meta-RL) enables faster adaptation of an RL agent by learning a policy initialization that can be fine-tuned quickly in a new environment. In a wireless setting, meta-RL captures the dynamic nature of the network, which makes it well suited to safe exploration by the agent.
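To make the adaptation loop concrete, the sketch below shows a generic MAML-style meta-update in PyTorch: an inner loop takes a few gradient steps on each task's support data, and an outer loop updates the shared initialization from the post-adaptation losses. The function names, the (support_loss_fn, query_loss_fn) task interface, and the hyperparameters are illustrative assumptions, not the exact implementation used in the paper.

import torch

def maml_meta_update(policy, task_batch, inner_lr=0.01, meta_lr=1e-3, inner_steps=1):
    """One MAML-style meta-update (sketch, not the paper's code).

    `policy` is any torch.nn.Module; `task_batch` is assumed to yield
    (support_loss_fn, query_loss_fn) pairs, each a callable that maps a
    dict of parameters to a scalar loss for that task.
    """
    meta_optimizer = torch.optim.Adam(policy.parameters(), lr=meta_lr)
    meta_loss = 0.0
    for support_loss_fn, query_loss_fn in task_batch:
        # Inner loop: adapt a functional copy of the parameters to this task.
        adapted_params = {n: p for n, p in policy.named_parameters()}
        for _ in range(inner_steps):
            loss = support_loss_fn(adapted_params)
            grads = torch.autograd.grad(loss, list(adapted_params.values()), create_graph=True)
            adapted_params = {
                n: p - inner_lr * g
                for (n, p), g in zip(adapted_params.items(), grads)
            }
        # Outer objective: how well the adapted parameters do on held-out task data.
        meta_loss = meta_loss + query_loss_fn(adapted_params)
    # Update the shared initialization through the inner-loop gradients.
    meta_optimizer.zero_grad()
    meta_loss.backward()
    meta_optimizer.step()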

Abstract

Dynamic spectrum allocation in 5G/6G networks is critical for efficient resource utilization. However, applying traditional deep reinforcement learning (DRL) is often infeasible due to its immense sample complexity and the safety risks of unguided exploration, which can cause severe network interference.

To address these challenges, we propose a meta-learning framework that enables agents to learn a robust initial policy and rapidly adapt to new wireless scenarios with minimal data. We implement three meta-learning architectures: model-agnostic meta-learning (MAML), a recurrent neural network (RNN), and an attention-enhanced RNN. We evaluate them against a non-meta-learning DRL baseline, proximal policy optimization (PPO), in a simulated dynamic integrated access and backhaul (IAB) environment. Our results show a clear performance gap: the attention-based meta-learning agent reaches a peak mean network throughput of \(\approx 48\) Mbps, while the PPO baseline collapses to \(\approx 10\) Mbps.
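As a rough illustration of the attention-enhanced recurrent agent described above, the following PyTorch sketch combines a GRU over the recent observation history with self-attention over its outputs before producing action logits and a value estimate. Layer sizes, the observation/action interface, and the exact placement of the attention block are assumptions for illustration, not the paper's precise architecture.

import torch
import torch.nn as nn

class AttentionRNNPolicy(nn.Module):
    """Hypothetical attention-enhanced recurrent policy (illustrative only)."""

    def __init__(self, obs_dim, n_actions, hidden=128, n_heads=4):
        super().__init__()
        self.rnn = nn.GRU(obs_dim, hidden, batch_first=True)
        self.attn = nn.MultiheadAttention(hidden, n_heads, batch_first=True)
        self.policy_head = nn.Linear(hidden, n_actions)  # spectrum-allocation action logits
        self.value_head = nn.Linear(hidden, 1)           # state value for PPO-style updates

    def forward(self, obs_seq, h0=None):
        # obs_seq: (batch, time, obs_dim) history of channel/network observations.
        rnn_out, h_n = self.rnn(obs_seq, h0)
        # Self-attention lets the agent weigh which past steps matter most now.
        attn_out, _ = self.attn(rnn_out, rnn_out, rnn_out)
        last = attn_out[:, -1]                            # summary of the history
        return self.policy_head(last), self.value_head(last), h_n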

Furthermore, our method reduces SINR and latency violations by more than \(50\%\) compared to PPO. It also adapts quickly while maintaining a fairness index \(\geq 0.7\), indicating more equitable resource allocation. This work demonstrates that meta-learning is an effective and safer option for intelligent control in complex wireless systems.
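The fairness figure quoted above is consistent with Jain's fairness index, the metric commonly used for resource-allocation fairness. Assuming that definition (the abstract does not spell it out), it can be computed from per-user throughputs as in the sketch below; the helper name and example values are illustrative.

import numpy as np

def jains_fairness(throughputs):
    """Jain's fairness index: 1/n when one user gets everything, 1.0 when
    the allocation is perfectly equal across users (assumed metric)."""
    x = np.asarray(throughputs, dtype=float)
    return x.sum() ** 2 / (len(x) * (x ** 2).sum())

# Example: a fairly even allocation across 4 users gives an index near 1.
print(jains_fairness([10.0, 12.0, 9.0, 11.0]))  # ≈ 0.99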

Results

Figures (1–5): Fairness Index; Mean Latency Violations; Mean Episodic Reward; Mean SINR Violations; Network Throughput.

Related Works

This project builds on several excellent prior works.

Model-Agnostic Meta-Learning for Fast Adaptation of Deep Networks forms the basis of our meta-learning architecture.

Imitate the Good and Avoid the Bad: An Incremental Approach to Safe Reinforcement Learning is another closely related work on safe reinforcement learning.

BibTeX

@article{oluwaseyi2025,
  author    = {Giwa, Oluwaseyi and Awodunmila, Tobi and Mohsin, Muhammad Ahmed and Bilal, Ahsan and Jamshed, Muhammad Ali},
  title     = {Meta-Reinforcement Learning for Fast and Data-Efficient Spectrum Allocation in Dynamic Wireless Networks},
  journal   = {IEEE Wireless Communications Letters},
  year      = {2025},
}