Meta-Reinforcement Learning for Fast and Data-Efficient Spectrum Allocation in Dynamic Wireless Networks

Abstract

Efficient spectrum allocation is vital for 5G/6G networks, yet traditional deep reinforcement learning (DRL) methods suffer from high sample complexity and unsafe exploration that can disrupt network stability. To address these challenges, we propose a meta-learning framework that learns a robust initial policy capable of rapid and safe adaptation to changing wireless conditions. Using model-agnostic techniques, we implement three meta-learning architectures: model-agnostic meta-learning (MAML), a recurrent neural network (RNN), and an RNN with a self-attention mechanism. We compare them against a DRL baseline and classical heuristic approaches in a dynamic integrated access/backhaul (IAB) environment. The attention-based agent achieves a peak throughput of approximately 49 Mbps, reduces signal-to-interference-plus-noise ratio (SINR) and latency violations by over 60% relative to proximal policy optimization (PPO), and attains 97% of the fairness level of the exhaustive-search upper bound. These results demonstrate that meta-learning enables data-efficient, reliable, and scalable spectrum management for next-generation wireless systems.
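To make the idea of "learning an initial policy that adapts quickly" concrete, the sketch below runs MAML on a deliberately tiny problem: fitting the slope of a line, where each task has a different true slope. It is an illustration only and not the authors' implementation; the wireless environment, policy networks, and reward are not modelled, and every name in it (`maml_train`, `adapt`, `task_slopes`, the learning rates) is hypothetical. Because the model is a single scalar, the second-order meta-gradient can be written out by hand with the chain rule.

```python
import random


def loss(theta, xs, slope):
    """Mean squared error of the model y = theta * x on a task with true slope."""
    return sum((theta * x - slope * x) ** 2 for x in xs) / len(xs)


def grad(theta, xs, slope):
    """Derivative of the task loss with respect to theta."""
    return 2.0 * sum(x * x for x in xs) / len(xs) * (theta - slope)


def hessian(xs):
    """Second derivative of the task loss (constant in theta for this model)."""
    return 2.0 * sum(x * x for x in xs) / len(xs)


def adapt(theta, xs, slope, inner_lr):
    """Inner loop: one gradient step on the task's support data."""
    return theta - inner_lr * grad(theta, xs, slope)


def maml_train(task_slopes, meta_steps=500, inner_lr=0.1, meta_lr=0.05, seed=0):
    """Outer loop: update the shared initialisation theta across all tasks."""
    rng = random.Random(seed)
    theta = 0.0
    for _ in range(meta_steps):
        meta_grad = 0.0
        for slope in task_slopes:
            support = [rng.uniform(-1.0, 1.0) for _ in range(10)]
            query = [rng.uniform(-1.0, 1.0) for _ in range(10)]
            adapted = adapt(theta, support, slope, inner_lr)
            # Chain rule through the inner step:
            # dL_query(adapted)/dtheta = L_query'(adapted) * (1 - inner_lr * H_support)
            meta_grad += grad(adapted, query, slope) * (
                1.0 - inner_lr * hessian(support)
            )
        theta -= meta_lr * meta_grad / len(task_slopes)
    return theta


if __name__ == "__main__":
    theta0 = maml_train([1.0, 3.0])
    print("meta-learned initialisation:", theta0)
```

With two tasks whose slopes are 1.0 and 3.0, the learned initialisation tends to settle between them, so a single inner-loop step moves it toward either task. In the setting described in the abstract, the scalar would be replaced by policy parameters and the squared error by an RL objective.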

Results

[Figures 1-4: Fairness Index; Mean Latency Violations; Mean SINR Violations; Network Throughput.]
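The page does not state which fairness index is plotted. Assuming it is Jain's fairness index, a common choice for measuring how evenly throughput is shared among users, a minimal sketch of the computation looks like this. The function name `jain_fairness` and the example values are hypothetical.

```python
def jain_fairness(throughputs):
    """Jain's index: (sum x)^2 / (n * sum x^2) for non-negative throughputs.

    The value is 1.0 when all users get the same throughput and 1/n when
    a single user gets everything.
    """
    n = len(throughputs)
    total = sum(throughputs)
    squares = sum(x * x for x in throughputs)
    if n == 0 or squares == 0:
        raise ValueError("need at least one non-zero throughput")
    return (total * total) / (n * squares)


if __name__ == "__main__":
    print(jain_fairness([5.0, 5.0, 5.0, 5.0]))  # equal shares
    print(jain_fairness([8.0, 0.0, 0.0, 0.0]))  # one user takes everything
```

Under this assumption, the "97% of the exhaustive-search upper bound" in the abstract would be the ratio of the agent's index to the index obtained by exhaustive search.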

Related Works

This work builds on a substantial body of prior research.

"Model-Agnostic Meta-Learning for Fast Adaptation of Deep Networks" is the basis of our architecture.

"Imitate the Good and Avoid the Bad: An Incremental Approach to Safe Reinforcement Learning" is another valuable reference.

BibTeX

@article{oluwaseyi2025,
  author    = {Oluwaseyi, Giwa and Tobi, Ebenezer Awodumila and Muhammad, Ahmed Mohsin and Ahsan, Bilal and Muhammad, Ali Jamshed},
  title     = {Meta-Reinforcement Learning for Fast and Data-Efficient Spectrum Allocation in Dynamic Wireless Networks},
  journal   = {IEEE Wireless Communications Letters},
  year      = {2026}
}

Meta-reinforcement learning (meta-RL) speeds up an RL agent's adaptation to a new environment by learning an initial policy that can be fine-tuned quickly. In a wireless environment, meta-RL captures the dynamic nature of the network components, which supports safe exploration by the agent.
