The LLM as a Network Operator: A Vision for Generative AI in the 6G Radio Access Network

1African Institute for Mathematical Sciences, 2University of Cape Town, 3Olabisi Onabanjo University, 4University of Konstanz

A diagram adapted from Rathakrishnan et al. showing the evolution of mobile networks and RAN architectures with increasing AI integration for control functions.

Diagram of our proposed framework: a human operator's instruction flows through the Non-Real-Time RAN Intelligent Controller (RIC) and the Near-Real-Time RIC down to the RAN elements.

Abstract

The management of future AI-native Next-Generation (NextG) Radio Access Networks (RANs), including 6G and beyond, presents a challenge of immense complexity that exceeds the capabilities of traditional automation.

In response, we introduce the concept of the LLM-RAN Operator, a paradigm in which a Large Language Model (LLM) is embedded into the RAN control loop to translate high-level human intents into optimal network actions. Unlike prior empirical studies, we present a formal framework for the LLM-RAN Operator that builds on earlier work by making guarantees checkable through an adapter aligned with the Open RAN (O-RAN) standard. The framework separates strategic, LLM-driven guidance in the Non-Real-Time (Non-RT) RAN Intelligent Controller (RIC) from reactive execution in the Near-RT RIC, and includes a proposition on policy expressiveness and a theorem on convergence to stable fixed points.

By framing the problem with mathematical rigor, our work provides the analytical tools to reason about the feasibility and stability of AI-native RAN control. It identifies critical research challenges in safety, real-time performance, and physical-world grounding.

This paper aims to bridge the gap between AI theory and wireless systems engineering in the NextG era, aligning with the AI4NextG vision to develop knowledgeable, intent-driven wireless networks that integrate generative AI into the heart of the RAN.

Formalism of the LLM-RAN Operator: State, Action, and Reward Spaces

These formalisms are essential for grounding the theoretical analysis in the practical realities of a wireless network environment.

The state \(s_{t} \in \mathcal{S}\) at time \(t\) must capture a snapshot of the entire RAN environment.

\(\mathcal{S} = \mathcal{H} \times \mathcal{Q} \times \mathcal{C} \times \mathcal{I}\)

where \(\mathcal{H}\) is the channel state space, \(\mathcal{Q}\) is the queueing state space, \(\mathcal{C}\) is the configuration space, and \(\mathcal{I}\) is the interference state space.

The action \(a_{t} \in \mathcal{A}\) is a structured, combinatorial command that modifies the network's configuration.
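The product structure of the state space and the structured action can be made concrete with a small sketch. This is a hedged illustration only: the field names, units, and the single power-delta action are assumptions chosen for clarity, not part of the paper's formal model.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class RANState:
    """Illustrative point in S = H x Q x C x I; fields are assumed stand-ins."""
    channel_gain_db: float    # element of H, the channel state space
    queue_backlog_bits: int   # element of Q, the queueing state space
    tx_power_dbm: float       # element of C, the configuration space
    interference_dbm: float   # element of I, the interference state space

@dataclass(frozen=True)
class RANAction:
    """A structured command a in A that modifies the configuration."""
    power_delta_dbm: float = 0.0  # 0.0 recovers the identity action a_null

# A sample state and the "do-nothing" action used later in the lemma's proof
s = RANState(channel_gain_db=-90.0, queue_backlog_bits=10_000,
             tx_power_dbm=23.0, interference_dbm=-100.0)
a_null = RANAction()
```

In practice each factor space would itself be high-dimensional (per-user channels, per-bearer queues), but the Cartesian-product structure carries over unchanged.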

Lemma: Let \(U(s)\) be a utility function representing a network performance metric, and assume the action space \(\mathcal{A}\) contains an identity ("do-nothing") action \(a_{\text{null}}\) satisfying \(f_{\text{env}}(s, a_{\text{null}}) = s\). If the LLM operator solves the single-step optimization problem \(a_t = \arg \max_{a \in \mathcal{A}} U(f_{\text{env}}(s_t, a))\), then the sequence of states generated by the system is monotonically non-decreasing in utility: \(U(s_{t+1}) \geq U(s_t)\) for all \(t\). This property is observed in LLM-guided cases (e.g., decreasing transmit power to raise energy efficiency).

Proof: We show that for any time step \(t\), the utility of the next state, \(s_{t+1}\), is greater than or equal to the utility of the current state, \(s_t\). By the definition of the system's dynamics, the state at time \(t + 1\) is obtained by applying the environment function to the current state \(s_t\) and the chosen action \(a_t\):

\(s_{t+1} = f_{\text{env}}(s_t, a_t)\)

By assumption, the action space \(\mathcal{A}\) contains a "do-nothing" or identity action, denoted \(a_{\text{null}}\), which leaves the state of the network unchanged. Applying the environment dynamics with this action therefore yields the same state:

\(f_{\text{env}}(s_t, a_{\text{null}}) = s_t\)

According to the central assumption of the lemma, the action \(a_t\) is chosen to be the optimal action that maximizes the utility \(U\) of the resulting state. This means that \(a_t\) must yield a utility that is greater than or equal to the utility produced by any other possible action \(a' \in \mathcal{A}\).

\(U\left(f_{\text{env}}(s_t, a_t)\right) \geq U\left(f_{\text{env}}(s_t, a')\right) \quad \forall a' \in \mathcal{A}\)

In particular, since \(a_{\text{null}}\) is a member of \(\mathcal{A}\), the inequality must also hold for \(a' = a_{\text{null}}\):

\(U(f_{\text{env}}(s_t, a_t)) \geq U(f_{\text{env}}(s_t, a_{\text{null}}))\)

Substituting \(s_{t+1} = f_{\text{env}}(s_t, a_t)\) and \(f_{\text{env}}(s_t, a_{\text{null}}) = s_t\) into this inequality, we arrive at:

\(U(s_{t+1}) \geq U(s_t)\)

Since this holds for any arbitrary time step \(t\), the sequence of utilities \((U(s_t))_{t \geq 0}\) is monotonically non-decreasing. QED
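The argument above can be checked numerically with a toy model. In this hedged sketch, `f_env` and `U` are illustrative stand-ins (a scalar power state and a quadratic efficiency utility), not the paper's actual dynamics; the point is only that a greedy single-step arg max over an action set containing the identity action can never decrease utility.

```python
def f_env(state: float, action: float) -> float:
    """Toy environment: state is a transmit power, action is a power delta."""
    return state + action

def U(state: float) -> float:
    """Toy utility: efficiency peaks at a power of 10 units."""
    return -(state - 10.0) ** 2

def greedy_step(state: float, actions) -> float:
    """a_t = arg max_{a in A} U(f_env(s_t, a))."""
    return max(actions, key=lambda a: U(f_env(state, a)))

actions = [-1.0, 0.0, 1.0]  # 0.0 plays the role of a_null
s = 0.0
utilities = [U(s)]
for _ in range(15):
    s = f_env(s, greedy_step(s, actions))
    utilities.append(U(s))

# Matches the lemma: the utility sequence is monotonically non-decreasing,
# and the greedy operator settles at the fixed point s = 10 via a_null.
assert all(u2 >= u1 for u1, u2 in zip(utilities, utilities[1:]))
```

Once the optimum is reached, the arg max selects the null action, so the state becomes a fixed point, which is the behavior the paper's convergence theorem generalizes.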

You can find more detailed formalism in the paper.

Related Works

This work builds on a body of excellent prior research.

ORANSight-2.0: Foundational LLMs for O-RAN is a good starting point; further related literature is surveyed in our paper.

BibTeX

@inproceedings{llm-ran,
  author    = {Giwa, Oluwaseyi and Adewole, Michael and Awodumila, Tobi and Aderinto, Pelumi},
  title     = {The LLM as a Network Operator: A Vision for Generative AI in the 6G Radio Access Network},
  booktitle = {NeurIPS 2025 Workshop on AI and ML for Next-Generation Wireless Communications and Networking},
  year      = {2025},
}