Chen, Zheyu; Leung, Kin K.; Wang, Shiqiang; Tassiulas, Leandros; Chan, Kevin; Baker, Patrick J.
Multi-policy reinforcement learning for network resource allocation with periodic behaviors Journal Article
In: Computer Networks, vol. 272, pp. 111645, 2025, ISSN: 1389-1286.
Abstract | Links | BibTeX | Tags: deep reinforcement learning, Markov Decision Process, Network resource allocation, Reinforcement learning
@article{chen_multi-policy_2025,
  author   = {Zheyu Chen and Kin K. Leung and Shiqiang Wang and Leandros Tassiulas and Kevin Chan and Patrick J. Baker},
  title    = {Multi-policy reinforcement learning for network resource allocation with periodic behaviors},
  journal  = {Computer Networks},
  volume   = {272},
  pages    = {111645},
  year     = {2025},
  date     = {2025-11-01},
  issn     = {1389-1286},
  doi      = {10.1016/j.comnet.2025.111645},
  url      = {https://www.sciencedirect.com/science/article/pii/S1389128625006127},
  urldate  = {2025-10-08},
  abstract = {Markov Decision Processes (MDPs) serve as the mathematical foundation of Reinforcement learning (RL), where a Markov process with defined states is used to model the system and the actions to be taken affect the state transitions and the corresponding rewards. The RL and deep RL (DRL) can produce the high-performing action policy to maximize the long-term reward. Although RL/DRL have been widely applied to communication and computer systems, a key limitation is that the system under consideration often does not satisfy the required mathematical properties, thus making the MDP inexact and the derived policy flawed. Therefore, we consider the periodic Markov Decision Process (pMDP), where the evolution of the underlying process and model parameters for the pMDP demonstrate some forms of periodic characteristics (e.g., periodic job arrivals and available resources) which violate the Markov property. To obtain the optimal policies for the pMDP, a policy gradient method with a multi-policy solution framework is proposed, and a deep-learning method is developed to improve the effectiveness and stability of the proposed solution. Furthermore, a layer-sharing strategy is proposed to reduce the storage complexity by reducing the number of parameters in the neural networks. The deep-learning method is applied to achieve the near-optimal allocation of resources to arriving computational tasks in a network setting corresponding to the software-defined network (SDN). Evaluation results reveal that the proposed technique is valid and capable of outperforming a baseline method that employs a single policy by 31% on average.},
  keywords = {deep reinforcement learning, Markov Decision Process, Network resource allocation, Reinforcement learning},
  pubstate = {published},
  tppubtype = {article}
}
Li, Haiyuan; Li, Peizheng; Assis, Karcuis; Ullauri, Juan Marcelo Parra; Aijaz, Adnan; Yan, Shuangyi; Simeonidou, Dimitra
NetMind+: Adaptive Baseband Function Placement with GCN Encoding and Incremental Maze-solving DRL for Dynamic and Heterogeneous RANs Journal Article
In: IEEE Transactions on Network and Service Management, vol. 22, no. 4, pp. 3419–3432, 2025, ISSN: 1932-4537.
Abstract | Links | BibTeX | Tags: Advanced RAN, deep reinforcement learning, graph neural network, Incremental learning, MEC, Topology variation
@article{li_netmind_2025,
  title    = {{NetMind+}: Adaptive Baseband Function Placement with {GCN} Encoding and Incremental Maze-solving {DRL} for Dynamic and Heterogeneous {RANs}},
  author   = {Haiyuan Li and Peizheng Li and Karcuis Assis and Juan Marcelo Parra Ullauri and Adnan Aijaz and Shuangyi Yan and Dimitra Simeonidou},
  doi      = {10.1109/TNSM.2025.3570490},
  issn     = {1932-4537},
  year     = {2025},
  date     = {2025-08-01},
  journal  = {IEEE Transactions on Network and Service Management},
  volume   = {22},
  number   = {4},
  pages    = {3419--3432},
  abstract = {The disaggregated architecture of advanced Radio Access Networks (RANs) with diverse X-haul latencies, in conjunction with resource-limited multi-access edge computing networks, presents significant challenges in designing a general model in placing baseband and user plane functions to accommodate versatile 5G services. This paper proposes a novel approach, NetMind+, which leverages Deep Reinforcement Learning (DRL) to determine the function placement strategies in diverse and evolving RAN topologies, aiming at minimizing power consumption. NetMind+ resolves the problem with a maze-solving strategy, enabling a Markov Decision Process with standardized action space scales across different networks. Additionally, a Graph Convolutional Network (GCN) based encoding and an incremental learning mechanism are introduced, allowing features from different and dynamic networks to be aggregated into a single DRL agent. This facilitates the generalization capability of DRL and minimizes the negative retraining impact. In an example with three sub-networks, NetMind+ demonstrates a substantial 32.76% improvement in power savings and a 41.67% increase in service stability compared to benchmarks from the existing literature. Compared to traditional methods necessitating a dedicated DRL agent for each network, NetMind+ attains comparable performance with 70% of the training cost savings. Furthermore, it demonstrates robust adaptability during network variations, accelerating training speed by 50%.},
  keywords = {Advanced RAN, deep reinforcement learning, graph neural network, Incremental learning, MEC, Topology variation},
  pubstate = {published},
  tppubtype = {article}
}
Wang, Zhipeng; Ng, Soon Xin; El-Hajjar, Mohammed
A 3D Spatial Information Compression Based Deep Reinforcement Learning Technique for UAV Path Planning in Cluttered Environments Journal Article
In: IEEE Open Journal of Vehicular Technology, vol. 6, pp. 647–661, 2025, ISSN: 2644-1330.
Abstract | Links | BibTeX | Tags: 3D path planning, 3D spatial information compression, Autonomous aerial vehicles, Classification algorithms, Convergence, deep reinforcement learning, Navigation, Path planning, Principal component analysis, Search problems, Solid modeling, Three-dimensional displays, Training, training efficiency, unmanned aerial vehicles
@article{wang_3d_2025,
  title    = {A {3D} Spatial Information Compression Based Deep Reinforcement Learning Technique for {UAV} Path Planning in Cluttered Environments},
  author   = {Zhipeng Wang and Soon Xin Ng and Mohammed El-Hajjar},
  url      = {https://ieeexplore.ieee.org/document/10878448},
  doi      = {10.1109/OJVT.2025.3540174},
  issn     = {2644-1330},
  year     = {2025},
  date     = {2025-01-01},
  urldate  = {2025-10-08},
  journal  = {IEEE Open Journal of Vehicular Technology},
  volume   = {6},
  pages    = {647--661},
  abstract = {Unmanned aerial vehicles (UAVs) can be considered in many applications, such as wireless communication, logistics transportation, agriculture and disaster prevention. The flexible maneuverability of UAVs also means that the UAV often operates in complex 3D environments, which requires efficient and reliable path planning system support. However, as a limited resource platform, the UAV systems cannot support highly complex path planning algorithms in lots of scenarios. In this paper, we propose a 3D spatial information compression (3DSIC) based deep reinforcement learning (DRL) algorithm for UAV path planning in cluttered 3D environments. Specifically, the proposed algorithm compresses the 3D spatial information to 2D through 3DSIC, and then combines the compressed 2D environment information with the current UAV layer spatial information to train UAV agents for path planning using neural networks. Additionally, the proposed 3DSIC is a plug and use module that can be combined with various DRL frameworks such as Deep Q-Network (DQN) and deep deterministic policy gradient (DDPG). Our simulation results show that the training efficiency of 3DSIC-DQN is 4.028 times higher than that directly implementing DQN in a 100 $\times$ 100 $\times$ 50 3D cluttered environment. Furthermore, the training efficiency of 3DSIC-DDPG is 3.9 times higher than the traditional DDPG in the same environment. Moreover, 3DSIC combined with fast recurrent stochastic value gradient (FRSVG), which can be considered as the most state-of-the-art DRL algorithm for UAV path planning, exhibits 2.35 times faster training speed compared with the original FRSVG algorithm.},
  keywords = {3D path planning, 3D spatial information compression, Autonomous aerial vehicles, Classification algorithms, Convergence, deep reinforcement learning, Navigation, Path planning, Principal component analysis, Search problems, Solid modeling, Three-dimensional displays, Training, training efficiency, unmanned aerial vehicles},
  pubstate = {published},
  tppubtype = {article}
}
Qi, Jiaju; Lei, Lei; Jonsson, Thorsteinn; Hanzo, Lajos
Electric Bus Charging Schedules Relying on Real Data-Driven Targets Based on Hierarchical Deep Reinforcement Learning Journal Article
In: IEEE Access, vol. 13, pp. 99415–99433, 2025, ISSN: 2169-3536.
Abstract | Links | BibTeX | Tags: Batteries, charging control, Costs, deep reinforcement learning, electric bus, Electricity, hierarchical reinforcement learning, Real-time systems, Schedules, Scheduling, Stochastic processes, Uncertainty, Vehicle-to-grid
@article{qi_electric_2025,
  title    = {Electric Bus Charging Schedules Relying on Real Data-Driven Targets Based on Hierarchical Deep Reinforcement Learning},
  author   = {Jiaju Qi and Lei Lei and Thorsteinn Jonsson and Lajos Hanzo},
  url      = {https://ieeexplore.ieee.org/document/11006647},
  doi      = {10.1109/ACCESS.2025.3571211},
  issn     = {2169-3536},
  year     = {2025},
  date     = {2025-01-01},
  urldate  = {2025-10-08},
  journal  = {IEEE Access},
  volume   = {13},
  pages    = {99415--99433},
  abstract = {The charging scheduling problem of Electric Buses (EBs) is investigated based on Deep Reinforcement Learning (DRL). A Markov Decision Process (MDP) is conceived, where the time horizon includes multiple charging and operating periods in a day, while each period is further divided into multiple time steps. To overcome the challenge of long-range multi-phase planning with sparse reward, we conceive Hierarchical DRL (HDRL) for decoupling the original MDP into a high-level Semi-MDP (SMDP) and multiple low-level MDPs. The Hierarchical Double Deep Q-Network (HDDQN)-Hindsight Experience Replay (HER) algorithm is proposed for simultaneously solving the decision problems arising at different temporal resolutions. As a result, the high-level agent learns an effective policy for prescribing the charging targets for every charging period, while the low-level agent learns an optimal policy for setting the charging power of every time step within a single charging period, with the aim of minimizing the charging costs while meeting the charging target. It is proved that the flat policy constructed by superimposing the optimal high-level policy and the optimal low-level policy performs as well as the optimal policy of the original MDP. Since jointly learning both levels of policies is challenging due to the non-stationarity of the high-level agent and the sampling inefficiency of the low-level agent, we divide the joint learning process into two phases and exploit our new HER algorithm to manipulate the experience replay buffers for both levels of agents. Numerical experiments are performed with the aid of real-world data to evaluate the performance of the proposed algorithm.},
  keywords = {Batteries, charging control, Costs, deep reinforcement learning, electric bus, Electricity, hierarchical reinforcement learning, Real-time systems, Schedules, Scheduling, Stochastic processes, Uncertainty, Vehicle-to-grid},
  pubstate = {published},
  tppubtype = {article}
}