Book Description
This dissertation presents a distributed platoon control strategy for connected and automated vehicles (CAVs), based on physics-informed deep reinforcement learning (DRL), for mixed traffic of CAVs and human-driven vehicles (HDVs). The dissertation consists of three main parts: (i) a generic DRL-based CAV control framework for mixed traffic flow; (ii) DRL-based distributed CAV control under communication failure in a fully connected automated environment; and (iii) distributed CAV control for mixed traffic flow, with real-time DRL-based estimation of aggregated macroscopic car-following behavior.

For the first part, we discuss the current challenges of CAV control in mixed traffic flow. For distributed CAV control, we categorize the local downstream environment into two broad traffic scenarios based on the composition of CAVs and HDVs, so as to accommodate any possible CAV-HDV platoon configuration: (i) a fully connected automated environment, where all local downstream vehicles are CAVs, forming a CAV-CAVs topology; and (ii) a mixed local downstream environment, comprising the closest downstream CAV followed by one or more HDVs, creating a CAV-HDVs-CAV topology. This generic control framework accommodates any CAV-HDV platoon topology that may emerge within a mixed traffic platoon. This part is discussed in Section 3.

For the second part, we propose a DRL-based distributed longitudinal control strategy for CAVs under communication failure to stabilize traffic oscillations. Specifically, Signal-to-Interference-plus-Noise Ratio (SINR) based vehicle-to-vehicle (V2V) communication is incorporated into the DRL training environment to reproduce realistic communication conditions and time-space-varying information flow topologies (IFTs). A dynamic information fusion mechanism is designed to smooth the high-jerk control signals caused by the dynamic IFTs.
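To make the SINR-based communication idea concrete, the sketch below decides whether a V2V link is usable under a textbook SINR test. The power-law path-loss exponent, transmit power, noise floor, and decoding threshold are all illustrative assumptions, not values from the dissertation:

```python
import math

def sinr_link_active(d_tx_rx, interferer_dists, p_tx=0.1, alpha=3.0,
                     noise=1e-10, threshold_db=5.0):
    """Decide whether a V2V link is usable under a simple SINR model.

    Received power follows an illustrative power-law path loss
    P_rx = p_tx * d**(-alpha); distances in meters, powers in watts.
    All parameter values are placeholders for illustration only.
    """
    p_rx = p_tx * d_tx_rx ** (-alpha)                       # desired signal
    interference = sum(p_tx * d ** (-alpha) for d in interferer_dists)
    sinr = p_rx / (interference + noise)                    # linear SINR
    return 10.0 * math.log10(sinr) >= threshold_db          # decode test in dB

# A clear channel at 50 m passes; a strong interferer at 60 m breaks the link,
# which is how time-varying IFTs arise as vehicles move.
print(sinr_link_active(50.0, []))       # link up
print(sinr_link_active(50.0, [60.0]))   # link down
```

Re-evaluating this test each control step yields the time-space-varying IFT that the DRL training environment would expose to each agent.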
Building on this, each DRL-controlled CAV receives real-time state information from downstream CAVs and takes longitudinal actions to reach equilibrium consensus in the multi-agent system. Simulated experiments are conducted to tune the communication adjustment mechanism and to validate the control performance, oscillation-dampening performance, and generalization capability of the proposed algorithm. This part is discussed in Section 4.

The third part proposes an innovative distributed longitudinal control strategy for CAVs in mixed traffic of CAVs and HDVs that incorporates high-dimensional platoon information. In mixed traffic, traditional CAV control methods focus on microscopic trajectory information, which may not handle HDV stochasticity (e.g., long reaction times, diverse driving styles) and mixed-traffic heterogeneity efficiently. Unlike traditional methods, our method, for the first time, characterizes consecutive HDVs as a whole (i.e., an aggregated HDV, or AHDV) to reduce HDV stochasticity, and uses its macroscopic features to control the following CAVs. The new control strategy takes advantage of platoon information to anticipate downstream disturbances and traffic features in mixed traffic scenarios, and it greatly outperforms traditional methods. In particular, the control algorithm is based on DRL to achieve efficient car-following control and to address the stochasticity of the aggregated car-following behavior by embedding it in the training environment. To better utilize macroscopic traffic features, a general mixed-traffic platoon is categorized as a CAV-HDVs-CAV pattern and described by the corresponding DRL states. The macroscopic traffic flow properties are built upon the Newell car-following model to capture the joint behavior of aggregated HDVs.
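The Newell model makes aggregation across consecutive HDVs particularly simple: each follower replicates its leader's trajectory shifted back by a time displacement tau and a space displacement delta, so a string of n HDVs shifts the lead trajectory by n*tau and n*delta in aggregate. The sketch below illustrates this property; the parameter values and function names are illustrative, not the dissertation's calibrated model:

```python
def newell_position(leader_traj, t, n_hdv, tau=1.0, delta=7.0):
    """Position of the last vehicle in a string of n_hdv consecutive HDVs
    under Newell's simplified car-following model.

    Each follower copies its leader's trajectory shifted by (tau, delta),
    so the whole string shifts the lead trajectory by n_hdv * (tau, delta).
    tau (s) and delta (m) are illustrative values, not calibrated ones.

    leader_traj: callable t -> position (m) of the leading CAV.
    """
    return leader_traj(t - n_hdv * tau) - n_hdv * delta

# Leading CAV cruising at 20 m/s; the trailing edge of three aggregated
# HDVs trails it by 3*tau in time and 3*delta in space.
leader = lambda t: 20.0 * t
print(newell_position(leader, 10.0, 3))  # 20*(10-3) - 3*7 = 119.0 m
```

This shift property is why the AHDV's joint behavior can be summarized by a few macroscopic quantities and fed to the following CAV's DRL state, rather than tracking each HDV microscopically.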
Simulated experiments are conducted to validate the proposed strategy. The results demonstrate that the proposed control method achieves outstanding performance in terms of oscillation dampening, eco-driving, and generalization capability. This part is discussed in Section 5.