Book Description
Advancements in robotics, computer vision, machine learning and hardware have contributed to impressive developments in autonomous vehicles. However, challenges remain that must be tackled before autonomous vehicles can be safely and seamlessly integrated into human environments, particularly in dense and cluttered urban settings. Autonomous vehicles must be able to understand and anticipate how their surroundings will evolve in both time and space. This capability allows them to proactively plan safe trajectories and avoid other traffic agents. A common prediction approach is agent-centric (e.g., pedestrian or vehicle trajectory prediction). Such methods require detection and tracking of all agents in the environment, since trajectory prediction is performed for each agent. An alternative is a map-based (e.g., occupancy grid map) prediction method, where the entire environment is discretized into grid cells and the occupancy probability of each grid cell is predicted. Hence, object detection and tracking capability is generally not needed. This makes a map-based occupancy prediction approach more robust to partial object occlusions and capable of handling an arbitrary number of agents in the environment. However, a common problem with occupancy grid map prediction is the vanishing of objects from the predictions, especially at longer time horizons. In this thesis, we consider the problem of spatiotemporal environment prediction in urban environments. We merge tools from robotics, computer vision and deep learning to develop spatiotemporal occupancy prediction frameworks that leverage environment information. In our first research work, we developed an occupancy prediction methodology that leverages information about environment dynamics, namely the separation of the environment into static and dynamic parts.
Our model learns to predict the spatiotemporal evolution of the static and dynamic parts of the environment input separately and outputs the final occupancy grid map predictions of the entire environment. In our second research work, we further developed the prediction framework to be modular by adding a learning-based static-dynamic segmentation module upstream of the occupancy prediction module. This addition addressed the previous limitation that required the static and dynamic parts of the environment to be known in advance. Lastly, we developed an environment prediction framework that leverages environment semantic information. Our proposed model consists of two sub-modules: future semantic segmentation prediction and occupancy prediction. We proposed to represent environment semantics in the form of semantic grid maps, which are similar to the occupancy grid representation. This allows a direct flow of semantic information to the occupancy prediction sub-module. Experiments on a real-world driving dataset show that our methods outperform other state-of-the-art models and reduce the issue of vanishing objects in the predictions at longer time horizons.
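
To make the occupancy grid representation described above concrete, here is a minimal Python sketch, not the thesis implementation. It discretizes 2D points into grid cells and then splits the resulting grid into static and dynamic parts using a mask that is assumed to be given. The function names (rasterize, split_static_dynamic), the binary occupancy values, the grid extent and the hand-written mask are all illustrative assumptions; in the work described here, occupancy is probabilistic and the static-dynamic segmentation is learned.

```python
def rasterize(points, x_min, y_min, cell_size, n_rows, n_cols):
    """Mark every grid cell that contains at least one 2D point as occupied.

    Assumption: binary occupancy (0.0 or 1.0) stands in for the occupancy
    probabilities that a real occupancy grid map would hold.
    """
    grid = [[0.0] * n_cols for _ in range(n_rows)]
    for x, y in points:
        col = int((x - x_min) // cell_size)
        row = int((y - y_min) // cell_size)
        # Points that fall outside the grid extent are ignored.
        if 0 <= row < n_rows and 0 <= col < n_cols:
            grid[row][col] = 1.0
    return grid


def split_static_dynamic(grid, dynamic_mask):
    """Split an occupancy grid into static and dynamic grids.

    Assumption: dynamic_mask is a hypothetical boolean grid of the same
    shape, supplied by some upstream segmentation step.
    """
    static = [[0.0 if m else v for v, m in zip(g_row, m_row)]
              for g_row, m_row in zip(grid, dynamic_mask)]
    dynamic = [[v if m else 0.0 for v, m in zip(g_row, m_row)]
               for g_row, m_row in zip(grid, dynamic_mask)]
    return static, dynamic


# Example: a 4 x 4 grid with 1 m cells covering [0, 4) x [0, 4).
points = [(0.5, 0.5), (2.2, 1.7), (-1.0, 0.0), (3.9, 3.9)]
grid = rasterize(points, x_min=0.0, y_min=0.0, cell_size=1.0, n_rows=4, n_cols=4)

# Hypothetical mask marking the cell at row 1, column 2 as dynamic.
dynamic_mask = [[False] * 4 for _ in range(4)]
dynamic_mask[1][2] = True
static_grid, dynamic_grid = split_static_dynamic(grid, dynamic_mask)
```

Because every occupied cell is handled the same way regardless of which object produced it, this representation needs no per-agent detection or tracking, which is the property the map-based approach relies on.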