Accelerating Monte Carlo methods for Bayesian inference in dynamical models


Book Description

Making decisions and predictions from noisy observations are two important and challenging problems in many areas of society. Some examples of applications are recommendation systems for online shopping and streaming services, connecting genes with certain diseases and modelling climate change. In this thesis, we make use of Bayesian statistics to construct probabilistic models given prior information and historical data, which can be used for decision support and predictions. The main obstacle with this approach is that it often results in mathematical problems lacking analytical solutions. To cope with this, we make use of statistical simulation algorithms known as Monte Carlo methods to approximate the intractable solution. These methods enjoy well-understood statistical properties but are often computationally prohibitive to employ. The main contribution of this thesis is the exploration of different strategies for accelerating inference methods based on sequential Monte Carlo (SMC) and Markov chain Monte Carlo (MCMC). That is, strategies for reducing the computational effort while keeping or improving the accuracy. A major part of the thesis is devoted to proposing such strategies for the MCMC method known as the particle Metropolis-Hastings (PMH) algorithm. We investigate two strategies: (i) introducing estimates of the gradient and Hessian of the target to better tailor the algorithm to the problem and (ii) introducing a positive correlation between the point-wise estimates of the target. Furthermore, we propose an algorithm based on the combination of SMC and Gaussian process optimisation, which can provide reasonable estimates of the posterior with a significant decrease in computational effort compared with PMH. Moreover, we explore the use of sparseness priors for approximate inference in over-parametrised mixed effects models and autoregressive processes. This can potentially be a practical strategy for inference in the big data era. 
Finally, we propose a general method for increasing the accuracy of the parameter estimates in non-linear state space models by applying a designed input signal. Should the Riksbank raise or lower the repo rate at its next meeting to reach the inflation target? Which genes are associated with a certain disease? How can Netflix and Spotify know which films I want to watch and which music I want to listen to next? These three problems are examples of questions where statistical models can be useful for providing support and a basis for decisions. Statistical models combine theoretical knowledge about, for example, the Swedish economic system with historical data to produce forecasts of future events. These forecasts can then be used to evaluate, for example, what would happen to inflation in Sweden if unemployment falls, or how the value of my pension savings changes when the Stockholm stock exchange falls sharply. Applications such as these and many others make statistical models important for many parts of society. One way of constructing statistical models is to continuously update a model as more information is collected. This approach is called Bayesian statistics and is particularly useful when one already has good insight into the model or has access to only a small amount of historical data for building it. A drawback of Bayesian statistics is that the computations required to update the model with the new information are often very complicated. In such situations, one can instead simulate the outcome of millions of variants of the model and compare these with the historical observations at hand. One can then average over the variants that gave the best results to obtain a final model. It can therefore sometimes take days or weeks to construct a model. The problem becomes particularly severe for more advanced models, which could give better forecasts but take too long to build. 
In this thesis, we use a number of different strategies to simplify or improve these simulations. We propose, for example, taking more insights about the system into account and thereby reducing the number of model variants that need to be examined. We can thus exclude certain models from the outset, since we have a good idea of roughly what a good model should look like. We can also change the simulation so that it moves more easily between different types of models. In this way, the space of all possible models is explored more efficiently. We propose a number of different combinations and modifications of existing methods to speed up the fitting of the model to the observations. We show that the computation time can in some cases be reduced from a few days to about an hour. Hopefully, this will in the future make it possible to use more advanced models in practice, which in turn results in better forecasts and decisions.
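
As a rough illustration of the Markov chain Monte Carlo idea mentioned in this description, the following is a minimal random-walk Metropolis-Hastings sketch in Python. It is not the particle Metropolis-Hastings algorithm from the thesis: here the log-target is evaluated exactly, whereas PMH would replace it with a noisy sequential Monte Carlo estimate. The toy target (a standard normal), the step size and all function names are assumptions made for the example.

```python
import math
import random


def log_target(theta):
    """Unnormalised log-density of the toy target, a standard normal (assumed)."""
    return -0.5 * theta * theta


def metropolis_hastings(n_iter, step=2.0, theta0=0.0, seed=0):
    """Random-walk Metropolis-Hastings with a Gaussian proposal."""
    rng = random.Random(seed)
    theta = theta0
    log_p = log_target(theta)
    samples = []
    accepted = 0
    for _ in range(n_iter):
        proposal = theta + step * rng.gauss(0.0, 1.0)
        log_p_prop = log_target(proposal)
        # Accept with probability min(1, target(proposal) / target(theta)).
        # 1 - random() lies in (0, 1], so the logarithm is always defined.
        if math.log(1.0 - rng.random()) < log_p_prop - log_p:
            theta, log_p = proposal, log_p_prop
            accepted += 1
        samples.append(theta)
    return samples, accepted / n_iter


samples, acceptance_rate = metropolis_hastings(20000)
```

The sample mean and variance of `samples` should end up close to 0 and 1, the moments of the assumed target. The cost of each iteration is dominated by the target evaluation, which is where strategies such as those explored in the thesis aim to save effort.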




Machine learning using approximate inference


Book Description

Automatic decision making and pattern recognition under uncertainty are difficult tasks that are ubiquitous in our everyday life. The systems we design, and the technology we develop, require us to coherently represent and work with uncertainty in data. Probabilistic models and probabilistic inference give us a powerful framework for solving this problem. Using this framework, while enticing, results in difficult-to-compute integrals and probabilities when conditioning on the observed data. This means we need approximate inference: methods that solve the problem approximately using a systematic approach. In this thesis we develop new methods for efficient approximate inference in probabilistic models. There are generally two approaches to approximate inference, variational methods and Monte Carlo methods. In Monte Carlo methods we use a large number of random samples to approximate the integral of interest. With variational methods, on the other hand, we turn the integration problem into an optimization problem. We develop algorithms of both types and bridge the gap between them. First, we present a self-contained tutorial on the popular sequential Monte Carlo (SMC) class of methods. Next, we propose new algorithms and applications based on SMC for approximate inference in probabilistic graphical models. We derive nested sequential Monte Carlo, a new algorithm particularly well suited for inference in a large class of high-dimensional probabilistic models. Then, inspired by similar ideas, we derive interacting particle Markov chain Monte Carlo to make use of parallelization to speed up approximate inference for universal probabilistic programming languages. After that, we show how we can make use of the rejection sampling process when generating gamma distributed random variables to speed up variational inference. 
Finally, we bridge the gap between SMC and variational methods by developing variational sequential Monte Carlo, a new flexible family of variational approximations.
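
To make the sequential Monte Carlo idea in this description more concrete, here is a minimal bootstrap particle filter in Python. It is a textbook sketch, not one of the algorithms derived in the thesis (nested SMC, interacting particle MCMC or variational SMC). The scalar linear Gaussian model, its parameter values and the function names are all assumptions chosen for illustration.

```python
import math
import random


def simulate(T, a=0.9, q=0.5, r=1.0, seed=1):
    """Simulate x_t = a*x_{t-1} + v_t, y_t = x_t + e_t (toy model, assumed).

    v_t and e_t are zero-mean Gaussian with variances q and r.
    """
    rng = random.Random(seed)
    x, xs, ys = 0.0, [], []
    for _ in range(T):
        x = a * x + rng.gauss(0.0, math.sqrt(q))
        xs.append(x)
        ys.append(x + rng.gauss(0.0, math.sqrt(r)))
    return xs, ys


def bootstrap_particle_filter(ys, n_particles=500, a=0.9, q=0.5, r=1.0, seed=2):
    """Return filtered state means and an estimate of the log-likelihood."""
    rng = random.Random(seed)
    particles = [0.0] * n_particles  # same known initial state as simulate()
    means, log_lik = [], 0.0
    for y in ys:
        # Propagate: sample each particle from the state dynamics.
        particles = [a * x + rng.gauss(0.0, math.sqrt(q)) for x in particles]
        # Weight: Gaussian observation log-density, shifted for stability.
        log_w = [-0.5 * math.log(2 * math.pi * r) - 0.5 * (y - x) ** 2 / r
                 for x in particles]
        m = max(log_w)
        w = [math.exp(lw - m) for lw in log_w]
        total = sum(w)
        log_lik += m + math.log(total / n_particles)
        w = [wi / total for wi in w]
        means.append(sum(wi * x for wi, x in zip(w, particles)))
        # Resample: multinomial resampling according to the weights.
        particles = rng.choices(particles, weights=w, k=n_particles)
    return means, log_lik


xs, ys = simulate(200)
filtered_means, log_likelihood = bootstrap_particle_filter(ys)
```

For this linear Gaussian model the Kalman filter gives the exact answer, so a particle filter is not needed in practice. The model is used here only because it keeps the sketch short and makes the output easy to check.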




Controllability of Complex Networks at Minimum Cost


Book Description

The control-theoretic notion of controllability captures the ability to guide a system toward a desired state with a suitable choice of inputs. Controllability of complex networks such as traffic networks, gene regulatory networks and power grids can, for instance, enable efficient operation or entirely new applications. However, when control theory is applied to complex networks like these, several challenges arise. This thesis considers some of them. In particular, we investigate how a given network can be rendered controllable at a minimum cost by placement of control inputs or by growing the network with additional edges between its nodes. As cost function we take either the number of control inputs that are needed or the energy that they must exert. A control input is called unilateral if it can assume either positive or negative values, but not both. Motivated by the many applications where unilateral controls are common, we reformulate classical controllability results for this particular case into a computationally more efficient form that enables large-scale analysis. Assuming that each control input targets only one node (called a driver node), we show that the unilateral controllability problem is to a high degree structural: from topological properties of the network we derive theoretical lower bounds for the minimal number of unilateral control inputs, bounds similar to those that have already been established for the minimal number of unconstrained control inputs (i.e. inputs that can assume both positive and negative values). With a constructive algorithm for unilateral control input placement we also show that the theoretical bounds can often be achieved. A network may be controllable in theory but not in practice if, for instance, unreasonable amounts of control energy are required to steer it in some direction. 
For the case with unconstrained control inputs, we show that the control energy depends on the time constants of the modes of the network: the longer they are, the less energy is required for control. We also present different strategies for the problem of placing driver nodes such that the control energy requirements are reduced (assuming that theoretical controllability is not an issue). For the most general class of networks we consider, directed networks with arbitrary eigenvalues (and thereby arbitrary time constants), we suggest strategies based on a novel characterization of network non-normality as imbalance in the distribution of energy over the network. Our formulation makes it possible to quantify network non-normality at the node level as a combination of two different centrality metrics. The first measure quantifies the influence that each node has on the rest of the network, while the second measure instead describes the ability to control a node indirectly from the other nodes. Selecting the nodes that maximize the network non-normality as driver nodes significantly reduces the energy needed for control. Growing a network, i.e. adding more edges to it, is a promising alternative for reducing the energy needed to control it. We approach this by deriving a sensitivity function that quantifies the impact of an edge modification in terms of the H2 and H∞ norms, which in turn can be used to design edge additions that improve commonly used control energy metrics.
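
The role of driver node placement can be illustrated with the classical Kalman rank test, sketched below in Python. This is the standard test for unconstrained inputs and does not cover the unilateral case or the energy considerations studied in the thesis. The three-node directed path network, the convention that `A[i][j]` is the weight of the edge from node j to node i, and the function names are assumptions of the example.

```python
from fractions import Fraction


def mat_vec(A, v):
    """Multiply the square matrix A (list of rows) by the vector v."""
    return [sum(a * x for a, x in zip(row, v)) for row in A]


def rank(columns):
    """Rank of the matrix with the given columns, by exact Gaussian elimination."""
    rows = [list(map(Fraction, r)) for r in zip(*columns)]
    rk = 0
    for c in range(len(columns)):
        pivot = next((i for i in range(rk, len(rows)) if rows[i][c] != 0), None)
        if pivot is None:
            continue
        rows[rk], rows[pivot] = rows[pivot], rows[rk]
        for i in range(rk + 1, len(rows)):
            f = rows[i][c] / rows[rk][c]
            rows[i] = [x - f * y for x, y in zip(rows[i], rows[rk])]
        rk += 1
    return rk


def is_controllable(A, driver):
    """Kalman rank test for a single input attached to node `driver`."""
    n = len(A)
    b = [1 if i == driver else 0 for i in range(n)]
    cols, v = [], b
    for _ in range(n):
        # Collect the columns b, A b, A^2 b, ... of the controllability matrix.
        cols.append(v)
        v = mat_vec(A, v)
    return rank(cols) == n


# Directed path 0 -> 1 -> 2 (hypothetical example network).
PATH = [[0, 0, 0],
        [1, 0, 0],
        [0, 1, 0]]

head_ok = is_controllable(PATH, 0)  # input at the head of the path
tail_ok = is_controllable(PATH, 2)  # input at the tail of the path
```

Driving the head of the path reaches every node, so `head_ok` is `True`, while an input at the tail cannot influence the upstream nodes, so `tail_ok` is `False`. Exact rational arithmetic is used so that the rank decision does not depend on a numerical tolerance.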




Sensor Management for Target Tracking Applications


Book Description

Many practical applications, such as search and rescue operations and environmental monitoring, involve the use of mobile sensor platforms. The workload of the sensor operators is becoming overwhelming, as both the number of sensors and their complexity are increasing. This thesis addresses the problem of automating sensor systems to support the operators. This is often referred to as sensor management. By planning trajectories for the sensor platforms and exploiting sensor characteristics, the accuracy of the resulting state estimates can be improved. The considered sensor management problems are formulated in the framework of stochastic optimal control, where prior knowledge, sensor models, and environment models can be incorporated. The core challenge lies in making decisions based on the predicted utility of future measurements. In the special case of linear Gaussian measurement and motion models, the estimation performance is independent of the actual measurements. This reduces the problem of computing sensing trajectories to a deterministic optimal control problem, for which standard numerical optimization techniques can be applied. A theorem is formulated that makes it possible to reformulate a class of nonconvex optimization problems with matrix-valued variables as convex optimization problems. This theorem is then used to prove that globally optimal sensing trajectories can be computed using off-the-shelf optimization tools. As in many other fields, nonlinearities make sensor management problems more complicated. Two approaches are derived to handle the randomness inherent in the nonlinear problem of tracking a maneuvering target using a mobile range-bearing sensor with limited field of view. The first approach uses deterministic sampling to predict several candidates of future target trajectories that are taken into account when planning the sensing trajectory. 
This significantly increases the tracking performance compared to a conventional approach that neglects the uncertainty in the future target trajectory. The second approach is a method to find the optimal range between the sensor and the target. Given the size of the sensor's field of view and an assumption of the maximum acceleration of the target, the optimal range is determined as the one that minimizes the tracking error while satisfying a user-defined constraint on the probability of losing track of the target. While optimization for tracking of a single target may be difficult, planning for jointly maintaining track of discovered targets and searching for yet undetected targets is even more challenging. Conventional approaches are typically based on a traditional tracking method with separate handling of undetected targets. Here, it is shown that the Poisson multi-Bernoulli mixture (PMBM) filter provides a theoretical foundation for a unified search and track method, as it not only provides state estimates of discovered targets, but also maintains an explicit representation of where undetected targets may be located. Furthermore, in an effort to decrease the computational complexity, a version of the PMBM filter which uses a grid-based intensity to represent undetected targets is derived.




Gaussian Processes for Positioning Using Radio Signal Strength Measurements


Book Description

Estimation of unknown parameters is considered one of the major research areas in statistical signal processing. In recent decades, approaches from estimation theory have become more and more attractive in practical applications. Examples of such applications include, but are not limited to, positioning using various measurable radio signals in indoor environments, self-navigation for autonomous cars, image processing and radar tracking. One issue that is usually encountered when solving an estimation problem is identifying a good system model, which can have a great impact on the estimation performance. In this thesis, we are interested in studying estimation problems, in particular inferring unknown positions from noisy radio signal measurements. In addition, the modeling of the system is studied by investigating the relationship between positions and radio signal strength measurements. One of the main contributions of this thesis is a novel indoor positioning framework based on proximity measurements, which are obtained by quantizing the received signal strength measurements. Sequential Monte Carlo methods, more specifically the particle filter and smoother, are utilized for estimating unknown positions from proximity measurements. The Cramér-Rao bounds for proximity-based positioning are further derived as a benchmark for the positioning accuracy in this framework. Secondly, to improve the estimation performance, Bayesian non-parametric modeling, namely Gaussian processes, has been adopted to provide more accurate and flexible models for both dynamic motions and radio signal strength measurements. Then, the Cramér-Rao bounds for Gaussian process based system models are derived and evaluated in an indoor positioning scenario. In addition, we estimate the positions of stationary devices by comparing the individual signal strength measurements with a pre-constructed fingerprinting database. 
The positioning accuracy is further compared to the case where a moving device is positioned using a time series of radio signal strength measurements. Moreover, Gaussian processes have been applied to sports analytics, where trajectory modeling for athletes is studied. The proposed framework can be further utilized to carry out, for instance, performance prediction and analysis and health condition monitoring. Finally, a grey-box modeling approach is proposed to analyze the forces, particularly in cross-country skiing races, by combining a deterministic kinetic model with a Gaussian process.




Flight Test System Identification


Book Description

With the demand for more advanced fighter aircraft, relying on unstable flight mechanical characteristics to gain flight performance, more focus has been put on model-based system engineering to help with the design work. The flight control system design is one important part that relies on this modeling. Therefore, it has become more important to develop flight mechanical models that are highly accurate in the whole flight envelope. For today’s modern fighter aircraft, the basic flight mechanical characteristics change between linear and nonlinear as well as stable and unstable as an effect of the desired capability of advanced maneuvering at subsonic, transonic and supersonic speeds. This thesis combines the subject of system identification, which is the art of building mathematical models of dynamical systems based on measurements, with aeronautical engineering in order to find methods for identifying flight mechanical characteristics. Here, some challenging aeronautical identification problems, estimating model parameters from flight-testing, are treated. Two aspects are considered. The first is online identification during flight-testing with the intent to aid the engineers in the analysis process when looking at the flight mechanical characteristics. This will also ensure that enough information is available in the resulting test data for post-flight analysis. Here, a frequency domain method is used. An existing method has been developed further by including an Instrumental Variable approach to take care of noisy data including atmospheric turbulence and by a sensor-fusion step to handle varying excitation during an experiment. The method treats linear systems that can be both stable and unstable working under feedback control. An experiment has been performed on a radio-controlled demonstrator aircraft. 
For this, multisine input signals have been designed and the results show that it is possible to perform more time-efficient flight-testing compared with standard input signals. The other aspect is post-flight identification of nonlinear characteristics. Here the properties of a parameterized observer approach, using a prediction-error method, are investigated. This approach is compared with four other methods for some test cases. It is shown that this parameterized observer approach is the most robust one with respect to noise disturbances and initial offsets. Another attractive property is that no user parameters have to be tuned by the engineers in order to get the best performance. All methods in this thesis have been validated on simulated data where the system is known, and have also been tested on real flight test data. Both of the investigated approaches show promising results.




Time of Flight Estimation for Radio Network Positioning


Book Description

Trilateration is the mathematical theory of computing the intersection of circles. These circles may be obtained by time of flight (ToF) measurements in radio systems, as well as laser, radar and sonar systems. A first purpose of this thesis is to survey recent efforts in the area and their potential for localization. The rest of the thesis then concerns selected problems in new cellular radio standards as well as fundamental challenges caused by propagation delays in the ToF measurements, since the signals cannot travel faster than the speed of light. We refer to the measurement uncertainty stemming from propagation delays as positive noise, and develop a general theory with optimal estimators for selected distributions, which can be applied to trilateration but also to a much wider class of estimation problems. The first contribution concerns a narrow-band mode in the long-term evolution (LTE) standard intended for internet of things (IoT) devices. This LTE standard includes a special position reference signal sent synchronized by all base stations (BS) to all IoT devices. Each device can then compute several pair-wise time differences that correspond to hyperbolic functions. The simulation-based performance evaluation indicates that decent position accuracy can be achieved despite the narrow bandwidth of the channel. The second contribution is a study of how timing measurements in LTE can be combined. Round trip time (RTT) to the serving BS and time difference of arrival (TDOA) to the neighboring BS are used as measurements. We propose a filtering framework to deal with the existing uncertainty in the solution and evaluate it with both simulated and experimental test data. The results indicate that the position accuracy is better than 40 meters 95% of the time. The third contribution is a comprehensive theory of how to estimate a signal observed in positive noise, that is, noise modelled by random variables with positive support. 
It is well known from the literature that order statistics give one order of magnitude lower estimation variance compared to the best linear unbiased estimator (BLUE). We provide a systematic survey of some common distributions with positive support, and provide derivations and summaries of estimators based on order statistics, including the BLUE one for comparison. An iterative global navigation satellite system (GNSS) localization algorithm, based on the derived estimators, is introduced to jointly estimate the receiver’s position and clock bias. The fourth contribution is an extension of the third contribution to a particular approach to utilize positive noise in nonlinear models. That is, order statistics have been employed to derive estimators for a generic nonlinear model with positive noise. The proposed method further enables the estimation of the hyperparameters of the underlying noise distribution. The performance of the proposed estimator is then compared with the maximum likelihood estimator when the underlying noise follows either a uniform or exponential distribution.
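
The benefit of order statistics under positive noise can be seen in a small simulation, sketched below in Python. The setup is an assumption made for illustration and is not taken from the thesis: a constant `x_true` is observed n times in additive exponential noise with known rate `lam`. The minimum of the observations, corrected for its bias 1/(n*lam), is compared with a bias-corrected sample mean, which is used as a simple linear baseline and is not necessarily the BLUE considered in the thesis.

```python
import random
import statistics


def compare_estimators(x_true=10.0, lam=1.0, n=20, n_trials=2000, seed=0):
    """Empirical variances of a mean-based and a minimum-based estimator.

    Observations are y_i = x_true + e_i with e_i ~ Exp(lam) (assumed model).
    """
    rng = random.Random(seed)
    mean_est, min_est = [], []
    for _ in range(n_trials):
        ys = [x_true + rng.expovariate(lam) for _ in range(n)]
        # Linear baseline: sample mean minus the noise mean 1/lam.
        mean_est.append(sum(ys) / n - 1.0 / lam)
        # Order statistic: the minimum of n Exp(lam) draws is Exp(n*lam),
        # so subtracting 1/(n*lam) makes the estimator unbiased.
        min_est.append(min(ys) - 1.0 / (n * lam))
    return statistics.pvariance(mean_est), statistics.pvariance(min_est)


var_mean, var_min = compare_estimators()
```

In this setup the theoretical variances are 1/(n*lam^2) for the mean-based estimator and 1/(n^2*lam^2) for the minimum-based one, so the gap grows with the number of observations.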




Inverse system identification with applications in predistortion


Book Description

Models are commonly used to simulate events and processes, and can be constructed from measured data using system identification. The usual approach is to model the system from input to output, but in this thesis we want to obtain the inverse of the system. Power amplifiers (PAs) used in communication devices can be nonlinear, and this causes interference in adjacent transmitting channels. A prefilter, called a predistorter, can be used to invert the effects of the PA, such that the combination of predistorter and PA reconstructs an amplified version of the input signal. In this thesis, the predistortion problem has been investigated for outphasing power amplifiers, where the input signal is decomposed into two branches that are amplified separately by highly efficient nonlinear amplifiers and then recombined. We have formulated a model structure describing the imperfections in an outphasing PA and the matching ideal predistorter. The predistorter can be estimated from measured data in different ways. Here, the initially nonconvex optimization problem has been developed into a convex problem. The predistorters have been evaluated in measurements. The goal of the inverse models in this thesis is to use them in cascade with the systems to reconstruct the original input. It is shown that the problems of identifying a model of a preinverse and a postinverse are fundamentally different. It turns out that the true inverse is not necessarily the best one when noise is present, and that other models and structures can lead to better inversion results. To construct a predistorter (for a PA, for example), a model of the inverse is used, and different methods can be used for the estimation. One common method is to estimate a postinverse and then use it as a preinverse, making it straightforward to try out different model structures. Another is to construct a model of the system and then use it to estimate a preinverse in a second step. 
This method identifies the inverse in the setup in which it will be used, but leads to a complicated optimization problem. A third option is to model the forward system and then invert it. This method can be understood using standard identification theory, in contrast to the ones above, but the model is tuned for the forward system, not the inverse. Models obtained using the various methods capture different properties of the system, and a more detailed analysis of the methods is presented for linear time-invariant systems and linear approximations of block-oriented systems. The theory is also illustrated in examples. When a preinverse is used, the input to the system will be changed, and typically the input data will differ from the original input. This is why the estimation of preinverses is more complicated than that of postinverses, and why one set of experimental data is not enough. Here, we have shown that identifying a preinverse in series with the system in repeated experiments can improve the inversion performance.




Exploiting Direct Optimal Control for Motion Planning in Unstructured Environments


Book Description

During recent decades, motion planning for autonomous systems has become an important area of research. The high interest is not least due to the development of systems such as self-driving cars, unmanned aerial vehicles and robotic manipulators. The objective in optimal motion planning problems is to find feasible motion plans that also optimize a performance measure. From a control perspective, the problem is an instance of an optimal control problem. This thesis addresses optimal motion planning problems for complex dynamical systems that operate in unstructured environments, where no prior reference such as road-lane information is available. Some example scenarios are autonomous docking of vessels in harbors and autonomous parking of self-driving tractor-trailer vehicles at loading sites. The focus is to develop optimal motion planning algorithms that can reliably be applied to these types of problems. This is achieved by combining recent ideas from automatic control, numerical optimization and robotics. The first contribution is a systematic approach for computing local solutions to motion planning problems in challenging unstructured environments. The solutions are computed by combining homotopy methods and direct optimal control techniques. The general principle is to define a homotopy that transforms, or preferably relaxes, the original problem to an easily solved problem. The approach is demonstrated in motion planning problems in 2D and 3D environments, where the presented method outperforms a state-of-the-art asymptotically optimal motion planner based on random sampling. The second contribution is an optimization-based framework for automatic generation of motion primitives for lattice-based motion planners. Given a family of systems, the user only needs to specify which principal types of motion are relevant for the considered system family. 
Based on the selected principal motions and a selected system instance, the framework computes a library of motion primitives by simultaneously optimizing the motions and the terminal states. The final contribution of this thesis is a motion planning framework that combines the strengths of sampling-based planners with direct optimal control in a novel way. The sampling-based planner is applied to the problem in a first step using a discretized search space, where the system dynamics and objective function are chosen to coincide with those used in a second step based on optimal control. This combination ensures that the sampling-based motion planner provides a feasible motion plan which is highly suitable as a warm start for the optimal control step. Furthermore, the second step is modified so that it can also be applied in a receding-horizon fashion, where the proposed combination of methods is used to provide theoretical guarantees in terms of recursive feasibility, worst-case objective function value and convergence to the terminal state. The proposed motion planning framework is successfully applied to several problems in challenging unstructured environments for tractor-trailer vehicles. The framework is also applied and tailored for maritime navigation for vessels in archipelagos and harbors, where it is able to compute energy-efficient trajectories that comply with the international regulations for preventing collisions at sea.




Structure-Exploiting Numerical Algorithms for Optimal Control


Book Description

Numerical algorithms for efficiently solving optimal control problems are important for commonly used advanced control strategies, such as model predictive control (MPC), but can also be useful for advanced estimation techniques, such as moving horizon estimation (MHE). In MPC, the control input is computed by solving a constrained finite-time optimal control (CFTOC) problem on-line, and in MHE the estimated states are obtained by solving an optimization problem that often can be formulated as a CFTOC problem. Common types of optimization methods for solving CFTOC problems are interior-point (IP) methods, sequential quadratic programming (SQP) methods and active-set (AS) methods. In these types of methods, the main computational effort is often the computation of the second-order search directions. This boils down to solving a sequence of systems of equations that correspond to unconstrained finite-time optimal control (UFTOC) problems. Hence, high-performing second-order methods for CFTOC problems rely on efficient numerical algorithms for solving UFTOC problems. Developing such algorithms is one of the main focuses in this thesis. When the solution to a CFTOC problem is computed using an AS type method, the aforementioned system of equations is only changed by a low-rank modification between two AS iterations. In this thesis, it is shown how to exploit these structured modifications while still exploiting structure in the UFTOC problem using the Riccati recursion. Furthermore, direct (non-iterative) parallel algorithms for computing the search directions in IP, SQP and AS methods are proposed in the thesis. These algorithms exploit, and retain, the sparse structure of the UFTOC problem such that no dense system of equations needs to be solved serially as in many other algorithms. The proposed algorithms can be applied recursively to obtain logarithmic computational complexity growth in the prediction horizon length. 
For the case with linear MPC problems, an alternative approach to solving the CFTOC problem on-line is to use multiparametric quadratic programming (mp-QP), where the corresponding CFTOC problem can be solved explicitly off-line. This is referred to as explicit MPC. One of the main limitations with mp-QP is the amount of memory that is required to store the parametric solution. In this thesis, an algorithm for decreasing the required amount of memory is proposed. The aim is to make mp-QP and explicit MPC more useful in practical applications, such as embedded systems with limited memory resources. The proposed algorithm exploits the structure from the QP problem in the parametric solution in order to reduce the memory footprint of general mp-QP solutions, and in particular, of explicit MPC solutions. The algorithm can be used directly in mp-QP solvers, or as a post-processing step to an existing solution.
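
The Riccati recursion mentioned in this description can be sketched for the simplest possible case, a scalar unconstrained finite-time optimal control problem. The Python code below is a textbook illustration under stated assumptions, not one of the structure-exploiting or parallel algorithms proposed in the thesis. The problem is to minimise the sum of `q*x_t**2 + r*u_t**2` over the horizon plus `q_final*x_N**2`, subject to `x_{t+1} = a*x_t + b*u_t`. All names and parameter values are hypothetical.

```python
def riccati_recursion(a, b, q, r, q_final, horizon):
    """Backward pass: return feedback gains K_t and cost-to-go weights P_t."""
    P = [0.0] * (horizon + 1)
    K = [0.0] * horizon
    P[horizon] = q_final
    for t in range(horizon - 1, -1, -1):
        K[t] = a * b * P[t + 1] / (r + b * b * P[t + 1])
        P[t] = q + a * a * P[t + 1] - a * b * P[t + 1] * K[t]
    return K, P


def rollout(a, b, q, r, q_final, K, x0):
    """Forward pass: apply u_t = -K_t * x_t and accumulate the cost."""
    x, cost, inputs = x0, 0.0, []
    for k in K:
        u = -k * x
        cost += q * x * x + r * u * u
        inputs.append(u)
        x = a * x + b * u
    return inputs, cost + q_final * x * x


# Hypothetical example: unit parameters, horizon 50, initial state 2.0.
K, P = riccati_recursion(1.0, 1.0, 1.0, 1.0, 1.0, 50)
inputs, cost = rollout(1.0, 1.0, 1.0, 1.0, 1.0, K, 2.0)
```

The work grows linearly with the horizon length because each step only touches quantities from the neighbouring time step. The accumulated `cost` should agree with `P[0] * x0**2`, which gives a simple consistency check on the two passes.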