Scalable Bayesian spatial analysis with Gaussian Markov random fields


Book Description

Accurate statistical analysis of spatial data is important in many applications, and failing to properly account for spatial autocorrelation can often lead to false conclusions. At the same time, the ever-increasing size of spatial datasets poses a major computational challenge, since many standard methods for spatial analysis are limited to a few thousand data points. In this thesis, we explore how Gaussian Markov random fields (GMRFs) can be used for scalable analysis of spatial data. GMRFs are closely related to the commonly used Gaussian processes, but have sparsity properties that make them computationally cheap in both time and memory. The Bayesian framework allows a GMRF to be used as a spatial prior that encodes the assumption of smooth variation over space, and provides a principled way to estimate parameters and propagate uncertainty. We develop new algorithms that make it possible to apply GMRF priors in 3D to brain activity observed indirectly through functional magnetic resonance imaging (fMRI), where datasets contain millions of observations. We show that our methods are both faster and more accurate than previous work. We also propose a method for approximating selected elements of the inverse precision matrix (i.e. the covariance matrix), which is important for evaluating posterior uncertainty. In addition, we establish a link between GMRFs and deep convolutional neural networks, which have been used successfully in countless machine learning tasks on images, resulting in a deep GMRF model. Finally, we show how GMRFs can be used in real time by autonomous drones in search and rescue operations in disaster areas, to model the spatial distribution of injured persons.
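To make the sparsity argument concrete, here is a minimal sketch for the simplest possible GMRF, a chain of n nodes. It is not one of the thesis's algorithms. The precision is assumed to be Q = tau * (R + kappa2 * I), where R is a first-order random-walk structure matrix; the function names and parameter values are hypothetical. Because Q has only 3n - 2 nonzeros, its Cholesky factor is bidiagonal and exact sampling costs O(n). The last function is a plain Monte Carlo estimate of diag(Q^-1), included only to show why selected elements of the inverse matter and how samples can approximate them; it is not the estimator proposed in the thesis.

```python
import math
import random


def chain_precision(n, tau=1.0, kappa2=0.1):
    """Tridiagonal precision Q = tau * (R + kappa2 * I) on a chain graph.

    R is a first-order random-walk structure matrix: the diagonal holds each
    node's number of neighbours, and neighbouring nodes get -1.
    Returns (diag, off) with off[i] = Q[i, i+1] = Q[i+1, i].
    """
    diag = [tau * (((i > 0) + (i < n - 1)) + kappa2) for i in range(n)]
    off = [-tau] * (n - 1)
    return diag, off


def cholesky_tridiag(diag, off):
    """Cholesky factor Q = L L^T of a tridiagonal SPD matrix in O(n).

    L is lower bidiagonal: l[i] = L[i, i] and m[i] = L[i+1, i].
    """
    n = len(diag)
    l = [0.0] * n
    m = [0.0] * (n - 1)
    l[0] = math.sqrt(diag[0])
    for i in range(n - 1):
        m[i] = off[i] / l[i]
        l[i + 1] = math.sqrt(diag[i + 1] - m[i] ** 2)
    return l, m


def sample_gmrf(l, m, rng):
    """Draw x ~ N(0, Q^-1) by back-substitution in L^T x = z, z ~ N(0, I)."""
    n = len(l)
    z = [rng.gauss(0.0, 1.0) for _ in range(n)]
    x = [0.0] * n
    x[n - 1] = z[n - 1] / l[n - 1]
    for i in range(n - 2, -1, -1):
        x[i] = (z[i] - m[i] * x[i + 1]) / l[i]
    return x


def exact_variance(l, m, i):
    """Exact Sigma[i, i] = ||L^-1 e_i||^2 via one forward substitution."""
    n = len(l)
    y = [0.0] * n
    for j in range(n):
        rhs = (1.0 if j == i else 0.0) - (m[j - 1] * y[j - 1] if j > 0 else 0.0)
        y[j] = rhs / l[j]
    return sum(v * v for v in y)


def mc_variances(l, m, n_samples, rng):
    """Monte Carlo estimate of diag(Q^-1); the mean is known to be zero."""
    n = len(l)
    sums = [0.0] * n
    for _ in range(n_samples):
        x = sample_gmrf(l, m, rng)
        for j in range(n):
            sums[j] += x[j] ** 2
    return [s / n_samples for s in sums]
```

For example, `diag, off = chain_precision(50)` followed by `l, m = cholesky_tridiag(diag, off)` and `mc_variances(l, m, 4000, random.Random(0))` gives variance estimates that agree with `exact_variance` to within a few percent. On 2D or 3D lattices Q stays sparse but its Cholesky factor fills in, so node ordering becomes important; the chain avoids that issue.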




Beyond Recognition


Book Description

This thesis addresses the need to balance the use of facial recognition systems with the need to protect personal privacy in machine learning and biometric identification. As advances in deep learning accelerate the evolution of facial recognition systems, these systems enhance security capabilities but also risk invading personal privacy. Our research identifies and addresses critical vulnerabilities inherent in facial recognition systems, and proposes innovative privacy-enhancing technologies that anonymize facial data while maintaining its utility for legitimate applications. Our investigation centers on the development of methodologies and frameworks that achieve k-anonymity in facial datasets; leverage identity disentanglement to facilitate anonymization; exploit the vulnerabilities of facial recognition systems to underscore their limitations; and implement practical defenses against unauthorized recognition systems. We introduce novel contributions such as AnonFACES, StyleID, IdDecoder, StyleAdv, and DiffPrivate, each designed to protect facial privacy through advanced adversarial machine learning techniques and generative models. These solutions not only demonstrate the feasibility of protecting facial privacy in an increasingly surveilled world, but also highlight the ongoing need for robust countermeasures against the ever-evolving capabilities of facial recognition technology. Continuous innovation in privacy-enhancing technologies is required to safeguard individuals from the pervasive reach of digital surveillance and protect their fundamental right to privacy. By providing open-source, publicly available tools and frameworks, this thesis contributes to the collective effort to ensure that advancements in facial recognition serve the public good without compromising individual rights.
Our multi-disciplinary approach bridges the gap between biometric systems, adversarial machine learning, and generative modeling to pave the way for future research in the domain and support AI innovation where technological advancement and privacy are balanced.




Parameterized Verification of Synchronized Concurrent Programs


Book Description

There is currently an increasing demand for concurrent programs. Checking the correctness of concurrent programs is a complex task due to the many possible interleavings of processes. Violations of correctness properties in such systems can cause human or resource losses; therefore, it is crucial to check their correctness. Two main approaches to software analysis are testing and formal verification. Testing can help discover many bugs at a low cost, but it cannot prove the correctness of a program. Formal verification, on the other hand, aims to prove program correctness. Model checking is a formal verification technique that is suitable for concurrent programs. It aims to automatically establish the correctness of a program, expressed in terms of temporal properties, through an exhaustive search of the system's behavior. Model checking was initially introduced for verifying finite-state concurrent programs, and extending it to infinite-state systems is an active research area. In this thesis, we focus on the formal verification of parameterized systems, that is, systems in which the number of executing processes is not bounded a priori. We provide fully automatic, parameterized model checking techniques for establishing safety properties of certain classes of concurrent programs. We provide an open-source prototype for every technique and present experimental results on several benchmarks. First, we address the problem of automatically checking safety properties for bounded as well as parameterized phaser programs. Phaser programs are concurrent programs that use phasers, the complex synchronization construct of Habanero Java. For the bounded case, we establish the decidability of checking the violation of program assertions and the undecidability of checking deadlock-freedom.
For the parameterized case, we study different formulations of the verification problem and propose an exact procedure that is guaranteed to terminate for some reachability problems, even in the presence of unbounded phases and arbitrarily many spawned processes. Second, we propose an approach for the automatic verification of parameterized concurrent programs in which shared variables are manipulated by atomic transitions to count and synchronize the spawned processes. For this purpose, we introduce counting predicates that relate counters, which refer to the number of processes satisfying some given properties, to the variables that are directly manipulated by the concurrent processes. We then combine existing work on counter abstraction, predicate abstraction, and constrained monotonic abstraction, and build a nested counterexample-based refinement scheme to establish correctness. Third, we introduce Lazy Constrained Monotonic Abstraction for more efficient exploration of well-structured abstractions of infinite-state non-monotonic systems. We propose several heuristics and assess the efficiency of the proposed technique through extensive experiments using our open-source prototype. Lastly, we propose a sound but (in general) incomplete procedure for the automatic verification of safety properties for a class of fault-tolerant distributed protocols described in the Heard-Of (HO) model, a popular model for describing distributed protocols. The procedure is guaranteed to terminate even for an unbounded number of processes executing the distributed protocol.
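The description mentions well-structured abstractions. In a well-structured counter system, safety for every number of processes can be decided by backward coverability over upward-closed sets of configurations. The sketch below shows that classical backward algorithm on a toy, counter-abstracted mutual-exclusion protocol. It is background illustration only, not the phaser, counting-predicate, or Lazy Constrained Monotonic Abstraction procedures of the thesis; the protocol, its places `(idle, crit, lock)`, and all function names are hypothetical.

```python
# A configuration is a tuple of counts, one per place. A transition
# (pre, post) consumes the tokens in `pre` and produces those in `post`.


def leq(a, b):
    """Componentwise order on configurations."""
    return all(x <= y for x, y in zip(a, b))


def pre_image(m, pre, post):
    """Minimal configuration from which firing (pre, post) yields one >= m."""
    return tuple(max(p, mi + p - q) for mi, p, q in zip(m, pre, post))


def backward_coverability(transitions, bad):
    """Minimal elements of the set of configurations that can reach `bad`."""
    basis = [tuple(b) for b in bad]
    frontier = list(basis)
    while frontier:
        m = frontier.pop()
        for pre, post in transitions:
            c = pre_image(m, pre, post)
            if any(leq(b, c) for b in basis):
                continue  # c is already inside the upward-closed set
            # Drop generators subsumed by the new, smaller generator c.
            basis = [b for b in basis if not leq(c, b)] + [c]
            frontier.append(c)
    return basis


def is_safe(transitions, bad, initial_can_cover):
    """Safe iff no initial configuration lies in the backward-reachable set."""
    basis = backward_coverability(transitions, bad)
    return not any(initial_can_cover(m) for m in basis)


# Hypothetical protocol over places (idle, crit, lock).
ENTER = ((1, 0, 1), (0, 1, 0))  # idle + lock -> crit
EXIT = ((0, 1, 0), (1, 0, 1))   # crit -> idle + lock
BAD = [(0, 2, 0)]               # two processes in the critical section


def initial_can_cover(m):
    """Initial configurations are (n, 0, 1) for every n >= 0."""
    return m[1] == 0 and m[2] <= 1


print(is_safe([ENTER, EXIT], BAD, initial_can_cover))  # prints True
```

Termination follows from Dickson's lemma: the basis only ever gains pairwise incomparable vectors. Changing `EXIT` so that it releases two lock tokens makes `is_safe` return `False`, because two processes can then enter the critical section together.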




Modeling Spatio-Temporal Data


Book Description

Several important topics in spatial and spatio-temporal statistics developed in the last 15 years have not received enough attention in textbooks. Modeling Spatio-Temporal Data: Markov Random Fields, Objective Bayes, and Multiscale Models aims to fill this gap by providing an overview of a variety of recently proposed approaches for the analysis of spatial and spatio-temporal datasets, including proper Gaussian Markov random fields, dynamic multiscale spatio-temporal models, and objective priors for spatial and spatio-temporal models. The goal is to make these approaches more accessible to practitioners and to stimulate additional research in these important areas of spatial and spatio-temporal statistics.

Key topics:
* Proper Gaussian Markov random fields and their uses as building blocks for spatio-temporal models and multiscale models
* Hierarchical models with intrinsic conditional autoregressive priors for spatial random effects, including reference priors, results on fast computations, and objective Bayes model selection
* Objective priors for state-space models and a new approximate reference prior for a spatio-temporal model with dynamic spatio-temporal random effects
* Spatio-temporal models based on proper Gaussian Markov random fields for Poisson observations
* Dynamic multiscale spatio-temporal thresholding for spatial clustering and data compression
* Multiscale spatio-temporal assimilation of computer model output and monitoring station data
* Dynamic multiscale heteroscedastic multivariate spatio-temporal models
* The M-open multiple optima paradox and some of its practical implications for multiscale modeling
* Ensembles of dynamic multiscale spatio-temporal models for smooth spatio-temporal processes

The audience for this book includes practitioners, researchers, and graduate students in statistics, data science, machine learning, and related fields. Prerequisites are master's-level courses on statistical inference, linear models, and Bayesian statistics. This book can be used as a textbook for a special topics course on spatial and spatio-temporal statistics, as well as supplementary material for graduate courses on spatial and spatio-temporal modeling.
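As a small illustration of the difference between intrinsic and proper conditional autoregressive (CAR) priors, the sketch below builds the common CAR precision Q = tau * (D - rho * W) for a region adjacency matrix W with degree matrix D. This is the textbook parameterization and may differ from the exact form used in the book; the function names and the four-region map are hypothetical. With rho = 1 (intrinsic CAR) every row of Q sums to zero, so Q is singular and the prior is improper. With |rho| < 1 and every region having at least one neighbour, Q is symmetric, strictly diagonally dominant, and has a positive diagonal, which is sufficient for positive definiteness and hence a proper GMRF.

```python
def car_precision(adjacency, rho, tau=1.0):
    """Precision Q = tau * (D - rho * W) of a CAR prior on a region map."""
    n = len(adjacency)
    q = [[0.0] * n for _ in range(n)]
    for i in range(n):
        q[i][i] = tau * sum(adjacency[i])  # D: number of neighbours of region i
        for j in range(n):
            if i != j and adjacency[i][j]:
                q[i][j] = -tau * rho
    return q


def row_sums(q):
    """Row sums of Q; all zero means the constant vector is in the null space."""
    return [sum(row) for row in q]


def strictly_diagonally_dominant(q):
    """Sufficient (not necessary) check for positive definiteness of a
    symmetric matrix with a positive diagonal."""
    return all(
        q[i][i] > sum(abs(v) for j, v in enumerate(row) if j != i)
        for i, row in enumerate(q)
    )


# Hypothetical map: four regions arranged in a cycle 0-1-3-2-0.
W = [
    [0, 1, 1, 0],
    [1, 0, 0, 1],
    [1, 0, 0, 1],
    [0, 1, 1, 0],
]

print(row_sums(car_precision(W, rho=1.0)))                       # intrinsic CAR
print(strictly_diagonally_dominant(car_precision(W, rho=0.9)))   # proper CAR
```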




Introduction to Bayesian Methods in Ecology and Natural Resources


Book Description

This book presents modern Bayesian analysis in a format that is accessible to researchers in the fields of ecology, wildlife biology, and natural resource management. Bayesian analysis has undergone a remarkable transformation since the early 1990s. Widespread adoption of Markov chain Monte Carlo techniques has made the Bayesian paradigm a viable alternative to classical statistical procedures for scientific inference. The Bayesian approach has a number of desirable qualities, three chief ones being: i) the mathematical procedure is always the same, allowing the analyst to concentrate on the scientific aspects of the problem; ii) historical information is readily used, when appropriate; and iii) hierarchical models are readily accommodated. This monograph contains numerous worked examples and the requisite computer programs, which are easily modified to meet new situations. A primer on probability distributions is also included because these form the basis of Bayesian inference. Researchers and graduate students in ecology and natural resource management will find this book a valuable reference.
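A minimal sketch of the Markov chain Monte Carlo workflow described above, assuming a binomial likelihood with a Beta(a, b) prior, where a and b can encode historical information (roughly a - 1 earlier successes and b - 1 earlier failures). Because this model is conjugate, the exact posterior is Beta(a + k, b + n - k), which makes it easy to check the sampler. The random-walk Metropolis algorithm, the step size, and the function names are illustrative choices, not code from the book.

```python
import math
import random


def log_posterior(p, successes, trials, a, b):
    """Unnormalised log posterior: Beta(a, b) prior times binomial likelihood."""
    if not 0.0 < p < 1.0:
        return float("-inf")
    return (a - 1 + successes) * math.log(p) + (b - 1 + trials - successes) * math.log(1.0 - p)


def metropolis(successes, trials, a=1.0, b=1.0, n_iter=20000, burn_in=1000, step=0.1, seed=1):
    """Random-walk Metropolis sampler for the success probability p."""
    rng = random.Random(seed)
    p = 0.5
    lp = log_posterior(p, successes, trials, a, b)
    draws = []
    for it in range(n_iter):
        proposal = p + rng.gauss(0.0, step)
        lp_new = log_posterior(proposal, successes, trials, a, b)
        # Accept with probability min(1, exp(lp_new - lp)); the symmetric
        # Gaussian proposal cancels in the Metropolis ratio.
        if rng.random() < math.exp(min(0.0, lp_new - lp)):
            p, lp = proposal, lp_new
        if it >= burn_in:
            draws.append(p)
    return draws
```

With k = 7 successes in n = 20 trials and a Beta(2, 2) prior, the exact posterior mean is 9/24 = 0.375, and the average of `metropolis(7, 20, a=2.0, b=2.0)` should be close to it. Changing the model only changes `log_posterior`; the sampling loop stays the same, which is one concrete sense in which the procedure "is always the same".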




Statistical Methods in Epilepsy


Book Description

Epilepsy research promises new treatments and insights into brain function, but statistics and machine learning are paramount for extracting meaning from data and enabling discovery. Statistical Methods in Epilepsy provides a comprehensive introduction to the statistical methods used in epilepsy research. Written in a clear, accessible style by leading authorities, this textbook demystifies introductory and advanced statistical methods, providing a practical roadmap that will be invaluable for learners and experts alike. Topics include a primer on version control and coding, pre-processing of imaging and electrophysiological data, hypothesis testing, generalized linear models, survival analysis, network analysis, time-series analysis, spectral analysis, spatial statistics, unsupervised and supervised learning, natural language processing, prospective trial design, pharmacokinetic and pharmacodynamic modeling, and randomized clinical trials.

Features:
* Provides a comprehensive introduction to statistical methods employed in epilepsy research
* Divided into four parts: Basic Processing Methods for Data Analysis; Statistical Models for Epilepsy Data Types; Machine Learning Methods; and Clinical Studies
* Covers methodological and practical aspects, as well as worked-out examples with R and Python code provided in the online supplement
* Includes contributions by experts in the field

https://github.com/sharon-chiang/Statistics-Epilepsy-Book/

The handbook targets clinicians, graduate students, medical students, and researchers who seek to conduct quantitative epilepsy research. The topics covered extend broadly to quantitative research in other neurological specialties and provide a valuable reference for the field of neurology.




Bayesian Prediction and Adaptive Sampling Algorithms for Mobile Sensor Networks


Book Description

This brief introduces a class of problems and models for predicting a scalar field of interest from noisy observations collected by mobile sensor networks. It also introduces the problem of optimally coordinating robotic sensors, in either a centralized or a distributed manner, to maximize prediction quality subject to communication and mobility constraints. To solve such problems, fully Bayesian approaches are adopted, allowing various sources of uncertainty to be integrated into an inferential framework that effectively captures all aspects of the variability involved. The fully Bayesian approach also allows the most appropriate values of additional model parameters to be selected automatically from the data, and optimal inference and prediction of the underlying scalar field to be achieved. In particular, spatio-temporal Gaussian process regression is formulated for robotic sensors to fuse multifactorial effects of observations, measurement noise, and prior distributions to obtain the predictive distribution of a scalar environmental field of interest. New techniques are introduced to avoid computationally prohibitive Markov chain Monte Carlo methods on resource-constrained mobile sensors. Bayesian Prediction and Adaptive Sampling Algorithms for Mobile Sensor Networks starts with a simple spatio-temporal model and increases the level of model flexibility and uncertainty step by step, solving increasingly complicated problems along the way, until it ends with fully Bayesian approaches that take into account a broad spectrum of uncertainties in observations, model parameters, and constraints in mobile sensor networks. The book is timely and will be useful to many researchers in control, robotics, computer science, and statistics who are tackling tasks such as environmental monitoring and adaptive sampling, surveillance, exploration, and plume tracking, all of which are of growing importance.
Problems are solved creatively by seamlessly combining theories and concepts from Bayesian statistics, mobile sensor networks, optimal experiment design, and distributed computation.
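The core prediction step can be sketched as plain Gaussian process regression on sensor positions. This is a simplified sketch under stated assumptions: it uses a purely spatial squared-exponential kernel with fixed, assumed hyperparameters instead of the fully Bayesian spatio-temporal treatment described above, and the greedy rule that sends the next sensor to the candidate location with the largest predictive variance is one common heuristic, not the book's coordination algorithm. All function names are hypothetical.

```python
import math


def rbf(x1, x2, variance=1.0, lengthscale=1.0):
    """Squared-exponential covariance between two 2-D sensor locations."""
    d2 = (x1[0] - x2[0]) ** 2 + (x1[1] - x2[1]) ** 2
    return variance * math.exp(-0.5 * d2 / lengthscale ** 2)


def cholesky(a):
    """Dense Cholesky factor L (lower triangular) with a = L L^T."""
    n = len(a)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = a[i][j] - sum(L[i][k] * L[j][k] for k in range(j))
            L[i][j] = math.sqrt(s) if i == j else s / L[j][j]
    return L


def forward(L, b):
    """Solve L y = b."""
    y = []
    for i in range(len(b)):
        y.append((b[i] - sum(L[i][k] * y[k] for k in range(i))) / L[i][i])
    return y


def backward(L, y):
    """Solve L^T x = y."""
    n = len(y)
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        x[i] = (y[i] - sum(L[k][i] * x[k] for k in range(i + 1, n))) / L[i][i]
    return x


def gp_predict(xs, ys, x_star, noise_var=1e-2):
    """Predictive mean and variance of the field at x_star (zero prior mean)."""
    K = [[rbf(a, b) + (noise_var if i == j else 0.0) for j, b in enumerate(xs)]
         for i, a in enumerate(xs)]
    L = cholesky(K)
    alpha = backward(L, forward(L, ys))          # (K + noise I)^-1 y
    k_star = [rbf(a, x_star) for a in xs]
    mean = sum(k * al for k, al in zip(k_star, alpha))
    v = forward(L, k_star)                       # L^-1 k_star
    var = rbf(x_star, x_star) - sum(vi * vi for vi in v)
    return mean, var


def next_sample_location(xs, ys, candidates, noise_var=1e-2):
    """Greedy heuristic: move to the candidate with the largest predictive variance."""
    return max(candidates, key=lambda c: gp_predict(xs, ys, c, noise_var)[1])
```

With observations at (0, 0), (1, 0), and (0, 1), the predictive variance is small near those points and close to the prior variance far away, so `next_sample_location` picks a distant candidate such as (5, 5). The dense Cholesky factorization costs O(n^3), which is exactly the kind of cost that motivates the cheaper techniques for resource-constrained sensors.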




Statistical Modeling Using Bayesian Latent Gaussian Models


Book Description

This book focuses on the statistical modeling of geophysical and environmental data using Bayesian latent Gaussian models. The structure of these models is described in a thorough introductory chapter, which explains how to construct prior densities for the model parameters, how to infer the parameters using Bayesian computation, and how to use the models to make predictions. The remaining six chapters focus on the application of Bayesian latent Gaussian models to real examples in glaciology, hydrology, engineering seismology, seismology, meteorology, and climatology. These examples include: spatial predictions of surface mass balance; the estimation of Antarctica’s contribution to sea-level rise; the estimation of rating curves that convert water level to discharge; ground motion models for strong motion; spatial modeling of earthquake magnitudes; weather forecasting based on numerical model forecasts; and extreme value analysis of precipitation on a high-dimensional grid. The book is aimed at graduate students and experts in statistics, geophysics, environmental sciences, engineering, and related fields.
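For orientation, a Bayesian latent Gaussian model is commonly written as a three-level hierarchy. The notation below (eta for the linear predictor, A for a projection matrix, Q(theta) for the latent precision) is a generic formulation assumed for illustration, not necessarily the book's own notation:

```latex
\begin{aligned}
\text{Response level:}\quad & y_i \mid \boldsymbol{\eta}, \boldsymbol{\theta} \sim \pi(y_i \mid \eta_i, \boldsymbol{\theta}),
  \quad i = 1, \dots, n, \qquad \boldsymbol{\eta} = \boldsymbol{A}\boldsymbol{x},\\
\text{Latent level:}\quad & \boldsymbol{x} \mid \boldsymbol{\theta} \sim
  \mathcal{N}\big(\boldsymbol{\mu}(\boldsymbol{\theta}),\, \boldsymbol{Q}(\boldsymbol{\theta})^{-1}\big),\\
\text{Hyperparameter level:}\quad & \boldsymbol{\theta} \sim \pi(\boldsymbol{\theta}),\\[4pt]
\text{Posterior:}\quad & \pi(\boldsymbol{x}, \boldsymbol{\theta} \mid \boldsymbol{y}) \propto
  \pi(\boldsymbol{\theta})\, \pi(\boldsymbol{x} \mid \boldsymbol{\theta})
  \prod_{i=1}^{n} \pi(y_i \mid \eta_i, \boldsymbol{\theta}).
\end{aligned}
```

The observations are assumed conditionally independent given the linear predictor and the hyperparameters, which is what makes the likelihood factorize as a product.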




Gaussian Markov Random Fields


Book Description

Gaussian Markov Random Field (GMRF) models are most widely used in spatial statistics, a very active area of research in which few up-to-date reference works are available. This is the first book on the subject that provides a unified framework for GMRFs with particular emphasis on the computational aspects. This book includes extensive case studies.
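The defining property of a GMRF can be stated compactly. In the standard formulation (the symbols here are assumed for illustration), a random vector x in R^n is a GMRF with respect to a graph G = (V, E), with mean mu and symmetric positive definite precision matrix Q, when the first line below holds; the second line lists its main consequences:

```latex
\begin{aligned}
&\pi(\boldsymbol{x}) = (2\pi)^{-n/2}\, |\boldsymbol{Q}|^{1/2}
  \exp\!\Big(-\tfrac{1}{2}(\boldsymbol{x} - \boldsymbol{\mu})^{\top}\boldsymbol{Q}(\boldsymbol{x} - \boldsymbol{\mu})\Big),
  \qquad Q_{ij} \neq 0 \iff \{i, j\} \in E \quad (i \neq j),\\
&x_i \perp x_j \mid \boldsymbol{x}_{-ij} \iff Q_{ij} = 0, \qquad
  \mathbb{E}(x_i \mid \boldsymbol{x}_{-i}) = \mu_i - \frac{1}{Q_{ii}} \sum_{j:\, j \sim i} Q_{ij}(x_j - \mu_j), \qquad
  \operatorname{Prec}(x_i \mid \boldsymbol{x}_{-i}) = Q_{ii}.
\end{aligned}
```

Because each x_i interacts only with its neighbours in G, Q is sparse, and sparse matrix methods (for example sparse Cholesky factorization) are what make GMRF computations fast.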




Advanced Spatial Modeling with Stochastic Partial Differential Equations Using R and INLA


Book Description

Modeling spatial and spatio-temporal continuous processes is an important and challenging problem in spatial statistics. Advanced Spatial Modeling with Stochastic Partial Differential Equations Using R and INLA describes in detail the stochastic partial differential equations (SPDE) approach for modeling continuous spatial processes with a Matérn covariance, which has been implemented using the integrated nested Laplace approximation (INLA) in the R-INLA package. Key concepts about modeling spatial processes and the SPDE approach are explained with examples using simulated data and real applications. This book has been authored by leading experts in spatial statistics, including the main developers of the INLA and SPDE methodologies and the R-INLA package. It also includes a wide range of applications:
* Spatial and spatio-temporal models for continuous outcomes
* Analysis of spatial and spatio-temporal point patterns
* Coregionalization spatial and spatio-temporal models
* Measurement error spatial models
* Modeling preferential sampling
* Spatial and spatio-temporal models with physical barriers
* Survival analysis with spatial effects
* Dynamic space-time regression
* Spatial and spatio-temporal models for extremes
* Hurdle models with spatial effects
* Penalized Complexity priors for spatial models

All the examples in the book are fully reproducible. Further information about this book, as well as the R code and datasets used, is available from the book website at http://www.r-inla.org/spde-book. The tools described in this book will be useful to researchers in many fields, such as biostatistics, spatial statistics, environmental sciences, epidemiology, and ecology. Graduate and Ph.D. students will also find this book and its associated files a valuable resource for learning INLA and the SPDE approach for spatial modeling.
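The link between the SPDE and the Matérn covariance can be summarized as follows, using the usual symbols of the SPDE literature (assumed here, not taken from the description): W is Gaussian white noise, Delta the Laplacian, K_nu the modified Bessel function of the second kind, sigma^2 the marginal variance, and rho the practical range, at which the correlation is roughly 0.13.

```latex
\begin{aligned}
&(\kappa^2 - \Delta)^{\alpha/2}\,\big(\tau\, x(\boldsymbol{s})\big) = \mathcal{W}(\boldsymbol{s}),
  \qquad \boldsymbol{s} \in \mathbb{R}^d, \quad \alpha = \nu + d/2, \quad \kappa > 0, \; \nu > 0,\\
&r(\boldsymbol{h}) = \frac{\sigma^2}{2^{\nu-1}\Gamma(\nu)}\,(\kappa\|\boldsymbol{h}\|)^{\nu}\,K_{\nu}(\kappa\|\boldsymbol{h}\|),\\
&\sigma^2 = \frac{\Gamma(\nu)}{\Gamma(\nu + d/2)\,(4\pi)^{d/2}\,\kappa^{2\nu}\,\tau^2},
  \qquad \rho = \frac{\sqrt{8\nu}}{\kappa}.
\end{aligned}
```

For example, alpha = 2 in d = 2 gives nu = 1. For integer alpha, a finite element discretization of the SPDE yields a GMRF with a sparse precision matrix, which is what allows INLA to fit these continuous-domain models efficiently.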