Learning on Private Data with Homomorphic Encryption and Differential Privacy


Book Description

Today, growing concern over privacy poses a challenge to the study of sensitive data. In this thesis, we address learning on private data in two practical scenarios.

1) It is common for the same type of data to be distributed among multiple parties, each holding a local portion. For these parties, learning based only on their own portion of the data may run into the small-sample problem and produce unsatisfactory results; on the other hand, privacy concerns prevent them from exchanging their data and learning global results from the union of the data. In this scenario, we solve the problem with homomorphic encryption. Homomorphic encryption enables calculation in the cipher space, meaning that certain operations can be carried out even while the data are encrypted. With this technique, we design privacy-preserving solutions for four popular data analysis methods on distributed data: Marginal Fisher Analysis (MFA) for dimensionality reduction and classification, the Kruskal-Wallis (KW) statistical test for comparing the distributions of samples, the Markov model for sequence classification, and the Fisher criterion score for informative gene selection. Our solutions allow different parties to run these algorithms on the union of their data without revealing any party's private information.

2) In the other scenario, the data holder wants to release knowledge learned from a sensitive dataset without violating the privacy of the individuals represented in it. Although no direct data exchange occurs here, publishing the knowledge learned from the data can still expose the participants' private information. We adopt the rigorous differential privacy model to protect individual privacy. Specifically, if an algorithm is differentially private, the presence or absence of any single data instance in the training dataset makes little change to the algorithm's output. As a result, the released output reveals little about any individual who participated in the training dataset, and individual privacy is protected. In this scenario, we develop differentially private One-Class SVM (1-SVM) models for anomaly detection, with theoretical proofs of both privacy and utility. The learned differentially private 1-SVM models can be released for others to perform anomaly detection without violating the privacy of the individuals in the training dataset.
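The differential-privacy idea described here, that one instance's presence or absence barely changes the released output, is most commonly achieved with the Laplace mechanism. Below is a minimal illustrative sketch for an epsilon-DP counting query, not the thesis's 1-SVM construction; all names are hypothetical.

```python
import math
import random

def laplace_noise(scale, rng):
    """Draw Laplace(0, scale) noise by inverse-transform sampling."""
    u = rng.random() - 0.5          # uniform on [-0.5, 0.5)
    while u == -0.5:                # avoid log(0) on the boundary
        u = rng.random() - 0.5
    return -scale * math.copysign(1.0, u) * math.log(1.0 - 2.0 * abs(u))

def private_count(values, predicate, epsilon, rng):
    """Release a count under epsilon-DP: a counting query has sensitivity 1,
    so Laplace noise with scale 1/epsilon suffices."""
    true_count = sum(1 for v in values if predicate(v))
    return true_count + laplace_noise(1.0 / epsilon, rng)
```

A larger epsilon means less noise and weaker privacy; the expected absolute noise added to the count is exactly 1/epsilon.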




How to Build Privacy and Security Into Deep Learning Models


Book Description

In recent years, we've seen tremendous improvements in artificial intelligence, driven by advances in neural-based models. However, the more popular these algorithms and techniques become, the more serious the consequences for data and user privacy. These issues will drastically impact the future of AI research, specifically how neural-based models are developed, deployed, and evaluated. Yishay Carmiel (IntelligentWire) shares techniques and explains how data privacy will shape machine learning development and how future training and inference will be affected. Yishay first dives into why training on private data must be addressed, covering federated learning and differential privacy. He then discusses why inference on private data must be addressed, covering homomorphic encryption for neural networks, polynomial approximation of neural networks, protecting data in neural networks, data reconstruction from neural networks, and methods and techniques to defend against data reconstruction. This session was recorded at the 2019 O'Reilly Artificial Intelligence Conference in New York.
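The "polynomial approximation of neural networks" mentioned above arises because homomorphic encryption schemes evaluate only additions and multiplications, so nonlinear activations must be replaced by polynomials. A toy sketch comparing the sigmoid to its degree-3 Taylor polynomial, purely for illustration and not the talk's specific method:

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def sigmoid_poly(x):
    # Degree-3 Taylor expansion of sigmoid around 0:
    # sigma(x) ~ 1/2 + x/4 - x^3/48
    return 0.5 + x / 4.0 - x ** 3 / 48.0

# Measure the worst-case approximation error on [-1, 1],
# the regime where HE-friendly networks keep their activations.
xs = [i / 100.0 for i in range(-100, 101)]
max_err = max(abs(sigmoid(x) - sigmoid_poly(x)) for x in xs)
```

On [-1, 1] the error stays below 0.002; outside that interval the polynomial diverges quickly, which is why inputs are typically normalized before encrypted inference.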




Privacy-Preserving Machine Learning


Book Description

After discussing the importance of privacy-preserving techniques, this book provides a thorough overview of how privacy-preserving machine learning schemes have evolved over the last ten years. In response to the diversity of Internet services, data services based on machine learning are now available for various applications, including risk assessment and image recognition. Given open access to datasets and not-fully-trusted environments, machine learning-based applications face enormous security and privacy risks. The book then presents studies conducted to address these privacy issues and a series of proposed solutions for ensuring privacy protection in machine learning tasks involving multiple parties. In closing, it reviews state-of-the-art privacy-preserving techniques and examines the security threats they face.




Handbook of Sharing Confidential Data


Book Description

Statistical agencies, research organizations, companies, and other data stewards that seek to share data with the public face a challenging dilemma. They need to protect the privacy and confidentiality of data subjects and their attributes while providing data products that are useful for their intended purposes. In an age when information on data subjects is available from a wide range of data sources, as are the computational resources to obtain that information, this challenge is increasingly difficult. The Handbook of Sharing Confidential Data helps data stewards understand how tools from the data confidentiality literature (specifically, synthetic data, formal privacy, and secure computation) can be used to manage trade-offs in disclosure risk and data usefulness.

Key features:

• Provides overviews of the potential and the limitations of synthetic data, differential privacy, and secure computation
• Offers an accessible review of methods for implementing differential privacy, from both methodological and practical perspectives
• Presents perspectives from both computer science and statistical science for addressing data confidentiality and privacy
• Describes genuine applications of synthetic data, formal privacy, and secure computation to help practitioners implement these approaches

The handbook is accessible to both researchers and practitioners who work with confidential data. It requires familiarity with basic concepts from probability and data analysis.
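Of the three tool families the handbook covers, secure computation is the easiest to sketch: with additive secret sharing, parties can jointly compute a sum while each input stays split into random-looking shares. A minimal illustrative sketch, with all names hypothetical:

```python
import random

P = 2**61 - 1  # a prime modulus; all arithmetic is done mod P

def share(secret, n_parties, rng):
    """Split `secret` into n additive shares that sum to it mod P.
    Any subset of fewer than n shares looks uniformly random."""
    shares = [rng.randrange(P) for _ in range(n_parties - 1)]
    shares.append((secret - sum(shares)) % P)
    return shares

def reconstruct(shares):
    return sum(shares) % P

# Two data stewards can compute a joint total without revealing inputs:
# each shares its value, each party adds the shares it holds locally,
# and only the reconstruction of those local sums is revealed.
```

Because sharing is linear, adding shares component-wise yields shares of the sum, which is the building block behind secure aggregation protocols.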




Privacy-Preserving Deep Learning


Book Description

This book discusses the state of the art in privacy-preserving deep learning (PPDL), especially as a tool for machine learning as a service (MLaaS); PPDL serves as an enabling technology by combining classical privacy-preserving and cryptographic protocols with deep learning. Google and Microsoft announced major investments in PPDL in early 2019. This was followed in June of that year by Google's announcement of "Private Join and Compute," an open-source PPDL tool based on secure multi-party computation (secure MPC) and homomorphic encryption (HE). One of the challenging issues in PPDL is assessing its practical applicability, given the gap between theory and practice. To address this problem, it has recently been proposed that, in addition to classical privacy-preserving methods (HE, secure MPC, differential privacy, secure enclaves), federated or split learning should also be applied to PPDL. This concept involves building a cloud framework that enables collaborative learning while keeping training data on client devices, preserving privacy while allowing the framework to be deployed in the real world. The book provides fundamental insights into privacy preservation and deep learning, offering a comprehensive overview of the state of the art in PPDL methods; it discusses practical issues and how to leverage federated or split-learning-based PPDL. Covering the fundamental theory of PPDL, the pros and cons of current PPDL methods, and the gap between theory and practice in the most recent approaches, it is a valuable reference for a general audience, undergraduate and graduate students, practitioners interested in learning about PPDL from scratch, and researchers wanting to explore PPDL for their applications.
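The federated-learning concept described here, collaborative training while data stays on client devices, can be sketched with federated averaging on a toy one-parameter linear model. This is an illustration of the general idea, not any specific framework's API:

```python
def local_step(w, data, lr=0.1, epochs=20):
    """A client's local training: gradient descent on MSE for y = w * x.
    The raw (x, y) pairs never leave this function's owner."""
    for _ in range(epochs):
        grad = sum(2 * (w * x - y) * x for x, y in data) / len(data)
        w -= lr * grad
    return w

def federated_average(w, client_datasets, rounds=10):
    """FedAvg: each round, every client trains locally starting from the
    global model, and the server averages the returned weights."""
    for _ in range(rounds):
        local_ws = [local_step(w, d) for d in client_datasets]
        w = sum(local_ws) / len(local_ws)
    return w
```

For example, if three clients each hold a few points from y = 3x, federated_average converges to a weight near 3 even though the server only ever sees model weights, never data.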




Protecting Privacy through Homomorphic Encryption


Book Description

This book summarizes recent inventions, provides guidelines and recommendations, and demonstrates many practical applications of homomorphic encryption. The collection of papers represents the combined wisdom of the community of leading experts on homomorphic encryption. Over the past three years, a global community of researchers in academia, industry, and government has been working closely to standardize homomorphic encryption. This is the first publication of whitepapers created by these experts that comprehensively describes the scientific inventions, presents a concrete security analysis, and broadly discusses applicable use scenarios and markets. The book also features a collection of privacy-preserving machine learning applications powered by homomorphic encryption, designed by groups of top graduate students worldwide at the Private AI Bootcamp hosted by Microsoft Research. The volume aims to connect non-expert readers with this important new cryptographic technology in an accessible and actionable way. Readers who have heard good things about homomorphic encryption but are not familiar with the details will find this book full of inspiration. Readers with preconceived biases based on out-of-date knowledge will see the recent progress made by industrial and academic pioneers in optimizing and standardizing the technology. A clear picture of how homomorphic encryption works, how to use it to solve real-world problems, and how to efficiently strengthen privacy protection will naturally emerge.
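For readers wanting that "clear picture of how homomorphic encryption works," the core property is that arithmetic on ciphertexts mirrors arithmetic on plaintexts. A deliberately insecure textbook-RSA sketch showing multiplicative homomorphism follows; the standardized schemes the book discusses (e.g., BFV, CKKS) are far more capable and are what should be used in practice:

```python
# Toy, INSECURE textbook RSA with tiny parameters, used only to show
# the homomorphic property: multiplying ciphertexts multiplies plaintexts.
p, q = 61, 53
n = p * q                      # 3233
phi = (p - 1) * (q - 1)        # 3120
e = 17
d = pow(e, -1, phi)            # modular inverse of e (Python 3.8+)

def encrypt(m):
    return pow(m, e, n)

def decrypt(c):
    return pow(c, d, n)

c1, c2 = encrypt(7), encrypt(12)
product_cipher = (c1 * c2) % n   # computed on ciphertexts only
# decrypt(product_cipher) recovers 7 * 12 = 84 without ever
# decrypting c1 or c2 individually
```

The identity behind it is (m1^e * m2^e) mod n = (m1 * m2)^e mod n; fully homomorphic schemes extend this so that both addition and multiplication, and hence arbitrary circuits, can be evaluated under encryption.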




Grokking Deep Learning


Book Description

Summary Grokking Deep Learning teaches you to build deep learning neural networks from scratch! In his engaging style, seasoned deep learning expert Andrew Trask shows you the science under the hood, so you grok for yourself every detail of training neural networks. Purchase of the print book includes a free eBook in PDF, Kindle, and ePub formats from Manning Publications. About the Technology Deep learning, a branch of artificial intelligence, teaches computers to learn by using neural networks, technology inspired by the human brain. Online text translation, self-driving cars, personalized product recommendations, and virtual voice assistants are just a few of the exciting modern advancements possible thanks to deep learning. About the Book Grokking Deep Learning teaches you to build deep learning neural networks from scratch! In his engaging style, seasoned deep learning expert Andrew Trask shows you the science under the hood, so you grok for yourself every detail of training neural networks. Using only Python and its math-supporting library, NumPy, you'll train your own neural networks to see and understand images, translate text into different languages, and even write like Shakespeare! When you're done, you'll be fully prepared to move on to mastering deep learning frameworks. What's inside The science behind deep learning Building and training your own neural networks Privacy concepts, including federated learning Tips for continuing your pursuit of deep learning About the Reader For readers with high school-level math and intermediate programming skills. About the Author Andrew Trask is a PhD student at Oxford University and a research scientist at DeepMind. Previously, Andrew was a researcher and analytics product manager at Digital Reasoning, where he trained the world's largest artificial neural network and helped guide the analytics roadmap for the Synthesys cognitive computing platform. 
Table of Contents
Introducing deep learning: why you should learn it
Fundamental concepts: how do machines learn?
Introduction to neural prediction: forward propagation
Introduction to neural learning: gradient descent
Learning multiple weights at a time: generalizing gradient descent
Building your first deep neural network: introduction to backpropagation
How to picture neural networks: in your head and on paper
Learning signal and ignoring noise: introduction to regularization and batching
Modeling probabilities and nonlinearities: activation functions
Neural learning about edges and corners: intro to convolutional neural networks
Neural networks that understand language: king - man + woman == ?
Neural networks that write like Shakespeare: recurrent layers for variable-length data
Introducing automatic optimization: let's build a deep learning framework
Learning to write like Shakespeare: long short-term memory
Deep learning on unseen data: introducing federated learning
Where to go from here: a brief guide
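In the spirit of the book's early gradient-descent chapters, which build everything from plain Python, here is a one-weight sketch of the learning loop; this is an illustration of the approach, not the book's exact code:

```python
# Learn a single weight so that prediction = weight * input matches a goal.
weight, lr = 0.5, 0.1
inp, goal = 2.0, 8.0   # the weight that fits this pair exactly is 4.0

for _ in range(50):
    pred = weight * inp
    delta = pred - goal           # how far off the prediction is
    weight -= lr * delta * inp    # gradient of squared error w.r.t. weight

# after training, weight is approximately 4.0
```

Everything else in the book, from multi-weight layers to backpropagation, generalizes this same predict / compare / adjust loop to vectors and matrices.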




Practical Data Privacy


Book Description

Between major privacy regulations like GDPR and CCPA and a series of expensive, high-profile data breaches, there has never been more pressure to ensure data privacy. Unfortunately, integrating privacy into data systems is still complicated. This essential guide gives you a fundamental understanding of modern privacy building blocks, like differential privacy, federated learning, and encrypted computation. Based on hard-won lessons, the book provides solid advice and best practices for integrating breakthrough privacy-enhancing technologies into production systems. Practical Data Privacy answers important questions such as: What do privacy regulations like GDPR and CCPA mean for my data workflows and data science use cases? What does "anonymized data" really mean? How do I actually anonymize data? How do federated learning and analysis work? Homomorphic encryption sounds great, but is it ready for use? How do I compare and choose the best privacy-preserving technologies and methods? Are there open-source libraries that can help? How do I ensure that my data science projects are secure by default and private by design? How do I work with governance and infosec teams to implement internal policies appropriately?
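One of the questions above, "How do I actually anonymize data?", is traditionally first approached by generalizing quasi-identifiers and checking k-anonymity. A minimal sketch with hypothetical records; note that generalization alone is weak against linkage attacks, which is exactly why books like this one also cover differential privacy:

```python
from collections import Counter

def generalize(record):
    """Coarsen quasi-identifiers: age to a 10-year band, ZIP to a prefix."""
    age, zip_code, diagnosis = record
    lo = (age // 10) * 10
    return (f"{lo}-{lo + 9}", zip_code[:3] + "**", diagnosis)

def is_k_anonymous(records, k):
    """Every combination of quasi-identifiers must occur at least k times."""
    groups = Counter((band, z) for band, z, _ in records)
    return all(count >= k for count in groups.values())

rows = [(34, "90210", "flu"), (37, "90211", "cold"),
        (35, "90212", "flu"), (62, "10001", "flu"),
        (68, "10002", "cold")]
released = [generalize(r) for r in rows]
```

Here the five rows collapse into two quasi-identifier groups of sizes 3 and 2, so the release is 2-anonymous but not 3-anonymous.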




Sustainable Development Using Private AI


Book Description

This book covers the fundamental concepts of private AI and its applications, as well as the fusion of private AI with cutting-edge technologies like cloud computing, federated learning, and computer vision. Security Models and Applications for Sustainable Development Using Private AI reviews the encryption algorithms used to provide security in private AI and discusses the role of machine learning and deep learning technologies in private AI. The book provides case studies of private AI in application areas such as purchasing, education, entertainment, medical diagnosis, predictive care, conversational personal assistants, wellness apps, early disease detection, and recommendation systems. The authors offer guidance on handling customer data securely and efficiently, and present multi-model dataset storage approaches alongside traditional approaches like data anonymization and differential privacy mechanisms. The target audience includes undergraduate and postgraduate students in Computer Science, Information Technology, Electronics and Communication Engineering, and related disciplines. The book is also a one-stop reference for professionals, security researchers, scholars, government agencies, and security practitioners and experts working in the cybersecurity industry, specifically in R&D.




Federated Learning


Book Description

This book provides a comprehensive and self-contained introduction to federated learning, ranging from basic knowledge and theory to key applications, with a focus on privacy and incentive issues. It is timely, as federated learning has become popular since the release of the General Data Protection Regulation (GDPR). Federated learning aims to enable a machine learning model to be trained collaboratively without any party exposing its private data to others, a setting that adheres to regulatory requirements for data privacy protection such as GDPR. The book contains three main parts. First, it introduces privacy-preserving methods for protecting a federated learning model against different types of attacks, such as data leakage and data poisoning. Second, it presents incentive mechanisms that aim to encourage individuals to participate in federated learning ecosystems. Last but not least, it describes how federated learning can be applied in industry and business to address data silo and privacy-preservation problems. The book is intended for readers from both academia and industry who would like to learn about federated learning, practice its implementation, and apply it in their own business. Readers are expected to have a basic understanding of linear algebra, calculus, and neural networks. Additionally, domain knowledge in FinTech and marketing would be helpful.
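Among the privacy-preserving methods covered in the first part is secure aggregation: clients add pairwise random masks that cancel in the server's sum, so the server learns only the aggregate update. A minimal sketch, with all names hypothetical and with key agreement and client-dropout handling omitted:

```python
import random

def pairwise_masks(n_clients, modulus, rng):
    """For each pair (i, j) with i < j, draw a shared random mask;
    client i adds it and client j subtracts it, so the masks cancel
    exactly when all uploads are summed."""
    masks = [[0] * n_clients for _ in range(n_clients)]
    for i in range(n_clients):
        for j in range(i + 1, n_clients):
            m = rng.randrange(modulus)
            masks[i][j] = m
            masks[j][i] = -m
    return masks

def masked_update(value, client, masks, modulus):
    """What a client uploads: its private value plus all its masks."""
    return (value + sum(masks[client])) % modulus

MOD = 2**32
masks = pairwise_masks(3, MOD, random.Random(42))
values = [10, 20, 30]                 # each client's private update
uploads = [masked_update(v, i, masks, MOD) for i, v in enumerate(values)]
total = sum(uploads) % MOD            # the server learns only the sum, 60
```

Each individual upload looks uniformly random to the server, yet the aggregate is exact, which is why this pattern pairs naturally with federated averaging.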