Tidy Finance with Python


Book Description

This textbook shows how to bring theoretical concepts from finance and econometrics to the data. Focusing on coding and data analysis with Python, we show how to conduct research in empirical finance from scratch. We start by introducing the concepts of tidy data and coding principles using pandas, numpy, and plotnine. Code is provided to prepare common open-source and proprietary financial data sources (CRSP, Compustat, Mergent FISD, TRACE) and organize them in a database. We reuse these data in all the subsequent chapters, which we keep as self-contained as possible. The empirical applications range from key concepts of empirical asset pricing (beta estimation, portfolio sorts, performance analysis, Fama-French factors) to modeling and machine learning applications (fixed effects estimation, clustering standard errors, difference-in-difference estimators, ridge regression, Lasso, Elastic net, random forests, neural networks) and portfolio optimization techniques.

Key Features:
- Self-contained chapters on the most important applications and methodologies in finance, which can easily be used for the reader’s research or as a reference for courses on empirical finance.
- Each chapter is reproducible in the sense that the reader can replicate every single figure, table, or number by simply copying and pasting the code we provide.
- A full-fledged introduction to machine learning with scikit-learn based on tidy principles to show how factor selection and option pricing can benefit from machine learning methods.
- We show how to retrieve and prepare the most important datasets in financial economics, CRSP and Compustat, including detailed explanations of the most relevant data characteristics.
- Each chapter provides exercises based on established lectures and classes, designed to help students dig deeper. The exercises can be used for self-study or as a source of inspiration for teaching exercises.




Reproducible Finance with R


Book Description

Reproducible Finance with R: Code Flows and Shiny Apps for Portfolio Analysis is a unique introduction to data science for investment management that explores the three major R/finance coding paradigms, emphasizes data visualization, and explains how to build a cohesive suite of functioning Shiny applications. The full source code, asset price data and live Shiny applications are available at reproduciblefinance.com. The ideal reader works in finance or wants to work in finance and has a desire to learn R code and Shiny through simple, yet practical real-world examples. The book begins with the first step in data science: importing and wrangling data, which in the investment context means importing asset prices, converting to returns, and constructing a portfolio. The next section covers risk and tackles descriptive statistics such as standard deviation, skewness, kurtosis, and their rolling histories. The third section focuses on portfolio theory, analyzing the Sharpe Ratio, CAPM, and Fama French models. The book concludes with applications for finding individual asset contribution to risk and for running Monte Carlo simulations. For each of these tasks, the three major coding paradigms are explored and the work is wrapped into interactive Shiny dashboards.




Text Mining with R


Book Description

Chapter 7. Case Study: Comparing Twitter Archives; Getting the Data and Distribution of Tweets; Word Frequencies; Comparing Word Usage; Changes in Word Use; Favorites and Retweets; Summary; Chapter 8. Case Study: Mining NASA Metadata; How Data Is Organized at NASA; Wrangling and Tidying the Data; Some Initial Simple Exploration; Word Co-occurrences and Correlations; Networks of Description and Title Words; Networks of Keywords; Calculating tf-idf for the Description Fields; What Is tf-idf for the Description Field Words?; Connecting Description Fields to Keywords; Topic Modeling.




Tidy Finance with R


Book Description

This textbook shows how to bring theoretical concepts from finance and econometrics to the data. Focusing on coding and data analysis with R, we show how to conduct research in empirical finance from scratch. We start by introducing the concepts of tidy data and coding principles using the tidyverse family of R packages. We then provide the code to prepare common open source and proprietary financial data sources (CRSP, Compustat, Mergent FISD, TRACE) and organize them in a database. We reuse these data in all the subsequent chapters, which we keep as self-contained as possible. The empirical applications range from key concepts of empirical asset pricing (beta estimation, portfolio sorts, performance analysis, Fama-French factors) to modeling and machine learning applications (fixed effects estimation, clustering standard errors, difference-in-difference estimators, ridge regression, Lasso, Elastic net, random forests, neural networks) and portfolio optimization techniques.

Highlights
1. Self-contained chapters on the most important applications and methodologies in finance, which can easily be used for the reader’s research or as a reference for courses on empirical finance.
2. Each chapter is reproducible in the sense that the reader can replicate every single figure, table, or number by simply copy-pasting the code we provide.
3. A full-fledged introduction to machine learning with tidymodels based on tidy principles to show how factor selection and option pricing can benefit from machine learning methods.
4. Chapter 2 on accessing and managing financial data shows how to retrieve and prepare the most important datasets in the field of financial economics: CRSP and Compustat. The chapter also contains detailed explanations of the most relevant data characteristics.
5. Each chapter provides exercises that are based on established lectures and exercise classes and are designed to help students dig deeper. The exercises can be used for self-study or as a source of inspiration for teaching exercises.




Learn R


Book Description

Learning a computer language like R can be either frustrating, fun or boring. Having fun requires challenges that wake up the learner’s curiosity but also provide an emotional reward for overcoming them. The book is designed so that it includes smaller and bigger challenges, in what I call playgrounds, in the hope that all readers will enjoy their path to R fluency. Fluency in the use of a language is a skill that is acquired through practice and exploration. For students and professionals in the biological sciences, humanities and many applied fields, recognizing the parallels between R and natural languages should help them feel at home with R. The approach I use is similar to that of a travel guide, encouraging exploration and describing the available alternatives and how to reach them. The intention is to guide the reader through the R landscape of 2024 and beyond.

What is new in the second edition?
- Text expanded by more than 25% to include additional R features and gentler and more detailed explanations
- Contains 24 new diagrams and flowcharts, seven new tables, and revised text and code examples for clarity
- All three indexes were expanded, and answers to 28 frequently asked questions added

What will you find in this book?
- Programming concepts explained as they apply to current R
- Emphasis on the role of abstractions in programming
- Few prescriptive rules, mostly the author’s preferences together with alternatives
- Presentation of the R language emphasizing the “R way of doing things”
- Tutoring for “programming in the small” using scripts for data analysis
- Explanation of the differences between R proper and extensions for data wrangling
- The grammar of graphics is described as a language for the construction of data visualisations
- Examples of data exchange between R and the foreign world using common file formats
- Coaching to become an independent R user, capable of writing original scripts and solving future challenges




Deep Learning and Scientific Computing with R torch


Book Description

torch is an R port of PyTorch, one of the two most-employed deep learning frameworks in industry and research. It is also an excellent tool to use in scientific computations. It is written entirely in R and C/C++. Though still "young" as a project, R torch already has a vibrant community of users and developers. Experience shows that torch users come from a broad range of different backgrounds. This book aims to be useful to (almost) everyone. Globally speaking, its purposes are threefold:
- Provide a thorough introduction to torch basics, both by carefully explaining underlying concepts and ideas and by showing enough examples for the reader to become "fluent" in torch.
- Again with a focus on conceptual explanation, show how to use torch in deep-learning applications, ranging from image recognition through time series prediction to audio classification.
- Provide a concepts-first, reader-friendly introduction to selected scientific-computation topics (namely, matrix computations, the Discrete Fourier Transform, and wavelets), all accompanied by torch code you can play with.

Deep Learning and Scientific Computing with R torch is written with first-hand technical expertise and in an engaging, fun-to-read way.




Spatial Analysis in Geology Using R


Book Description

The integration of geology with data science disciplines, such as spatial statistics, remote sensing, and geographic information systems (GIS), has given rise to a shift in many natural sciences schools, pushing the boundaries of knowledge and enabling new discoveries in geological processes and earth systems. Spatial analysis of geological data can be used to identify patterns and trends in data, to map spatial relationships, and to model spatial processes. R is an established yet still growing statistical programming language of increasing value in spatial analysis, often replacing GIS tools to advantage. By providing a comprehensive guide for geologists to harness the power of spatial analysis in R, Spatial Analysis in Geology Using R serves as a tool in addressing real-world problems, such as natural resource management, environmental conservation, and hazard prediction and mitigation.

Features:
- Provides a practical and accessible overview of spatial analysis in geology using R
- Organised in three independent and complementary parts: Introduction to R, Spatial Analysis with R, and Spatial Statistics and Modelling
- Applied approach with many detailed examples and case studies using real geological data
- Presents a collection of R packages that are useful in many geological situations
- Does not assume any prior knowledge of R; all code is explained in detail
- Supplemented by a website with all data, code, and examples

Spatial Analysis in Geology Using R will be useful to any geological researcher who has acquired basic spatial analysis skills, often using GIS, and is interested in deepening those skills through the use of R. It could be used as a reference by applied researchers and analysts in public, private, or third-sector industries. It could also be used to teach a course on the topic to graduate students or for self-study.




Model-Based Clustering, Classification, and Density Estimation Using mclust in R


Book Description

Model-based clustering and classification methods provide a systematic statistical approach to clustering, classification, and density estimation via mixture modeling. The model-based framework allows the problems of choosing or developing an appropriate clustering or classification method to be understood within the context of statistical modeling. The mclust package for the statistical environment R is a widely adopted platform implementing these model-based strategies. The package includes both summary and visual functionality, complementing procedures for estimating and choosing models.

Key features of the book:
- An introduction to the model-based approach and the mclust R package
- A detailed description of mclust and the underlying modeling strategies
- An extensive set of examples, color plots, and figures along with the R code for reproducing them
- Supported by a companion website, including the R code to reproduce the examples and figures presented in the book, errata, and other supplementary material

Model-Based Clustering, Classification, and Density Estimation Using mclust in R is accessible to quantitatively trained students and researchers with a basic understanding of statistical methods, including inference and computing. In addition to serving as a reference manual for mclust, the book will be particularly useful to those wishing to employ these model-based techniques in research or applications in statistics, data science, clinical research, social science, and many other disciplines.