MORE ABOUT THIS BOOK
Main description:
This textbook integrates important mathematical foundations, efficient computational algorithms, applied statistical inference techniques, and cutting-edge machine learning approaches to address a wide range of crucial biomedical informatics, health analytics, and decision science challenges. Each concept in the book includes a rigorous symbolic formulation coupled with computational algorithms and complete end-to-end pipeline protocols implemented as functional R electronic markdown notebooks. These workflows support active learning and demonstrate comprehensive data manipulations, interactive visualizations, and sophisticated analytics. The content includes open problems, state-of-the-art scientific knowledge, ethical integration of heterogeneous scientific tools, and procedures for systematic validation and dissemination of reproducible research findings.

Complementary to the enormous challenges related to handling, interrogating, and understanding massive amounts of complex structured and unstructured data, there are unique opportunities that come with access to a wealth of feature-rich, high-dimensional, and time-varying information. The topics covered in Data Science and Predictive Analytics address specific knowledge gaps, resolve educational barriers, and mitigate workforce information-readiness and data science deficiencies. Specifically, the book provides a transdisciplinary curriculum integrating core mathematical principles, modern computational methods, advanced data science techniques, model-based machine learning, model-free artificial intelligence, and innovative biomedical applications.
The book's fourteen chapters start with an introduction and progressively build foundational skills from visualization to linear modeling, dimensionality reduction, supervised classification, black-box machine learning techniques, qualitative learning methods, unsupervised clustering, model performance assessment, feature selection strategies, longitudinal data analytics, optimization, neural networks, and deep learning. The second edition of the book includes additional learning-based strategies utilizing generative adversarial networks, transfer learning, and synthetic data generation, as well as eight complementary electronic appendices.
This textbook is suitable for formal, instructor-guided course education as well as for individual or team-supported self-learning. The material is presented at the level of upper-division and graduate college courses and covers applied and interdisciplinary mathematics, contemporary learning-based data science techniques, computational algorithm development, optimization theory, statistical computing, and biomedical sciences. The analytical techniques and predictive scientific methods described in the book may be useful to a wide range of readers, formal and informal learners, college instructors, researchers, and engineers across academia, industry, government, regulatory, funding, and policy agencies. The supporting book website provides many examples, datasets, functional scripts, complete electronic notebooks, extensive appendices, and additional materials.
Contents:
Front Matter
Foreword
DSPA Application and Use Disclaimer
2nd Edition Preface
Book Content
Notations

Chapter 1 - Introduction
1.1 Motivation
1.1.1 DSPA Mission and Objectives
1.1.2 Examples of driving motivational problems and challenges
1.1.3 Common Characteristics of Big (Biomedical and Health) Data
1.1.4 Data Science
1.1.5 Predictive Analytics
1.1.6 High-throughput Big Data Analytics
1.1.7 Examples of data repositories, archives and services
1.1.8 Responsible Data Science and Ethical Predictive Analytics
1.1.9 DSPA Expectations
1.2 Foundations of R
1.2.1 Why use R?
1.2.2 Getting started with R
1.2.3 Mathematics, Statistics, and Optimization
1.2.4 Advanced Data Processing
1.2.5 Basic Plotting
1.2.6 Basic R Programming
1.2.7 Data Simulation Primer
1.3 Practice Problems
1.3.1 Long-to-Wide Data format translation
1.3.2 Data Frames
1.3.3 Data stratification
1.3.4 Simulation
1.3.5 Programming
1.4 Appendix
1.4.1 Tidyverse
1.4.2 Additional R documentation and resources
1.4.3 HTML SOCR Data Import
1.4.4 R Debugging

Chapter 2: Basic Visualization and Exploratory Data Analytics
2.1 Data Handling
2.1.1 Saving and Loading R Data Structures
2.1.2 Importing and Saving Data from CSV Files
2.1.3 Importing Data from ZIP and SAV Files
2.1.4 Exploring the Structure of Data
2.1.5 Exploring Numeric Variables
2.1.6 Measuring Central Tendency - mean, median, mode
2.1.7 Measuring Spread - variance, quartiles and the five-number summary
2.1.8 Visualizing Numeric Variables - boxplots
2.1.9 Visualizing Numeric Variables - histograms
2.1.10 Uniform and normal distributions
2.1.11 Exploring Categorical Variables
2.1.12 Exploring Relationships Between Variables
2.1.13 Missing Data
2.1.14 Parsing web pages and visualizing tabular HTML data
2.1.15 Cohort-Rebalancing (for Imbalanced Groups)
2.2 Exploratory Data Analytics (EDA)
2.2.1 Classification of visualization methods
2.2.2 Composition
2.2.3 Comparison
2.2.4 Relationships
2.3 Practice Problems
2.3.1 Data Manipulation
2.3.2 Bivariate relations
2.3.3 Missing data
2.3.4 Surface plots
2.3.5 Unbalanced groups
2.3.6 Common plots
2.3.7 Trees and Graphs
2.3.8 Data EDA examples
2.3.9 Data reports

Chapter 3: Linear Algebra, Matrix Computing and Regression Modeling
3.1 Linear Algebra
3.1.1 Building Matrices
3.1.2 Matrix subscripts
3.1.3 Addition and subtraction
3.1.4 Multiplication
3.2 Matrix Computing
3.2.1 Solving Systems of Equations
3.2.2 The identity matrix
3.2.3 Vectors, Matrices, and Scalars
3.2.4 Sample Statistics
3.2.5 Applications of Matrix Algebra in Linear Modeling
3.2.6 Finding function extrema (min/max) using calculus
3.2.7 Linear modeling in R
3.3 Eigenspectra - Eigenvalues and Eigenvectors
3.4 Matrix notation
3.5 Linear regression
3.5.1 Sample covariance matrix
3.6 Multivariate linear regression modeling
3.6.1 Simple linear regression
3.6.2 Ordinary least squares estimation
3.6.3 Regression Model Assumptions
3.6.4 Correlations
3.6.5 Multiple Linear Regression
3.7 Case Study 1: Baseball Players
3.7.1 Step 1 - collecting data
3.7.2 Step 2 - exploring and preparing the data
3.7.3 Step 3 - training a model on the data
3.7.4 Step 4 - evaluating model performance
3.7.5 Step 5 - improving model performance
3.8 Regression trees and model trees
3.8.1 Adding regression to trees
3.9 Bayesian Additive Regression Trees (BART)
3.9.1 1D Simulation
3.9.2 Higher-Dimensional Simulation
3.9.3 Heart Attack Hospitalization Case-Study
3.9.4 Another look at Case study 2: Baseball Players
3.10 Practice Problems
3.10.1 How is matrix multiplication defined?
3.10.2 Scalar vs. Matrix Multiplication
3.10.3 Matrix Equations
3.10.4 Least Square Estimation
3.10.5 Matrix manipulation
3.10.6 Matrix Transposition
3.10.7 Sample Statistics
3.10.8 Eigenvalues and Eigenvectors
3.10.9 Regression Forecasting using Numerical Data

Chapter 4: Linear and Nonlinear Dimensionality Reduction
4.1 Motivational Example: Reducing 2D to 1D
4.2 Matrix Rotations
4.3 Summary (PCA, ICA, and FA)
4.4 Principal Component Analysis (PCA)
4.4.1 Principal Components
4.5 Independent component analysis (ICA)
4.6 Factor Analysis (FA)
4.7 Singular Value Decomposition (SVD)
4.7.1 SVD Summary
4.8 t-distributed Stochastic Neighbor Embedding (t-SNE)
4.8.1 t-SNE Formulation
4.8.2 t-SNE Example: Hand-written Digit Recognition
4.9 Uniform Manifold Approximation and Projection (UMAP)
4.9.1 Mathematical formulation
4.9.2 Hand-Written Digits Recognition
4.9.3 Apply UMAP for class-prediction using new data
4.10 UMAP Parameters
4.10.1 Stability, Replicability, and Reproducibility
4.10.2 UMAP Interpretation
4.11 Dimensionality Reduction Case Study (Parkinson's Disease)
4.11.1 Step 1: Collecting Data
4.11.2 Step 2: Exploring and preparing the data
4.11.3 PCA
4.11.4 Factor analysis (FA)
4.11.5 t-SNE
4.11.6 Uniform Manifold Approximation and Projection (UMAP)
4.12 Practice Problems
4.12.1 Parkinson's Disease example
4.12.2 Allometric Relations in Plants example
4.12.3 3D Volumetric Brain Study

Chapter 5: Supervised Classification
5.1 k-Nearest Neighbor Approach
5.2 Distance Function and Dummy coding
5.2.1 Estimation of the hyperparameter k
5.2.2 Rescaling of the features
5.2.3 Rescaling Formulas
5.2.4 Case Study: Youth Development
5.2.5 Case Study: Predicting Galaxy Spins
5.3 Probabilistic Learning - Naive Bayes Classification
5.3.1 Overview of the Naive Bayes Method
5.3.2 Model Assumptions
5.3.3 Bayes Formula
5.3.4 The Laplace Estimator
5.3.5 Case Study: Head and Neck Cancer Medication
5.4 Decision Trees and Divide and Conquer Classification
5.4.1 Motivation
5.4.2 Decision Tree Overview
5.4.3 Case Study 1: Quality of Life and Chronic Disease
5.4.4 Classification rules
5.5 Case Study 2: QoL in Chronic Disease (Take 2)
5.6 Practice Problems
5.6.1 Iris Species
5.6.2 Cancer Study
5.6.3 Baseball Data
5.6.4 Medical Specialty Text-Notes Classification
5.6.5 Chronic Disease Case-Study

Chapter 6: Black Box Machine Learning Methods
6.1 Neural Networks
6.1.1 From biological to artificial neurons
6.1.2 Activation functions
6.2 Network topology
6.2.1 Network layers
6.2.2 Training neural networks with backpropagation
6.2.3 Case Study 1: Google Trends and the Stock Market - Regression
6.2.4 Simple NN demo - learning to compute
6.2.5 Case Study 2: Google Trends and the Stock Market - Classification
6.3 Support Vector Machines (SVM)
6.3.1 Classification with hyperplanes
6.3.2 Case Study 3: Optical Character Recognition (OCR)
6.3.3 Case Study 4: Iris Flowers
6.3.4 Parameter Tuning
6.3.5 Improving the performance of Gaussian kernels
6.4 Ensemble meta-learning
6.4.1 Bagging
6.4.2 Boosting
6.4.3 Random forests
6.4.4 Random Forest Algorithm (Pseudo Code)
6.4.5 Adaptive boosting
6.5 Practice Problems
6.5.1 Problem 1: Google Trends and the Stock Market
6.5.2 Problem 2: Quality of Life and Chronic Disease

Chapter 7: Qualitative Learning Methods - Text Mining, Natural Language Processing, Apriori Association Rules Learning
7.1 Natural Language Processing (NLP) and Text Mining (TM)
7.1.1 A simple NLP/TM example
7.1.2 Case-Study: Job ranking
7.1.3 Area Under ROC Curve
7.1.4 TF-IDF
7.1.5 Cosine similarity
7.1.6 Sentiment analysis
7.1.7 NLP/TM Analytics
7.2 Apriori Association Rules Learning
7.2.1 Association Rules
7.2.2 The Apriori algorithm for association rule learning
7.2.3 Rule support and confidence
7.2.4 Building a set of rules with the Apriori principle
7.2.5 A toy example
7.2.6 Case Study 1: Head and Neck Cancer Medications
7.2.7 Graphical depiction of association rules
7.2.8 Saving association rules to a file or a data frame
7.3 Summary
7.4 Practice Problems
7.4.1 Groceries
7.4.2 Titanic Passengers

Chapter 8: Unsupervised Clustering
8.1 ML Clustering
8.2 Silhouette plots
8.3 The k-Means Clustering Algorithm
8.3.1 Pseudocode
8.3.2 Choosing the appropriate number of clusters
8.3.3 Case Study 1: Divorce and Consequences on Young Adults
8.3.4 Model improvement
8.3.5 Case study 2: Pediatric Trauma
8.3.6 Feature selection for k-Means clustering
8.4 Hierarchical Clustering
8.5 Spectral Clustering
8.5.1 Image segmentation using spectral clustering
8.5.2 Point cloud segmentation using spectral clustering
8.6 Gaussian mixture models
8.7 Summary
8.8 Practice Problems
8.8.1 Youth Development

Chapter 9: Model Performance Assessment, Validation, and Improvement
9.1 Measuring the performance of classification methods
9.2 Evaluation strategies
9.2.1 Binary outcomes
9.2.2 Cross-tables, contingency tables, and confusion-matrices
9.2.3 Other measures of performance beyond accuracy
9.2.4 Visualizing performance tradeoffs (ROC Curve)
9.3 Estimating future performance (internal statistical cross-validation)
9.3.1 The holdout method
9.3.2 Cross-validation
9.3.3 Bootstrap sampling
9.4 Improving model performance by parameter tuning
9.4.1 Using caret for automated parameter tuning
9.5 Customizing the tuning process
9.6 Comparing the performance of several alternative models
9.7 Forecasting types and assessment approaches
9.7.1 Overfitting
9.8 Internal Statistical Cross-validation
9.8.1 Example (Linear Regression)
9.8.2 Cross-validation methods
9.8.3 Case-Studies
9.8.4 Summary of CV output
9.8.5 Alternative predictor functions
9.8.6 Foundation of LDA and QDA for prediction, dimensionality reduction, or forecasting
9.8.7 Comparing multiple classifiers

Chapter 10: Specialized Machine Learning Topics
10.1 Working with specialized data and databases
10.1.1 Data format conversion
10.1.2 Querying data in SQL databases
10.1.3 SparQL Queries
10.1.4 Real Random Number Generation
10.1.5 Downloading the complete text of web pages
10.1.6 Reading and writing XML with the XML package
10.1.7 Web-page Data Scraping
10.1.8 Parsing JSON from web APIs
10.1.9 Reading and writing Microsoft Excel spreadsheets using XLSX
10.2 Working with domain-specific data
10.2.1 Working with bioinformatics data
10.2.2 Visualizing network data
10.3 Data Streaming
10.3.1 Definition
10.3.2 The stream package
10.3.3 Synthetic example - random Gaussian stream
10.3.4 Generate the stream
10.3.5 Sources of Data Streams
10.3.6 Printing, plotting, and saving streams
10.3.7 Stream animation
10.3.8 Case-Study: SOCR Knee Pain Data
10.3.9 Data Stream clustering and classification (DSC)
10.3.10 Evaluation of data stream clustering
10.4 Optimization and improving the computational performance
10.4.1 Generalizing tabular data structures with dplyr
10.4.2 Making data frames faster with data.table
10.4.3 Creating disk-based data frames with ff
10.4.4 Using massive matrices with bigmemory
10.5 Parallel computing
10.5.1 Measuring execution time
10.5.2 Parallel processing with multiple cores
10.5.3 Parallelization using foreach and doParallel
10.5.4 GPU computing
10.6 Deploying optimized learning algorithms
10.6.1 Building bigger regression models with biglm
10.6.2 Growing bigger and faster random forests with bigrf
10.6.3 Training and evaluation models in parallel with caret
10.7 R Notebook support for other programming languages
10.7.1 R-Python integration
10.7.2 Installing Python
10.7.3 Install the reticulate package
10.7.4 Installing and importing Python Modules
10.7.5 Python-based data modeling
10.7.6 Visualization of the results in R
10.7.7 R integration with C/C++
10.8 Practice problem

Chapter 11: Variable Importance and Feature Selection
11.1 Feature selection methods
11.1.1 Filtering techniques
11.1.2 Wrapper
11.1.3 Embedded Techniques
11.1.4 Random Forest Feature Selection
11.1.5 Case Study - ALS
11.2 Regularized Linear Modeling and Controlled Variable Selection
11.2.1 General Questions
11.2.2 Model Regularization
11.2.3 Matrix notation
11.2.4 Regularized Linear Modeling
11.2.5 Predictor Standardization
11.2.6 Estimation Goals
11.2.7 Linear Regression
11.2.8 Drawbacks of Linear Regression
11.2.9 Variable Selection
11.2.10 Simple Regularization Framework
11.2.11 General Regularization Framework
11.2.12 Likelihood Ratio Test (LRT), False Discovery Rate (FDR), and Logistic Transform
11.2.13 Logistic Transformation
11.2.14 Implementation of Regularization
11.2.15 Computational Complexity
11.2.16 LASSO and Ridge Solution Paths
11.2.17 Regression Solution Paths - Ridge vs. LASSO
11.2.18 Choice of the Regularization Parameter
11.2.19 Cross Validation Motivation
11.2.20 n-Fold Cross Validation
11.2.21 LASSO 10-Fold Cross Validation
11.2.22 Stepwise OLS (ordinary least squares)
11.2.23 Final Models
11.2.24 Model Performance
11.2.25 Summary
11.3 Knockoff Filtering (FDR-Controlled Feature Selection)
11.3.1 Simulated Knockoff Example
11.3.2 Knockoff invocation
11.3.3 PD Neuroimaging-genetics Case-Study
11.4 Practice Problems

Chapter 12: Big Longitudinal Data Analysis
12.1 Classical Time-Series Analytic Approaches
12.1.1 Time series analysis
12.1.2 Structural Equation Modeling (SEM) - latent variables
12.1.3 Longitudinal data analysis - Linear Mixed Model
12.1.4 Generalized estimating equations (GEE)
12.1.5 PD/PPMI Case-Study: SEM, GLMM, and GEE modeling
12.2 Network-based Approaches
12.2.1 Background
12.2.2 Recurrent Neural Networks (RNN)
12.2.3 Tensor Format Representation
12.2.4 Simulated RNN case-study
12.2.5 Climate Data Study
12.2.6 Keras-based Multi-covariate LSTM Time-series Analysis and Forecasting

Chapter 13: Function Optimization
13.1 General optimization approach
13.1.1 First-order Gradient-based Optimization
13.1.2 Second-order Hessian-based Optimization
13.1.3 Gradient-free Optimization
13.2 Free (unconstrained) optimization
13.2.1 Example 1: minimizing a univariate function (inverse-CDF)
13.2.2 Example 2: minimizing a bivariate function
13.2.3 Example 3: using simulated annealing to find the maximum of an oscillatory function
13.3 Constrained Optimization
13.3.1 Equality constraints
13.3.2 Lagrange Multipliers
13.3.3 Inequality constrained optimization
13.3.4 Quadratic Programming (QP)
13.4 General Nonlinear Optimization
13.4.1 Dual problem optimization
13.5 Manual vs. Automated Lagrange Multiplier Optimization
13.6 Data Denoising
13.7 Sparse Matrices
13.8 Parallel Computing
13.9 Foundational Methods for Function Optimization
13.9.1 Basics
13.9.2 Gradient Descent
13.9.3 Convexity
13.9.4 Foundations of the Newton-Raphson's Method
13.9.5 Stochastic Gradient Descent
13.9.6 Simulated Annealing (SANN)
13.10 Hands-on Examples
13.10.1 Example 1: Healthcare Manufacturer Product Optimization
13.10.2 Example 2: Optimization of the Booth's function
13.10.3 Example 3: Extrema of the bivariate Goldstein-Price Function
13.10.4 Example 4: Bivariate Oscillatory Function
13.10.5 Nonlinear Constraint Optimization Problem
13.11 Examples of explicit optimization use in AI/ML
13.12 Practice Problems

Chapter 14: Deep Learning, Neural Networks
14.1 Perceptrons
14.2 Biological Relevance
14.3 Simple Neural Net Examples
14.3.1 Exclusive OR (XOR) Operator
14.3.2 NAND Operator
14.3.3 Complex networks designed using simple building blocks
14.4 Neural Network Modeling using Keras
14.4.1 Iterations - Samples, Batches and Epochs
14.4.2 Use-Case: Predicting Titanic Passenger Survival
14.4.3 EDA/Visualization
14.4.4 Data Preprocessing
14.4.5 Keras Modeling
14.4.6 NN Model Fitting
14.4.7 Convolutional Neural Networks (CNNs)
14.4.8 Model Exploration
14.4.9 Passenger Survival Forecasting using New Data
14.4.10 Fine-tuning the NN Model
14.4.11 Model Export and Import
14.5 Case-Studies
14.5.1 Classification example using Sonar data
14.5.2 Schizophrenia Neuroimaging Study
14.5.3 ALS regression example
14.5.4 IBS Study
14.5.5 Country QoL Ranking Data
14.5.6 Handwritten Digits Classification
14.6 Classifying Real-World Images using Pre-Trained Tensorflow and Keras Models
14.6.1 Load the Pre-trained Model
14.6.2 Load and Preprocess a New Image
14.6.3 Image Classification
14.6.4 Additional Image Classification Examples
14.7 Data Generation: simulating synthetic data
14.7.1 Fractal shapes
14.7.2 Fake images
14.7.3 Generative Adversarial Networks (GANs)
14.8 Transfer Learning
14.8.1 Text Classification using Deep Network Transfer Learning
14.8.2 Multinomial Transfer Learning classification of Clinical Text
14.8.3 Binary Classification of Film Reviews
14.9 Image classification
14.9.1 Performance Metrics
14.9.2 Torch Deep Convolutional Neural Network (CNN)
14.9.3 Tensorflow Image Pre-processing Pipeline
14.10 Additional References
14.11 Practice Problems
14.11.1 Deep learning Classification
14.11.2 Deep learning Regression
14.11.3 Image classification
14.11.4 (Challenging Problem) Deep Convolutional Networks for 3D Volume Segmentation

Summary
Electronic Appendix
Glossary
Index
PRODUCT DETAILS
Publisher: Springer (Springer International Publishing AG)
Publication date: February, 2023
Pages: 918
Weight: 1586g
Availability: Available
Subcategories: General Issues