Publication date: 2009-1 · Publisher: World Publishing Corporation (Beijing) · Author: Hastie · Pages: 533
The field of Statistics is constantly challenged by the problems that science and industry bring to its door. In the early days, these problems often came from agricultural and industrial experiments and were relatively small in scope. With the advent of computers and the information age, statistical problems have exploded both in size and complexity. Challenges in the areas of data storage, organization and searching have led to the new field of "data mining"; statistical and computational problems in biology and medicine have created "bioinformatics." Vast amounts of data are being generated in many fields, and the statistician's job is to make sense of it all: to extract important patterns and trends, and understand "what the data says." We call this learning from data.

The challenges in learning from data have led to a revolution in the statistical sciences. Since computation plays such a key role, it is not surprising that much of this new development has been done by researchers in other fields, such as computer science and engineering.

The learning problems that we consider can be roughly categorized as either supervised or unsupervised. In supervised learning, the goal is to predict the value of an outcome measure based on a number of input measures; in unsupervised learning, there is no outcome measure, and the goal is to describe the associations and patterns among a set of input measures.
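The supervised/unsupervised distinction described above can be illustrated with a minimal sketch (hypothetical toy data generated with NumPy; this is an illustration, not an example from the book): least squares prediction of an outcome from inputs, versus a principal-component summary of the inputs alone.

```python
import numpy as np

# --- Supervised learning: predict an outcome y from input measures X. ---
# Hypothetical toy data: y is a noisy linear function of two inputs.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 2))
true_beta = np.array([2.0, -1.0])
y = X @ true_beta + 0.1 * rng.normal(size=100)

# Ordinary least squares recovers coefficients close to true_beta.
beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)

# --- Unsupervised learning: no outcome; describe structure in X alone. ---
# The leading principal component is the direction of maximal variance.
Xc = X - X.mean(axis=0)            # center the inputs
_, _, Vt = np.linalg.svd(Xc, full_matrices=False)
leading_direction = Vt[0]          # unit vector summarizing the inputs
```

The supervised half has a target `y` to score predictions against; the unsupervised half can only characterize the inputs themselves, which is why its output is a description (a direction of variation) rather than a prediction.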
Preface
1 Introduction
2 Overview of Supervised Learning
  2.1 Introduction
  2.2 Variable Types and Terminology
  2.3 Two Simple Approaches to Prediction: Least Squares and Nearest Neighbors
    2.3.1 Linear Models and Least Squares
    2.3.2 Nearest-Neighbor Methods
    2.3.3 From Least Squares to Nearest Neighbors
  2.4 Statistical Decision Theory
  2.5 Local Methods in High Dimensions
  2.6 Statistical Models, Supervised Learning and Function Approximation
    2.6.1 A Statistical Model for the Joint Distribution Pr(X,Y)
    2.6.2 Supervised Learning
    2.6.3 Function Approximation
  2.7 Structured Regression Models
    2.7.1 Difficulty of the Problem
  2.8 Classes of Restricted Estimators
    2.8.1 Roughness Penalty and Bayesian Methods
    2.8.2 Kernel Methods and Local Regression
    2.8.3 Basis Functions and Dictionary Methods
  2.9 Model Selection and the Bias-Variance Tradeoff
  Bibliographic Notes
  Exercises
3 Linear Methods for Regression
  3.1 Introduction
  3.2 Linear Regression Models and Least Squares
    3.2.1 Example: Prostate Cancer
    3.2.2 The Gauss-Markov Theorem
  3.3 Multiple Regression from Simple Univariate Regression
    3.3.1 Multiple Outputs
  3.4 Subset Selection and Coefficient Shrinkage
    3.4.1 Subset Selection
    3.4.2 Prostate Cancer Data Example (Continued)
    3.4.3 Shrinkage Methods
    3.4.4 Methods Using Derived Input Directions
    3.4.5 Discussion: A Comparison of the Selection and Shrinkage Methods
    3.4.6 Multiple Outcome Shrinkage and Selection
  3.5 Computational Considerations
  Bibliographic Notes
  Exercises
4 Linear Methods for Classification
  4.1 Introduction
  4.2 Linear Regression of an Indicator Matrix
  4.3 Linear Discriminant Analysis
    4.3.1 Regularized Discriminant Analysis
    4.3.2 Computations for LDA
    4.3.3 Reduced-Rank Linear Discriminant Analysis
  4.4 Logistic Regression
    4.4.1 Fitting Logistic Regression Models
    4.4.2 Example: South African Heart Disease
    4.4.3 Quadratic Approximations and Inference
    4.4.4 Logistic Regression or LDA?
  4.5 Separating Hyperplanes
    4.5.1 Rosenblatt's Perceptron Learning Algorithm
    4.5.2 Optimal Separating Hyperplanes
  Bibliographic Notes
  Exercises
5 Basis Expansions and Regularization
  5.1 Introduction
  5.2 Piecewise Polynomials and Splines
    5.2.1 Natural Cubic Splines
    5.2.2 Example: South African Heart Disease (Continued)
    5.2.3 Example: Phoneme Recognition
  5.3 Filtering and Feature Extraction
  5.4 Smoothing Splines
    5.4.1 Degrees of Freedom and Smoother Matrices
  5.5 Automatic Selection of the Smoothing Parameters
    5.5.1 Fixing the Degrees of Freedom
    5.5.2 The Bias-Variance Tradeoff
  5.6 Nonparametric Logistic Regression
  5.7 Multidimensional Splines
  5.8 Regularization and Reproducing Kernel Hilbert Spaces
    5.8.1 Spaces of Functions Generated by Kernels
    5.8.2 Examples of RKHS
  5.9 Wavelet Smoothing
    5.9.1 Wavelet Bases and the Wavelet Transform
    5.9.2 Adaptive Wavelet Filtering
  Bibliographic Notes
  Exercises
  Appendix: Computational Considerations for Splines
  Appendix: B-splines
  Appendix: Computations for Smoothing Splines
6 Kernel Methods
  6.1 One-Dimensional Kernel Smoothers
    6.1.1 Local Linear Regression
    6.1.2 Local Polynomial Regression
  6.2 Selecting the Width of the Kernel
  6.3 Local Regression in R^p
  6.4 Structured Local Regression Models in R^p
    6.4.1 Structured Kernels
    6.4.2 Structured Regression Functions
  6.5 Local Likelihood and Other Models
  6.6 Kernel Density Estimation and Classification
    6.6.1 Kernel Density Estimation
    6.6.2 Kernel Density Classification
    6.6.3 The Naive Bayes Classifier
  6.7 Radial Basis Functions and Kernels
  6.8 Mixture Models for Density Estimation and Classification
  6.9 Computational Considerations
  Bibliographic Notes
  Exercises
7 Model Assessment and Selection
  7.1 Introduction
  7.2 Bias, Variance and Model Complexity
  7.3 The Bias-Variance Decomposition
    7.3.1 Example: Bias-Variance Tradeoff
  7.4 Optimism of the Training Error Rate
  7.5 Estimates of In-Sample Prediction Error
  7.6 The Effective Number of Parameters
  7.7 The Bayesian Approach and BIC
  7.8 Minimum Description Length
  7.9 Vapnik-Chervonenkis Dimension
    7.9.1 Example (Continued)
  7.10 Cross-Validation
  7.11 Bootstrap Methods
    7.11.1 Example (Continued)
  Bibliographic Notes
  Exercises
8 Model Inference and Averaging
  8.1 Introduction
  8.2 The Bootstrap and Maximum Likelihood Methods
    8.2.1 A Smoothing Example
    8.2.2 Maximum Likelihood Inference
    8.2.3 Bootstrap versus Maximum Likelihood
  8.3 Bayesian Methods
  8.4 Relationship Between the Bootstrap and Bayesian Inference
  8.5 The EM Algorithm
    8.5.1 Two-Component Mixture Model
    8.5.2 The EM Algorithm in General
    8.5.3 EM as a Maximization-Maximization Procedure
  8.6 MCMC for Sampling from the Posterior
  8.7 Bagging
    8.7.1 Example: Trees with Simulated Data
  8.8 Model Averaging and Stacking
  8.9 Stochastic Search: Bumping
  Bibliographic Notes
  Exercises
9 Additive Models, Trees, and Related Methods
  9.1 Generalized Additive Models
    9.1.1 Fitting Additive Models
    9.1.2 Example: Additive Logistic Regression
    9.1.3 Summary
  9.2 Tree-Based Methods
10 Boosting and Additive Trees
11 Neural Networks
12 Support Vector Machines and Flexible Discriminants
13 Prototype Methods and Nearest-Neighbors
14 Unsupervised Learning
References
Author Index
Index