Deep Learning

An introduction to a broad range of topics in deep learning, covering mathematical and conceptual background, deep learning techniques used in industry, and research perspectives.

“Written by three experts in the field, Deep Learning is the only comprehensive book on the subject.”

Deep learning is a form of machine learning that enables computers to learn from experience and understand the world in terms of a hierarchy of concepts. Because the computer gathers knowledge from experience, there is no need for a human computer operator to formally specify all the knowledge that the computer needs. The hierarchy of concepts allows the computer to learn complicated concepts by building them out of simpler ones; a graph of these hierarchies would be many layers deep.

This book introduces a broad range of topics in deep learning. The text offers mathematical and conceptual background, covering relevant concepts in linear algebra, probability theory and information theory, numerical computation, and machine learning. It describes deep learning techniques used by practitioners in industry, including deep feedforward networks, regularization, optimization algorithms, convolutional networks, sequence modeling, and practical methodology, and it surveys such applications as natural language processing, speech recognition, computer vision, online recommendation systems, bioinformatics, and videogames. Finally, the book offers research perspectives, covering such theoretical topics as linear factor models, autoencoders, representation learning, structured probabilistic models, Monte Carlo methods, the partition function, approximate inference, and deep generative models.

Deep Learning can be used by undergraduate or graduate students planning careers in either industry or research, and by software engineers who want to begin using deep learning in their products or platforms. A website offers supplementary material for both readers and instructors.
What people are saying
These echo comments I've read in other reviews, but I definitely agree with them.
I have a Bachelor of Science in Mechanical Engineering with a Minor in Statistical Quality Control and a Master of Science in Sustainable Energy Technologies. I'm not bragging about it; I just want to make clear that I have a strong mathematics and statistics background.
After reading books like Introduction to Statistical Learning, Introduction to Machine Learning with Python and Python for Data Analysis, and taking Andrew Ng's machine learning and deep learning specializations on Coursera, I thought it was a good idea to have a textbook to follow up on what I had learned from all the very valuable resources I mentioned.
Andrew interviewed Goodfellow and Bengio in his online courses, and given the size of their contributions to the deep learning community, I thought there were no better people to write a book about this incredibly influential field. Unfortunately, the result is a highly technical book in which even the introduction is hard to fully grasp. If you are not very familiar with matrix calculus, and in particular with matrix calculus notation, you're going to have a very hard time with this book.
Something I appreciated about the aforementioned books is that all the authors found a way to explain, in a very down-to-earth manner (as down to earth as this very complex subject allows), what each algorithm is doing and how it searches for an optimal result. That doesn't happen in this book.
According to Goodfellow, this book is meant for undergrads and postgrads alike. Nevertheless, if, like me, you can't read and fully understand a typical math Wikipedia article (say, the one on the Gini coefficient), you'll probably find yourself even more confused than when you started.
I eventually gave up around page 150. I went back to some isolated subjects from time to time (convolutional networks, sequence models, GANs, etc.), and sometimes it was useful. Most of the time, though, it wasn't.
If you read the endorsements on the back cover, you'll notice glowing opinions from Geoffrey Hinton, Elon Musk and Yann LeCun. That may encourage you to buy it, but please take into account that these are no 'normal' human beings, and what may seem obvious to them could very well be quite complicated for an average mind, especially without a strong mathematical background.
Bottom line: if you are very well versed mathematically speaking, this may be the best book available. Otherwise, you may be better off trying some online material such as Andrew Ng's excellent set of specializations.
Not well written
Contents
1 Introduction | 1 |
1.1 Who Should Read This Book? | 8 |
1.2 Historical Trends in Deep Learning | 12 |
I Applied Math and Machine Learning Basics | 27 |
2 Linear Algebra | 29 |
2.2 Multiplying Matrices and Vectors | 32 |
2.3 Identity and Inverse Matrices | 34 |
2.4 Linear Dependence and Span | 35 |
2.5 Norms | 36 |
2.6 Special Kinds of Matrices and Vectors | 38 |
2.7 Eigendecomposition | 39 |
2.8 Singular Value Decomposition | 42 |
2.9 The Moore-Penrose Pseudoinverse | 43 |
2.10 The Trace Operator | 44 |
2.11 The Determinant | 45 |
3 Probability and Information Theory | 51 |
3.1 Why Probability? | 52 |
3.2 Random Variables | 54 |
3.4 Marginal Probability | 56 |
3.5 Conditional Probability | 57 |
3.7 Independence and Conditional Independence | 58 |
3.9 Common Probability Distributions | 60 |
3.10 Useful Properties of Common Functions | 65 |
3.11 Bayes' Rule | 68 |
3.13 Information Theory | 70 |
3.14 Structured Probabilistic Models | 74 |
4 Numerical Computation | 77 |
4.2 Poor Conditioning | 79 |
4.4 Constrained Optimization | 89 |
Example: Linear Least Squares | 92 |
5 Machine Learning Basics | 95 |
5.1 Learning Algorithms | 96 |
5.2 Capacity, Overfitting and Underfitting | 107 |
5.3 Hyperparameters and Validation Sets | 117 |
5.4 Estimators, Bias and Variance | 119 |
5.5 Maximum Likelihood Estimation | 128 |
5.6 Bayesian Statistics | 132 |
5.7 Supervised Learning Algorithms | 136 |
5.8 Unsupervised Learning Algorithms | 142 |
5.9 Stochastic Gradient Descent | 147 |
5.10 Building a Machine Learning Algorithm | 149 |
5.11 Challenges Motivating Deep Learning | 151 |
II Deep Networks: Modern Practices | 161 |
6 Deep Feedforward Networks | 163 |
Example: Learning XOR | 166 |
6.2 Gradient-Based Learning | 171 |
6.3 Hidden Units | 185 |
6.4 Architecture Design | 191 |
6.5 Back-Propagation and Other Differentiation Algorithms | 197 |
6.6 Historical Notes | 217 |
7 Regularization for Deep Learning | 221 |
7.1 Parameter Norm Penalties | 223 |
7.2 Norm Penalties as Constrained Optimization | 230 |
7.3 Regularization and Under-Constrained Problems | 232 |
7.4 Dataset Augmentation | 233 |
7.5 Noise Robustness | 235 |
7.6 Semi-Supervised Learning | 236 |
7.7 Multitask Learning | 237 |
7.8 Early Stopping | 239 |
7.9 Parameter Tying and Parameter Sharing | 246 |
7.10 Sparse Representations | 247 |
7.11 Bagging and Other Ensemble Methods | 249 |
7.12 Dropout | 251 |
7.13 Adversarial Training | 261 |
7.14 Tangent Distance, Tangent Prop and Manifold Tangent Classifier | 263 |
8 Optimization for Training Deep Models | 267 |
8.1 How Learning Differs from Pure Optimization | 268 |
8.2 Challenges in Neural Network Optimization | 275 |
8.3 Basic Algorithms | 286 |
8.4 Parameter Initialization Strategies | 292 |
8.5 Algorithms with Adaptive Learning Rates | 298 |
8.6 Approximate Second-Order Methods | 302 |
8.7 Optimization Strategies and Meta-Algorithms | 309 |
9 Convolutional Networks | 321 |
9.1 The Convolution Operation | 322 |
9.2 Motivation | 324 |
9.3 Pooling | 330 |
9.4 Convolution and Pooling as an Infinitely Strong Prior | 334 |
9.5 Variants of the Basic Convolution Function | 337 |
9.6 Structured Outputs | 347 |
9.7 Data Types | 348 |
9.8 Efficient Convolution Algorithms | 350 |
9.9 Random or Unsupervised Features | 351 |
9.10 The Neuroscientific Basis for Convolutional Networks | 353 |
9.11 Convolutional Networks and the History of Deep Learning | 359 |
10 Sequence Modeling: Recurrent and Recursive Nets | 363 |
10.1 Unfolding Computational Graphs | 365 |
10.2 Recurrent Neural Networks | 368 |
10.3 Bidirectional RNNs | 383 |
10.4 Encoder-Decoder Sequence-to-Sequence Architectures | 385 |
10.5 Deep Recurrent Networks | 387 |
10.6 Recursive Neural Networks | 388 |
10.7 The Challenge of Long-Term Dependencies | 390 |
10.8 Echo State Networks | 392 |
10.9 Leaky Units and Other Strategies for Multiple Time Scales | 395 |
10.10 The Long Short-Term Memory and Other Gated RNNs | 397 |
10.11 Optimization for Long-Term Dependencies | 401 |
10.12 Explicit Memory | 405 |
11 Practical Methodology | 409 |
11.1 Performance Metrics | 410 |
11.2 Default Baseline Models | 413 |
11.3 Determining Whether to Gather More Data | 414 |
11.4 Selecting Hyperparameters | 415 |
11.5 Debugging Strategies | 424 |
Example: Multi-Digit Number Recognition | 428 |
12 Applications | 431 |
12.2 Computer Vision | 440 |
12.3 Speech Recognition | 446 |
12.4 Natural Language Processing | 448 |
12.5 Other Applications | 465 |
III Deep Learning Research | 475 |
13 Linear Factor Models | 479 |
13.1 Probabilistic PCA and Factor Analysis | 480 |
13.2 Independent Component Analysis (ICA) | 481 |
13.3 Slow Feature Analysis | 484 |
13.4 Sparse Coding | 486 |
13.5 Manifold Interpretation of PCA | 489 |
14 Autoencoders | 493 |
14.1 Undercomplete Autoencoders | 494 |
14.2 Regularized Autoencoders | 495 |
14.3 Representational Power, Layer Size and Depth | 499 |
14.4 Stochastic Encoders and Decoders | 500 |
14.5 Denoising Autoencoders | 501 |
14.6 Learning Manifolds with Autoencoders | 506 |
14.7 Contractive Autoencoders | 510 |
14.8 Predictive Sparse Decomposition | 514 |
14.9 Applications of Autoencoders | 515 |
15 Representation Learning | 517 |
15.1 Greedy Layer-Wise Unsupervised Pretraining | 519 |
15.2 Transfer Learning and Domain Adaptation | 526 |
15.3 Semi-Supervised Disentangling of Causal Factors | 532 |
15.4 Distributed Representation | 536 |
15.5 Exponential Gains from Depth | 543 |
15.6 Providing Clues to Discover Underlying Causes | 544 |
16 Structured Probabilistic Models for Deep Learning | 549 |
16.1 The Challenge of Unstructured Modeling | 550 |
16.2 Using Graphs to Describe Model Structure | 554 |
16.3 Sampling from Graphical Models | 570 |
16.4 Advantages of Structured Modeling | 572 |
16.6 Inference and Approximate Inference | 573 |
16.7 The Deep Learning Approach to Structured Probabilistic Models | 575 |
17 Monte Carlo Methods | 581 |
17.2 Importance Sampling | 583 |
17.3 Markov Chain Monte Carlo Methods | 586 |
17.4 Gibbs Sampling | 590 |
17.5 The Challenge of Mixing between Separated Modes | 591 |
18 Confronting the Partition Function | 597 |
18.1 The Log-Likelihood Gradient | 598 |
18.2 Stochastic Maximum Likelihood and Contrastive Divergence | 599 |
18.3 Pseudolikelihood | 607 |
18.4 Score Matching and Ratio Matching | 609 |
18.5 Denoising Score Matching | 611 |
18.6 Noise-Contrastive Estimation | 612 |
18.7 Estimating the Partition Function | 614 |
19 Approximate Inference | 623 |
19.1 Inference as Optimization | 624 |
19.2 Expectation Maximization | 626 |
19.3 MAP Inference and Sparse Coding | 627 |
19.4 Variational Inference and Learning | 629 |
19.5 Learned Approximate Inference | 642 |
20 Deep Generative Models | 645 |
20.2 Restricted Boltzmann Machines | 647 |
20.3 Deep Belief Networks | 651 |
20.4 Deep Boltzmann Machines | 654 |
20.5 Boltzmann Machines for Real-Valued Data | 667 |
20.6 Convolutional Boltzmann Machines | 673 |
20.7 Boltzmann Machines for Structured or Sequential Outputs | 675 |
20.8 Other Boltzmann Machines | 677 |
20.9 Back-Propagation through Random Operations | 678 |
20.10 Directed Generative Nets | 682 |
20.11 Drawing Samples from Autoencoders | 701 |
20.12 Generative Stochastic Networks | 704 |
20.13 Other Generation Schemes | 706 |
20.14 Evaluating Generative Models | 707 |
20.15 Conclusion | 710 |
Bibliography | 711 |
Index | 767 |