Deep Learning

An introduction to a broad range of topics in deep learning, covering mathematical and conceptual background, deep learning techniques used in industry, and research perspectives.

"Written by three experts in the field, Deep Learning is the only comprehensive book on the subject."

Deep learning is a form of machine learning that enables computers to learn from experience and understand the world in terms of a hierarchy of concepts. Because the computer gathers knowledge from experience, there is no need for a human computer operator to formally specify all the knowledge that the computer needs. The hierarchy of concepts allows the computer to learn complicated concepts by building them out of simpler ones; a graph of these hierarchies would be many layers deep.

This book introduces a broad range of topics in deep learning. The text offers mathematical and conceptual background, covering relevant concepts in linear algebra, probability theory and information theory, numerical computation, and machine learning. It describes deep learning techniques used by practitioners in industry, including deep feedforward networks, regularization, optimization algorithms, convolutional networks, sequence modeling, and practical methodology; and it surveys such applications as natural language processing, speech recognition, computer vision, online recommendation systems, bioinformatics, and video games. Finally, the book offers research perspectives, covering such theoretical topics as linear factor models, autoencoders, representation learning, structured probabilistic models, Monte Carlo methods, the partition function, approximate inference, and deep generative models.

Deep Learning can be used by undergraduate or graduate students planning careers in either industry or research, and by software engineers who want to begin using deep learning in their products or platforms. A website offers supplementary material for both readers and instructors.
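To make the "hierarchy of concepts" idea concrete, here is a minimal illustrative sketch (not taken from the book) of a deep feedforward network in which each layer re-represents the output of the layer below it. The NumPy implementation, layer widths, and ReLU nonlinearity are assumptions chosen for brevity, not anything the book prescribes.

    import numpy as np

    def relu(x):
        # Elementwise nonlinearity: lets each layer form new features
        # from combinations of the previous layer's features.
        return np.maximum(0.0, x)

    def forward(x, weights, biases):
        # Pass one input vector through a stack of layers. Each layer
        # re-represents its input, so deeper layers can express more
        # abstract concepts built out of simpler ones.
        h = x
        for W, b in zip(weights, biases):
            h = relu(W @ h + b)
        return h

    # Hypothetical layer widths: raw input -> simple features -> concepts.
    rng = np.random.default_rng(0)
    sizes = [64, 32, 16, 8]
    weights = [rng.normal(0.0, 0.1, (m, n)) for n, m in zip(sizes, sizes[1:])]
    biases = [np.zeros(m) for m in sizes[1:]]

    x = rng.normal(size=64)                   # one raw input vector
    print(forward(x, weights, biases).shape)  # (8,): the deepest representation

With random weights this computes nothing meaningful, of course; training, which is the subject of the book, is what makes the deeper layers correspond to useful concepts.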
Super..cool
These are points I've read in other reviews; however, I definitely agree with them.
I have a Bachelor of Science in Mechanical Engineering with a minor in Statistical Quality Control and a Master of Science in Sustainable Energy Technologies. I'm not bragging about it; I just want to make clear that I have a strong mathematics and statistics background.
After reading books like Introduction to Statistical Learning, Introduction to Machine Learning with Python, and Python for Data Analysis, and taking Andrew Ng's machine learning and deep learning specializations on Coursera, I thought it was a good idea to have a textbook to follow up on what I had learned from all the very valuable resources I mentioned.
Andrew interviewed Goodfellow and Bengio in his online courses, and given the size of their contributions to the deep learning community, I thought there were no better people to write a book about this incredibly influential field. Unfortunately, the result is a highly technical book in which even the introduction is hard to fully grasp. If you are not very familiar with matrix calculus, and in particular with matrix calculus notation, you're going to have a very hard time with this book.
Something I appreciated about the aforementioned books is that all the authors found a way to explain, in a very down-to-earth manner (as down to earth as this very complex subject allows), what each algorithm is doing and how it searches for an optimal result. That doesn't happen in this book.
According to Goodfellow, this book is meant for undergrads and postgrads alike. Nevertheless, if, like me, you are not able to read and fully understand a typical math Wikipedia article (say, the one on the Gini coefficient), you're probably going to find yourself even more confused than when you started.
I eventually gave up around page 150. I referred to some isolated subjects from time to time (convolutional networks, sequence models, GANs, etc.), and sometimes it was useful. Most of the time it wasn't, though.
If you read the blurbs on the back of the book, you'll notice glowing opinions from Geoffrey Hinton, Elon Musk and Yann LeCun. That may encourage you to buy it, but please take into account that these are no 'normal' human beings, and what may seem obvious to them could very well be quite complicated for an average mind, especially without the mathematical background.
Bottom line: if you are very well versed mathematically speaking, this may be the best book available. Otherwise, you may be better off trying some online material, such as Andrew Ng's excellent set of specializations.
Contents
1 Introduction  1 
1.1 Who Should Read This Book?  8 
1.2 Historical Trends in Deep Learning  12 
I Applied Math and Machine Learning Basics  27 
2 Linear Algebra  29 
2.2 Multiplying Matrices and Vectors  32 
2.3 Identity and Inverse Matrices  34 
2.4 Linear Dependence and Span  35 
2.5 Norms  36 
2.6 Special Kinds of Matrices and Vectors  38 
2.7 Eigendecomposition  39 
2.8 Singular Value Decomposition  42 
2.9 The Moore-Penrose Pseudoinverse  43 
2.10 The Trace Operator  44 
2.11 The Determinant  45 
3 Probability and Information Theory  51 
3.1 Why Probability?  52 
3.2 Random Variables  54 
3.4 Marginal Probability  56 
3.5 Conditional Probability  57 
3.7 Independence and Conditional Independence  58 
3.9 Common Probability Distributions  60 
3.10 Useful Properties of Common Functions  65 
3.11 Bayes' Rule  68 
3.13 Information Theory  70 
3.14 Structured Probabilistic Models  74 
4 Numerical Computation  77 
4.2 Poor Conditioning  79 
4.4 Constrained Optimization  89 
4.5 Example: Linear Least Squares  92 
5 Machine Learning Basics  95 
5.1 Learning Algorithms  96 
5.2 Capacity, Overfitting and Underfitting  107 
5.3 Hyperparameters and Validation Sets  117 
5.4 Estimators, Bias and Variance  119 
5.5 Maximum Likelihood Estimation  128 
5.6 Bayesian Statistics  132 
5.7 Supervised Learning Algorithms  136 
5.8 Unsupervised Learning Algorithms  142 
5.9 Stochastic Gradient Descent  147 
5.10 Building a Machine Learning Algorithm  149 
5.11 Challenges Motivating Deep Learning  151 
II Deep Networks: Modern Practices  161 
6 Deep Feedforward Networks  163 
6.1 Example: Learning XOR  166 
6.2 Gradient-Based Learning  171 
6.3 Hidden Units  185 
6.4 Architecture Design  191 
6.5 Back-Propagation and Other Differentiation Algorithms  197 
6.6 Historical Notes  217 
7 Regularization for Deep Learning  221 
7.1 Parameter Norm Penalties  223 
7.2 Norm Penalties as Constrained Optimization  230 
7.3 Regularization and Under-Constrained Problems  232 
7.4 Dataset Augmentation  233 
7.5 Noise Robustness  235 
7.6 Semi-Supervised Learning  236 
7.7 Multitask Learning  237 
7.8 Early Stopping  239 
7.9 Parameter Tying and Parameter Sharing  246 
7.10 Sparse Representations  247 
7.11 Bagging and Other Ensemble Methods  249 
7.12 Dropout  251 
7.13 Adversarial Training  261 
7.14 Tangent Distance, Tangent Prop and Manifold Tangent Classifier  263 
8 Optimization for Training Deep Models  267 
8.1 How Learning Differs from Pure Optimization  268 
8.2 Challenges in Neural Network Optimization  275 
8.3 Basic Algorithms  286 
8.4 Parameter Initialization Strategies  292 
8.5 Algorithms with Adaptive Learning Rates  298 
8.6 Approximate Second-Order Methods  302 
8.7 Optimization Strategies and Meta-Algorithms  309 
9 Convolutional Networks  321 
9.1 The Convolution Operation  322 
9.2 Motivation  324 
9.3 Pooling  330 
9.4 Convolution and Pooling as an Infinitely Strong Prior  334 
9.5 Variants of the Basic Convolution Function  337 
9.6 Structured Outputs  347 
9.7 Data Types  348 
9.8 Efficient Convolution Algorithms  350 
9.9 Random or Unsupervised Features  351 
9.10 The Neuroscientific Basis for Convolutional Networks  353 
9.11 Convolutional Networks and the History of Deep Learning  359 
10 Sequence Modeling: Recurrent and Recursive Nets  363 
10.1 Unfolding Computational Graphs  365 
10.2 Recurrent Neural Networks  368 
10.3 Bidirectional RNNs  383 
10.4 Encoder-Decoder Sequence-to-Sequence Architectures  385 
10.5 Deep Recurrent Networks  387 
10.6 Recursive Neural Networks  388 
10.7 The Challenge of Long-Term Dependencies  390 
10.8 Echo State Networks  392 
10.9 Leaky Units and Other Strategies for Multiple Time Scales  395 
10.10 The Long Short-Term Memory and Other Gated RNNs  397 
10.11 Optimization for Long-Term Dependencies  401 
10.12 Explicit Memory  405 
11 Practical Methodology  409 
11.1 Performance Metrics  410 
11.2 Default Baseline Models  413 
11.3 Determining Whether to Gather More Data  414 
11.4 Selecting Hyperparameters  415 
11.5 Debugging Strategies  424 
11.6 Example: Multi-Digit Number Recognition  428 
12 Applications  431 
12.2 Computer Vision  440 
12.3 Speech Recognition  446 
12.4 Natural Language Processing  448 
12.5 Other Applications  465 
III Deep Learning Research  475 
13 Linear Factor Models  479 
13.1 Probabilistic PCA and Factor Analysis  480 
13.2 Independent Component Analysis (ICA)  481 
13.3 Slow Feature Analysis  484 
13.4 Sparse Coding  486 
13.5 Manifold Interpretation of PCA  489 
14 Autoencoders  493 
14.1 Undercomplete Autoencoders  494 
14.2 Regularized Autoencoders  495 
14.3 Representational Power, Layer Size and Depth  499 
14.4 Stochastic Encoders and Decoders  500 
14.5 Denoising Autoencoders  501 
14.6 Learning Manifolds with Autoencoders  506 
14.7 Contractive Autoencoders  510 
14.8 Predictive Sparse Decomposition  514 
14.9 Applications of Autoencoders  515 
15 Representation Learning  517 
15.1 Greedy Layer-Wise Unsupervised Pretraining  519 
15.2 Transfer Learning and Domain Adaptation  526 
15.3 Semi-Supervised Disentangling of Causal Factors  532 
15.4 Distributed Representation  536 
15.5 Exponential Gains from Depth  543 
15.6 Providing Clues to Discover Underlying Causes  544 
16 Structured Probabilistic Models for Deep Learning  549 
16.1 The Challenge of Unstructured Modeling  550 
16.2 Using Graphs to Describe Model Structure  554 
16.3 Sampling from Graphical Models  570 
16.4 Advantages of Structured Modeling  572 
16.6 Inference and Approximate Inference  573 
16.7 The Deep Learning Approach to Structured Probabilistic Models  575 
17 Monte Carlo Methods  581 
17.2 Importance Sampling  583 
17.3 Markov Chain Monte Carlo Methods  586 
17.4 Gibbs Sampling  590 
17.5 The Challenge of Mixing between Separated Modes  591 
18 Confronting the Partition Function  597 
18.1 The Log-Likelihood Gradient  598 
18.2 Stochastic Maximum Likelihood and Contrastive Divergence  599 
18.3 Pseudolikelihood  607 
18.4 Score Matching and Ratio Matching  609 
18.5 Denoising Score Matching  611 
18.6 Noise-Contrastive Estimation  612 
18.7 Estimating the Partition Function  614 
19 Approximate Inference  623 
19.1 Inference as Optimization  624 
19.2 Expectation Maximization  626 
19.3 MAP Inference and Sparse Coding  627 
19.4 Variational Inference and Learning  629 
19.5 Learned Approximate Inference  642 
20 Deep Generative Models  645 
20.2 Restricted Boltzmann Machines  647 
20.3 Deep Belief Networks  651 
20.4 Deep Boltzmann Machines  654 
20.5 Boltzmann Machines for Real-Valued Data  667 
20.6 Convolutional Boltzmann Machines  673 
20.7 Boltzmann Machines for Structured or Sequential Outputs  675 
20.8 Other Boltzmann Machines  677 
20.9 Back-Propagation through Random Operations  678 
20.10 Directed Generative Nets  682 
20.11 Drawing Samples from Autoencoders  701 
20.12 Generative Stochastic Networks  704 
20.13 Other Generation Schemes  706 
20.14 Evaluating Generative Models  707 
20.15 Conclusion  710 
Bibliography  711 
Index  767 