Here are some more notes from my recent reading. The following are key challenges in machine learning that motivate the solutions offered by deep learning.

## Curse of Dimensionality

When a function depends on many variables, the number of possible configurations of those variables (regions of input space) grows exponentially and can be far larger than the training set. Many ML algorithms rely on having at least a statistically significant number of training points for each region (why? see next…), but this assumption breaks down for many high-dimensional data sets.
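
A quick back-of-the-envelope check (my own toy numbers, not the book's): if each of d input variables is discretized into k bins, the number of distinct regions is k^d, which outruns even a very large training set as d grows.

```python
# Toy illustration of the curse of dimensionality: discretize each of
# d variables into k bins and count the resulting regions.
k = 10               # bins per variable (hypothetical choice)
n_train = 1_000_000  # a generously large training set

for d in (2, 5, 10, 20):
    regions = k ** d
    print(f"d={d:2d}: {regions:.0e} regions, "
          f"{n_train / regions:.1e} examples per region")
```

Already at d = 10 there are more regions than training examples, and at d = 20 most regions will never see a single data point.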

## Assuming Smoothness and Local Constancy

“What is the best forecast for weather tomorrow?” Well, the weather today, but just a bit different.

Basically, the smoothness and local constancy priors assume that a test point will yield a result “near to” those of similar points in the training set. ML algorithms such as k-nearest neighbors and decision trees rely on this assumption.
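
To make that concrete, here is a minimal 1-nearest-neighbor sketch (my own toy code and made-up points, not from the book): the prediction is literally copied from the closest training example, i.e., local constancy in its purest form.

```python
import numpy as np

def nn_predict(X_train, y_train, x):
    """Predict by copying the label of the nearest training point."""
    distances = np.linalg.norm(X_train - x, axis=1)
    return y_train[np.argmin(distances)]

# Made-up training data for illustration.
X_train = np.array([[0.0, 0.0], [1.0, 1.0], [2.0, 2.0]])
y_train = np.array([0, 1, 1])

# A test point near (1, 1) gets that point's label.
print(nn_predict(X_train, y_train, np.array([0.9, 1.1])))  # -> 1
```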

A key finding has been that a large number of regions can be defined with a much smaller number of examples if dependencies between regions are introduced based on the underlying data-generating function. In particular, one assumes the data was generated by a composition of functions, possibly arranged in a hierarchy.

Hmm, so is this why they call it “deep”?
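
Apparently so, at least in part. A rough sketch of depth as function composition (hypothetical random weights, just to show the idea, not the book's code): three simple layers applied as f3(f2(f1(x))).

```python
import numpy as np

# Each "layer" is an affine map followed by a ReLU. Composing several
# lets later layers reuse the regions carved out by earlier ones,
# which is the hierarchy-of-functions idea above.
def layer(W, b):
    return lambda h: np.maximum(0.0, h @ W + b)

rng = np.random.default_rng(0)
f1 = layer(rng.normal(size=(2, 4)), rng.normal(size=4))
f2 = layer(rng.normal(size=(4, 4)), rng.normal(size=4))
f3 = layer(rng.normal(size=(4, 1)), rng.normal(size=1))

x = np.array([0.5, -1.0])
print(f3(f2(f1(x))))  # the composition is what makes the model "deep"
```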

## Manifold Learning

Manifold learning further reduces the solution space from a combinatorially huge set of possibilities by observing that most real data sets (particularly in NLP) concentrate their probability mass along a small number of likely paths between adjoining points. Perhaps like a conditional probability for predicting the next word given the known words (context)?

Treating those most likely paths as a lower-dimensional manifold embedded in the higher-dimensional solution space allows for much more efficient algorithms and a means to tackle larger, more challenging problems.
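
A toy illustration of the idea (my example, assuming a helix as the data-generating curve, not the book's): points described by a single parameter t have intrinsic dimension 1, even though they sit in a 3-D ambient space.

```python
import numpy as np

# One degree of freedom (t) generates every point, so the data lie on
# a 1-D manifold embedded in R^3.
t = np.linspace(0.0, 4.0 * np.pi, 500)
X = np.column_stack([np.cos(t), np.sin(t), t])  # a helix in 3-D

# Ambient dimension is 3, intrinsic dimension is 1: a manifold-aware
# algorithm only has to model movement along t.
print(X.shape)  # (500, 3)
```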

From Section 5.11 of Deep Learning by Ian Goodfellow, Yoshua Bengio, and Aaron Courville.