This post may be considered an extension of the previous post.
The setup and notation are the same as in the previous post (linked above). To summarize: earlier we had an unknown smooth regression function $f: \mathbb{R}^d \to \mathbb{R}$. The idea was to estimate the gradient $\nabla f(x)$ of this unknown function at each training point, and then take the sample expectation of the outer product of the gradient, $\mathbb{E}_X[\nabla f(X) \nabla f(X)^T]$. This quantity has some interesting properties and applications.
However, it has its limitations. For one, the mapping $f: \mathbb{R}^d \to \mathbb{R}$ restricts the Gradient Outer Product to being helpful only for regression and binary classification (since binary classification can be thought of as regression). It is not clear whether a similar operator can be constructed for multi-class classification, where the unknown smooth function is a vector-valued function $f: \mathbb{R}^d \to \mathbb{R}^c$ and $c$ is the number of classes (let us say, for the purpose of this discussion, that for each data point we have a probability distribution over the classes, a $c$-dimensional vector).
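In the case of the gradient outer product, since we were working with a real-valued function, it was possible to define the gradient at each point, which is simply:

$$\nabla f(x) = \left( \frac{\partial f(x)}{\partial x_1}, \dots, \frac{\partial f(x)}{\partial x_d} \right)^T$$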
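For a vector-valued function $f = (f_1, \dots, f_c)$, we can’t have the gradient, but can instead define the Jacobian at each point:

$$J_f(x) = \begin{pmatrix} \frac{\partial f_1(x)}{\partial x_1} & \cdots & \frac{\partial f_1(x)}{\partial x_d} \\ \vdots & \ddots & \vdots \\ \frac{\partial f_c(x)}{\partial x_1} & \cdots & \frac{\partial f_c(x)}{\partial x_d} \end{pmatrix}$$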
Note that the Jacobian $J_f$ may be estimated in a similar manner to the gradients in the previous posts. This leads us to define the quantity $\mathbb{E}_X[J_f(X)^T J_f(X)]$.
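As a rough illustration of this definition, the sketch below estimates each Jacobian by central finite differences and averages $J^T J$ over a sample. The toy function `softmax_linear`, the weight matrix `W`, the helper names and the step size `eps` are all assumptions made for this example, and the finite-difference scheme is only a stand-in, not the estimator used in the previous posts.

```python
import numpy as np

def softmax_linear(x, W):
    """Hypothetical toy f: R^d -> R^c, class probabilities of a linear softmax model."""
    z = W @ x
    z = z - z.max()                 # stabilise the exponentials
    p = np.exp(z)
    return p / p.sum()

def numerical_jacobian(f, x, eps=1e-5):
    """Central finite-difference estimate of the c x d Jacobian of f at x."""
    d = x.shape[0]
    cols = []
    for j in range(d):
        step = np.zeros(d)
        step[j] = eps
        cols.append((f(x + step) - f(x - step)) / (2 * eps))
    return np.stack(cols, axis=1)   # column j holds the partials w.r.t. x_j

def expected_jacobian_inner_product(f, X):
    """Sample average of J(x)^T J(x) over the rows of X; the result is d x d."""
    d = X.shape[1]
    total = np.zeros((d, d))
    for x in X:
        J = numerical_jacobian(f, x)
        total += J.T @ J
    return total / X.shape[0]

rng = np.random.default_rng(0)
d, c, n = 4, 3, 200                 # assumed sizes for the toy problem
W = rng.normal(size=(c, d))
X = rng.normal(size=(n, d))
G = expected_jacobian_inner_product(lambda x: softmax_linear(x, W), X)
print(G.shape)                      # (4, 4)
```

The result is a $d \times d$ matrix regardless of the number of classes, which is what makes it comparable to the gradient outer product.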
The first thing to note is that the gradient outer product $\mathbb{E}_X[\nabla f(X) \nabla f(X)^T]$ defined in the previous post is simply the quantity $\mathbb{E}_X[J_f(X)^T J_f(X)]$ for the special case $c = 1$. Another note is also in order: the reason we suffixed that quantity with “outer product” (as opposed to “inner product” here) is simply that we considered the gradient to be a column vector; otherwise the two are similar in spirit.
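The special case can be checked numerically. The sketch below uses a hypothetical scalar function $f(x) = \sin(a \cdot x)$ with a known gradient (the vector `a` and the sample `X` are assumptions for illustration) and compares the average of $\nabla f \nabla f^T$ with the average of $J^T J$, where the Jacobian is the gradient written as a $1 \times d$ row.

```python
import numpy as np

rng = np.random.default_rng(1)
d, n = 3, 100
a = rng.normal(size=d)          # hypothetical direction defining the toy function
X = rng.normal(size=(n, d))

def grad_f(x):
    """Gradient of the toy function f(x) = sin(a . x), as a length-d vector."""
    return np.cos(a @ x) * a

# Gradient outer product: average of grad grad^T, gradient as a column vector.
gop = np.mean([np.outer(grad_f(x), grad_f(x)) for x in X], axis=0)

# Jacobian version: for a scalar function the Jacobian is a 1 x d row.
jacobians = [grad_f(x).reshape(1, d) for x in X]
ejip = np.mean([J.T @ J for J in jacobians], axis=0)

print(np.allclose(gop, ejip))   # True
```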
Another thing to note is that the quantity $\mathbb{E}_X[J_f(X)^T J_f(X)]$ is easily seen to be a positive semi-definite matrix, and hence is a Riemannian metric, which is defined below:
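Definition: A Riemannian metric $G$ on a manifold $\mathcal{M}$ is a symmetric and positive semi-definite matrix, which defines a smoothly varying inner product $\langle u, v \rangle_G = u^T G(x) v$ in the tangent space $T_x\mathcal{M}$, for each point $x \in \mathcal{M}$ and $u, v \in T_x\mathcal{M}$. This associated p.s.d. matrix is called the metric tensor. In the above case, since $\mathbb{E}_X[J_f(X)^T J_f(X)]$ is p.s.d., it defines a Riemannian metric:

$$\langle u, v \rangle = u^T \, \mathbb{E}_X[J_f(X)^T J_f(X)] \, v$$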
Thus, $\mathbb{E}_X[J_f(X)^T J_f(X)]$ is a specific metric (more general metrics are dealt with in areas such as metric learning).
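To make the positive semi-definiteness concrete, here is a small sketch that builds the averaged $J^T J$ matrix from random stand-in Jacobians (the array `Js` is purely illustrative, not estimated from any data) and checks that the induced form $u^T G v$ is symmetric and that $u^T G u$ equals the average of $\|J u\|^2$, which is why it can never be negative.

```python
import numpy as np

rng = np.random.default_rng(3)
n, c, d = 50, 3, 4
Js = rng.normal(size=(n, c, d))            # random stand-ins for n Jacobians, each c x d

# Average of J^T J over the sample; entry (i, j) sums over samples and outputs.
G = np.einsum("ncd,nce->de", Js, Js) / n

def inner(u, v, G):
    """Bilinear form u^T G v induced by the matrix G."""
    return u @ G @ v

u = rng.normal(size=d)
v = rng.normal(size=d)

symmetric = np.isclose(inner(u, v, G), inner(v, u, G))
quad = inner(u, u, G)
mean_sq = np.mean([np.sum((J @ u) ** 2) for J in Js])   # average of ||J u||^2

print(symmetric, np.isclose(quad, mean_sq), quad >= 0)  # True True True
```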
Properties: We saw some properties of the gradient outer product in the previous post. In the same vein, does $\mathbb{E}_X[J_f(X)^T J_f(X)]$ have similar properties? That is, does the first eigenvector also correspond to the direction of highest average variation? What about the $k$-dimensional subspace? What difference does it make that we are looking at a vector-valued function? Also, what about the cases when $c < d$ and otherwise?
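One way to start probing the first question is a numerical experiment on a case where the answer is known in advance. The sketch below uses a hypothetical map $f(x) = \mathrm{softmax}(b \, (w \cdot x))$ that varies only along a single direction `w` (both `w` and `b` are assumptions for this toy example) and measures how well the top eigenvector of the averaged $J^T J$ aligns with that direction. It says nothing about the general case.

```python
import numpy as np

rng = np.random.default_rng(2)
d, c, n = 5, 3, 500
w = rng.normal(size=d)    # hypothetical single direction along which f varies
b = rng.normal(size=c)    # hypothetical per-class slopes
X = rng.normal(size=(n, d))

def jacobian(x):
    """Analytic c x d Jacobian of the toy map f(x) = softmax(b * (w . x))."""
    z = b * (w @ x)
    p = np.exp(z - z.max())
    p /= p.sum()
    dp_dz = np.diag(p) - np.outer(p, p)   # softmax Jacobian with respect to z
    return np.outer(dp_dz @ b, w)         # chain rule, since dz/dx = b w^T

G = np.mean([jacobian(x).T @ jacobian(x) for x in X], axis=0)

eigvals, eigvecs = np.linalg.eigh(G)      # eigenvalues in ascending order
top = eigvecs[:, -1]
alignment = abs(top @ w) / np.linalg.norm(w)   # |cosine| between top eigenvector and w
print(alignment)
```

In this constructed example every Jacobian has rank one with row space spanned by `w`, so the alignment should come out essentially equal to 1.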
These are questions that I need to think about, and they should be the topic of a future post, hopefully soon.