Mutual Information

Inference from high-dimensional data is hard because the number of instances is usually insufficient with respect to the number of features.  A common solution is feature selection, i.e. using only a small subset of the features.  Mutual information has been used for years in the feature selection community to select relevant features.  I have studied under which conditions this criterion can be used in classification and regression, showing that mutual information is in general a good choice, even though counterexamples can be constructed.
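As a rough illustration of the idea (not the method studied in the publications), a filter-style selection ranks each feature by its estimated mutual information with the target and keeps the top-scoring ones. The sketch below assumes discrete (integer-coded) features and labels and uses a plug-in histogram estimate; the names `mutual_information` and `select_features` are hypothetical:

```python
import numpy as np

def mutual_information(x, y):
    """Plug-in MI estimate (in nats) between two discrete integer variables."""
    joint = np.zeros((x.max() + 1, y.max() + 1))
    for xi, yi in zip(x, y):
        joint[xi, yi] += 1
    joint /= joint.sum()                      # joint distribution p(x, y)
    px = joint.sum(axis=1, keepdims=True)     # marginal p(x)
    py = joint.sum(axis=0, keepdims=True)     # marginal p(y)
    nz = joint > 0                            # skip zero cells (0 log 0 = 0)
    return float((joint[nz] * np.log(joint[nz] / (px @ py)[nz])).sum())

def select_features(X, y, k):
    """Rank features by MI with the labels and keep the indices of the top k."""
    scores = [mutual_information(X[:, j], y) for j in range(X.shape[1])]
    return np.argsort(scores)[::-1][:k]
```

For example, a feature that copies the label gets a score near the label entropy, while an independent feature scores near zero, so the former is ranked first.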

I also worked on mutual information estimation with the Kozachenko-Leonenko estimator (a.k.a. Kraskov’s estimator), which is based on a nearest-neighbour approach.
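The nearest-neighbour idea can be sketched as follows (a minimal brute-force O(n²) version for one-dimensional variables, not the code behind the publications; `ksg_mi` is a hypothetical name). For each point, the Chebyshev distance to its k-th nearest neighbour in the joint space sets a scale, marginal neighbours strictly within that scale are counted, and digamma terms are averaged as in Kraskov, Stögbauer and Grassberger's first estimator:

```python
import numpy as np
from scipy.special import digamma

def ksg_mi(x, y, k=3):
    """KSG estimate (in nats) of I(X; Y) from paired 1-D samples."""
    n = len(x)
    z = np.column_stack([x, y])
    # pairwise Chebyshev (max-norm) distances in the joint space
    d = np.max(np.abs(z[:, None, :] - z[None, :, :]), axis=-1)
    np.fill_diagonal(d, np.inf)               # exclude each point from its own neighbours
    eps = np.sort(d, axis=1)[:, k - 1]        # distance to the k-th nearest neighbour
    # marginal neighbours strictly within eps, excluding the point itself
    nx = (np.abs(x[:, None] - x[None, :]) < eps[:, None]).sum(axis=1) - 1
    ny = (np.abs(y[:, None] - y[None, :]) < eps[:, None]).sum(axis=1) - 1
    return digamma(k) + digamma(n) - np.mean(digamma(nx + 1) + digamma(ny + 1))
```

On correlated Gaussian samples the estimate can be checked against the closed form I = -0.5 log(1 - rho²); a k-d tree would replace the brute-force distance matrix for large samples.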

Related publications