Higher-order interactions in statistical physics and machine learning: A model-independent solution to the inverse problem at equilibrium

Abstract

Inferring pairwise and higher-order interactions from observational data in complex systems with large numbers of interacting variables is a problem fundamental to many fields. Known to the statistical physics community as the inverse problem, it has become accessible in recent years thanks to the growing availability of real and simulated ‘big’ data. Current approaches to the inverse problem rely on parametric assumptions and physical approximations, e.g. mean-field theory, and ignore higher-order interactions, all of which may lead to biased or incorrect estimates. We bypass these shortcomings using a cross-disciplinary approach and demonstrate that none of these assumptions and approximations is necessary: we introduce a universal, model-independent, and fundamentally unbiased estimator of symmetric interactions of all orders, via the non-parametric framework of Targeted Learning, a subfield of mathematical statistics. Owing to its universality, our definition is readily applicable to any system at equilibrium with binary and categorical variables, be it magnetic spins, nodes in a neural network, or protein networks in biology. Our approach is targeted: it does not require fitting unnecessary parameters and instead expends all data on estimating interactions, substantially increasing accuracy. We demonstrate the generality of our technique both analytically and numerically on (i) the 2-dimensional Ising model, (ii) an Ising-like model with 4-point interactions, (iii) the Restricted Boltzmann Machine, and (iv) simulated individual-level human DNA variants and representative traits. The last of these demonstrates the applicability of this approach to discovering epistatic interactions that cause disease in population biomedicine.

Publication
Physical Review E 102, 053314
Date