######################
Interprenet User Guide
######################

**********
Motivation
**********

Neural networks are generally difficult to interpret. While there are tools that can help interpret certain types of neural networks, such as image classifiers and language models, interpretation of neural networks that simply ingest tabular data and return a scalar value is generally limited to various measures of feature importance. This can be problematic, as what makes a feature "important" can vary between use cases. Rather than interpret a neural network as a black box, we seek to constrain neural networks in ways we consider useful and interpretable. In particular, the ``interprenet`` module currently has two such constraints implemented:

* Monotonicity
* Lipschitz constraint

`Monotonic functions `_ either always increase or always decrease with their arguments, never both. This is often an expected relationship between features and the model output. For example, we may believe that increasing blood pressure increases the risk of cardiovascular disease. The exact relationship is not known, but we may believe that it is monotonic.

`Lipschitz constraints `_ bound the maximum rate of change of the model. This can make the model arbitrarily robust `against adversarial perturbations `_ :cite:`anil2019sorting`.

How?
====

All constraints are currently implemented as weight constraints. While arbitrary weights are stored within each linear layer, the weights are transformed before application so the network satisfies its prescribed constraints. Changes are backpropagated through this transformation.

Monotonically increasing neural networks are implemented by taking the absolute value of weight matrices before applying them. When paired with a monotonically increasing activation (such as ReLU, Sigmoid, or Tanh), this ensures the gradient of the output with respect to each feature is nonnegative, which is sufficient to ensure monotonicity with respect to the features.

Lipschitz constraints are enforced by dividing each weight vector by its :math:`L^\infty` norm, as described in :cite:`anil2019sorting`. This constrains the :math:`L^\infty`-:math:`L^\infty` `operator norm `_ of the weight matrix :cite:`tropp2004topics`. Constraining the :math:`L^\infty`-:math:`L^\infty` operator norm of the weight matrix ensures every element of the Jacobian of the linear layers is less than or equal to :math:`1`. Meanwhile, using activation functions with Lipschitz constants of :math:`1` ensures the entire network never has a slope greater than :math:`1` with respect to any of its features.

********************************************
Different Constraints on Different Features
********************************************

.. currentmodule:: mvtk.interprenet

:meth:`constrained_model` generates a neural network with one set of constraints per feature. Constraints currently available are:

- :meth:`identity` (for no constraint)
- :meth:`monotonic_constraint`
- :meth:`lipschitz_constraint`

Features are grouped by the set of constraints applied to them, and a separately constrained neural network is generated for each group of features. The outputs of those neural networks are concatenated and fed into a final neural network constrained using all of the constraints applied to all of the features. Since constraints on weight matrices compose, they can be applied as a series of transformations on the weights before application, as in the sketch below.
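To make this composition concrete, the following framework-agnostic NumPy sketch illustrates the idea. The function names and the particular normalization (dividing the weight matrix by its maximum absolute row sum, one common way to bound the :math:`L^\infty`-:math:`L^\infty` operator norm) are assumptions for illustration, not the module's actual implementation.

.. code-block:: python

    import numpy as np

    def monotonic_transform(W):
        # Nonnegative weights, combined with monotonically increasing
        # activations, make every output monotonically increasing in
        # every input.
        return np.abs(W)

    def lipschitz_transform(W):
        # One way to bound the L-infinity -> L-infinity operator norm
        # (the maximum absolute row sum) at 1; the library's exact
        # normalization may differ.
        return W / np.abs(W).sum(axis=1).max()

    def compose(*transforms):
        # Constraints compose: apply each transform in turn to the raw,
        # unconstrained weights before they are used in the forward pass.
        def combined(W):
            for transform in transforms:
                W = transform(W)
            return W
        return combined

    both = compose(monotonic_transform, lipschitz_transform)
    W_raw = np.random.randn(4, 3)  # raw, unconstrained weights
    W_used = both(W_raw)           # weights actually applied to the inputs
    assert (W_used >= 0).all()                            # monotonic
    assert np.abs(W_used).sum(axis=1).max() <= 1 + 1e-9   # Lipschitz

Because dividing by a positive scalar preserves nonnegativity, applying both transforms in sequence yields weights that satisfy both constraints at once.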
.. figure:: images/interprenet.png
    :width: 500px
    :align: center
    :height: 400px
    :alt: alternate text
    :figclass: align-center

    4 features with Lipschitz constraints and 4 features with monotonic constraints are fed to their respectively constrained neural networks. Intermediate outputs are concatenated and fed into a neural network with both monotonic and Lipschitz constraints.

We use the Sort function as a nonlinear activation, as described in :cite:`anil2019sorting`. Its Jacobian is always a permutation matrix, which preserves both Lipschitz and monotonicity constraints.

*************
Preprocessing
*************

Thus far, we have left out two important details: how to constrain the Lipschitz constant to be something other than :math:`1`, and how to create monotonically decreasing networks. Both are a simple matter of preprocessing. The ``preprocess`` argument (defaulting to ``identity``) specifies a function to be applied to the feature vector before passing it to the neural network. For decreasing monotonic constraints, multiply the respective features by :math:`-1`. For a Lipschitz constant of :math:`L`, multiply the respective features by :math:`L`. A minimal sketch of such a preprocessing function appears at the end of this guide.

.. topic:: Tutorials:

    * :doc:`Interprenet `

.. bibliography:: refs.bib
    :cited:
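As referenced in the Preprocessing section above, here is a minimal NumPy sketch of such a preprocessing function. The feature layout, sign and scale vectors, and the constant :math:`L = 3` are invented for this example and are not part of the library's API.

.. code-block:: python

    import numpy as np

    # Hypothetical feature layout, purely for illustration:
    #   feature 0: monotonically increasing      -> leave unchanged
    #   feature 1: monotonically decreasing      -> multiply by -1
    #   feature 2: Lipschitz constant of L = 3   -> multiply by 3
    SIGN = np.array([1.0, -1.0, 1.0])
    SCALE = np.array([1.0, 1.0, 3.0])

    def preprocess(x):
        # A slope bound of 1 with respect to the scaled input L * x is a
        # slope bound of L with respect to the raw input x, and monotonic
        # increase in -x is monotonic decrease in x.
        return SIGN * SCALE * x

    print(preprocess(np.array([0.5, 1.2, -0.3])))  # [ 0.5 -1.2 -0.9]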