Sobol User Guide

Motivation

Sensitivity analysis is concerned with the degree to which uncertainty in the output of a model can be attributed to uncertainty in its inputs [SRA+08]. Variance-based sensitivity analysis, commonly known as Sobol sensitivity analysis, answers this question by attributing the variance of the output to the variance of one or more inputs. This breakdown is expressed as a set of Sobol indices, which are typically measured in one of two ways: first-order indices and total-effect indices [Sob01].

The first-order Sobol index with respect to a given feature is computed by averaging the output of the model over all values of all other features and then taking the variance of the result as the feature in question varies. This is normalized by the total variance of the output measured while varying all feature values [IM93]. The first-order indices sum to a number between 0 and 1. The total-effect index is computed by first taking the variance of the model output with respect to the feature in question and then taking the expectation of the result over the values of all other features. This is again normalized by the variance of the model output across all features. The total-effect indices sum to a number greater than or equal to 1. Both are discussed in more detail at https://en.wikipedia.org/wiki/Variance-based_sensitivity_analysis.
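
Concretely, writing the model output as Y = f(X_1, ..., X_k) and X_{~i} for all features other than X_i (following the notation of the article linked above), the two indices for feature i are

    S_i  = Var_{X_i}( E_{X_{~i}}[ Y | X_i ] ) / Var(Y)
    S_Ti = E_{X_{~i}}( Var_{X_i}[ Y | X_{~i} ] ) / Var(Y)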

sobol() takes a model and a dataset and runs a Monte Carlo simulation, as described in the link above, to compute the first-order and total-effect Sobol indices. Each set of indices is returned as a one-dimensional array of length equal to the number of features in the supplied data matrix. The model is assumed to be a function that outputs one scalar for each row of the data matrix.

import numpy
from mvtk import sobol

nprng = numpy.random.RandomState(0)

data = nprng.normal(size=(1000, 4)) # 1000 samples of 4 features
model = lambda x: (x ** 2).dot([1, 2, 3, 4]) # one scalar output per row
total, first_order = sobol.sobol(model, data, N=500) # each index array has length 4
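
To build intuition for the Monte Carlo simulation mentioned above, here is a small sketch of the pick-and-freeze scheme described in the linked Wikipedia article. The function name sobol_pick_and_freeze, the choice to draw fresh standard-normal samples, and the particular estimators used (a Saltelli-style estimator for first-order indices and a Jansen-style estimator for total-effect indices) are illustrative assumptions, not a description of sobol()'s internals, which work from the data matrix you supply.

import numpy

def sobol_pick_and_freeze(model, k, N, nprng):
    # Illustrative pick-and-freeze Monte Carlo estimator; not mvtk's implementation.
    A = nprng.normal(size=(N, k))  # two independent (N, k) sample matrices
    B = nprng.normal(size=(N, k))
    fA, fB = model(A), model(B)
    var_y = numpy.var(numpy.concatenate([fA, fB]))  # total output variance
    first_order = numpy.empty(k)
    total = numpy.empty(k)
    for i in range(k):
        ABi = A.copy()
        ABi[:, i] = B[:, i]  # A with column i taken from B
        fABi = model(ABi)
        first_order[i] = numpy.mean(fB * (fABi - fA)) / var_y  # Saltelli-style estimate
        total[i] = numpy.mean((fA - fABi) ** 2) / (2 * var_y)  # Jansen-style estimate
    return total, first_order

For the quadratic model above, both this sketch and sobol.sobol should assign the largest indices to the last feature, whose squared term carries the largest coefficient.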

References

[ALG19] Cem Anil, James Lucas, and Roger Grosse. Sorting out Lipschitz function approximation. In International Conference on Machine Learning, 291–301. PMLR, 2019.

[ACB17] Martin Arjovsky, Soumith Chintala, and Léon Bottou. Wasserstein GAN. arXiv preprint arXiv:1701.07875, 2017.

[BDD+17] Marc G Bellemare, Ivo Danihelka, Will Dabney, Shakir Mohamed, Balaji Lakshminarayanan, Stephan Hoyer, and Rémi Munos. The Cramér distance as a solution to biased Wasserstein gradients. arXiv preprint arXiv:1705.10743, 2017.

[CsiszarS+04] Imre Csiszár, Paul C Shields, and others. Information theory and statistics: a tutorial. Foundations and Trends® in Communications and Information Theory, 1(4):417–528, 2004.

[Dom00] Pedro Domingos. A unified bias-variance decomposition and its applications. Technical Report, University of Washington, Seattle, WA, January 2000. URL: https://homes.cs.washington.edu/~pedrod/papers/mlc00a.pdf.

[GBR+12] Arthur Gretton, Karsten M Borgwardt, Malte J Rasch, Bernhard Schölkopf, and Alexander Smola. A kernel two-sample test. Journal of Machine Learning Research, 13(Mar):723–773, 2012.

[GAA+17] Ishaan Gulrajani, Faruk Ahmed, Martin Arjovsky, Vincent Dumoulin, and Aaron C Courville. Improved training of Wasserstein GANs. In Advances in Neural Information Processing Systems, 5767–5777. 2017.

[Her17] Vincent Herrmann. Wasserstein GAN and the Kantorovich-Rubinstein duality. February 2017. URL: https://vincentherrmann.github.io/blog/wasserstein/.

[IM93] Sobol' IM. Sensitivity estimates for nonlinear mathematical models. Math. Model. Comput. Exp, 1(4):407–414, 1993.

[Lin91] Jianhua Lin. Divergence measures based on the Shannon entropy. IEEE Transactions on Information Theory, 37(1):145–151, 1991.

[NWJ10] XuanLong Nguyen, Martin J Wainwright, and Michael I Jordan. Estimating divergence functionals and the likelihood ratio by convex risk minimization. IEEE Transactions on Information Theory, 56(11):5847–5861, 2010.

[NCT16] Sebastian Nowozin, Botond Cseke, and Ryota Tomioka. f-GAN: training generative neural samplers using variational divergence minimization. In Advances in Neural Information Processing Systems, 271–279. 2016.

[Ras23] Sebastian Raschka. Bias_variance_decomp: bias-variance decomposition for classification and regression losses. 2014-2023. URL: https://rasbt.github.io/mlxtend/user_guide/evaluate/bias_variance_decomp/.

[SRA+08] Andrea Saltelli, Marco Ratto, Terry Andres, Francesca Campolongo, Jessica Cariboni, Debora Gatelli, Michaela Saisana, and Stefano Tarantola. Global sensitivity analysis: the primer. John Wiley & Sons, 2008.

[Sob01] Ilya M Sobol. Global sensitivity indices for nonlinear mathematical models and their Monte Carlo estimates. Mathematics and Computers in Simulation, 55(1-3):271–280, 2001.

[SFG+09] Bharath K Sriperumbudur, Kenji Fukumizu, Arthur Gretton, Bernhard Schölkopf, and Gert RG Lanckriet. On integral probability metrics, φ-divergences and binary classification. arXiv preprint arXiv:0901.2698, 2009.

[Tro04] Joel Aaron Tropp. Topics in sparse approximation. PhD thesis, University of Texas at Austin, 2004.

[WHC+16] Geoffrey I Webb, Roy Hyde, Hong Cao, Hai Long Nguyen, and Francois Petitjean. Characterizing concept drift. Data Mining and Knowledge Discovery, 30(4):964–994, 2016.

[Wu16] Yihong Wu. Variational representation, HCR and CR lower bounds. February 2016. URL: http://www.stat.yale.edu/~yw562/teaching/598/lec06.pdf.