
## A first step through Global Sensitivity ...

Maple

(EDITED 2024/03/11  GMT 17H)

In a recent question, @cq mentioned in a last reply: "In fact, I wanted to do parameter sensitivity analysis and get the functional relationship between the [...] function and [parameters]. Later, i will study how the uncertainty of [the parameters] affects the [...] function".
I did not pursue the exchange with @cq any further, simply replying that I could provide more help if needed.

• Let X_1 and X_2 be two random variables and G the random variable defined by  G = 1 - (X_1 - 1)^2/9 - (X_2 - 1)^3/16.

•  X_1 and X_2 are assumed to be gaussian random variables with respective mean and standard deviation equal to (theta_1, theta_3) and (theta_2, theta_4).

• The four theta parameters are themselves assumed to be realizations of four mutually independent uniform random variables Theta_1, ..., Theta_4 whose parameters are constants.

• Let QOI  (Quantity Of Interest) denote some scalar statistic of G (for instance its Mean, Variance, Skewness, ...).
For instance, if QOI = Mean(G), then  QOI expresses itself as a function of the four parameters theta_1, ..., theta_4.
The goal of @cq is to understand which of those parameters have the greatest influence on QOI.

For a quick survey of Sensitivity Analysis (SA) and a presentation of some of the most common strategies, see Wiki-Overview.

The simplest SA is the Local SA (LSA) we are all taught at school: having chosen some reference point P in the [theta_1, ..., theta_4] space, the 1st order partial derivatives d[n] = diff(QOI, theta_n) evaluated at point P give a "measure" (maybe after some normalization) of the sensitivity, at point P, of QOI with respect to each parameter theta_n.
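As a minimal sketch of LSA for QOI = Mean(G), assuming that X_1 has mean theta_1 and standard deviation theta_3, and X_2 has mean theta_2 and standard deviation theta_4 (this reading of the parameterization, the reference point P and the names QOI, d and LSA are illustrative assumptions, not taken from the attached file):

```maple
# Closed form of Mean(G), using the gaussian moments
#   E[(X-1)^2] = (m-1)^2 + s^2   and   E[(X-1)^3] = (m-1)^3 + 3*(m-1)*s^2
# for X gaussian with mean m and standard deviation s.
QOI := 1 - ((theta[1] - 1)^2 + theta[3]^2)/9
         - ((theta[2] - 1)^3 + 3*(theta[2] - 1)*theta[4]^2)/16:

# First-order partial derivatives with respect to each parameter
d := [seq(diff(QOI, theta[n]), n = 1 .. 4)]:

# Hypothetical reference point P
P := [theta[1] = 2, theta[2] = 2, theta[3] = 1, theta[4] = 1]:

# Local sensitivities at P
LSA := eval(d, P);
```

At this particular P the four derivatives are -2/9, -3/8, -2/9 and -3/8; a different reference point would generally rank the parameters differently, which is precisely the limitation of a local analysis.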

A more interesting situation occurs when the parameters can take values in a neighborhood of  P which is not infinitesimal, or more generally in some domain without reference to any specific point P.
That is where Global SA (GSA) comes into the picture.
While the notion of local variation at some point P is well established, GSA raises the fundamental question of how to define and measure the variation of a function over an arbitrary domain.
Let us take a very simple example while trying to answer this question "What is the variation of sin(x) over [0, 2*Pi]?"

1. If we focus on the global trend (the mean) of sin(x), there is no variation at all.
2. If we consider the peak-to-peak amplitude, the variation is equal to 2.
3. Finally, if we consider the squared L2 norm, the variation is equal to Pi
(but the constant function x -> A/sqrt(2), where A = 1 is the amplitude of the sine, has the same L2 norm although it is flat, and in some sense less fluctuating).
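The three candidate measures above can be checked with a few lines of Maple. This is only an illustrative sketch; the variable names are arbitrary, and the last line adds the variance of sin(x) for x uniform on [0, 2*Pi], which is the kind of measure the rest of this post relies on.

```maple
f := x -> sin(x):

# 1. Global trend: mean value over [0, 2*Pi]
mean_value := int(f(x), x = 0 .. 2*Pi)/(2*Pi);

# 2. Peak-to-peak amplitude
peak_to_peak := maximize(f(x), x = 0 .. 2*Pi) - minimize(f(x), x = 0 .. 2*Pi);

# 3. Squared L2 norm
squared_L2 := int(f(x)^2, x = 0 .. 2*Pi);

# Variance of sin(x) when x is uniformly distributed on [0, 2*Pi]
variance := int((f(x) - mean_value)^2, x = 0 .. 2*Pi)/(2*Pi);
```

The variance subtracts the mean before squaring, so a constant function gets a variation of zero, which removes the objection raised in point 3.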

Statisticians are accustomed to using the concept of variance as a measure of the dispersion of a random variable. At the end of the sixties one of them, Ilya Meyerovich Sobol’, introduced the notion of Variance-Based GSA as the key tool to define the global variation of a function. This notion naturally led to that of Sobol' indices as a measure of the sensitivity of a function with respect to one of its parameters or, more importantly, with respect to any combination (one says interaction) of its parameters.

The aim of this post is to show how Sobol' indices can be computed when the function under study has an analytic expression.

The Sobol' analysis is based on an additive decomposition of this function in terms of 2^P mutually orthogonal functions where P is the number of its random parameters.
This decomposition and the ensuing integrations, whose values will represent the Sobol' indices, can be done analytically in some situations. When this is no longer the case, specific numerical estimation methods have to be used.
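For reference, this decomposition is usually written as follows when the P parameters are mutually independent. The symbols Y, f_u, V_u, S_i and S_{T_i} are notational assumptions made here and are not necessarily those of the attached file.

```latex
% Additive decomposition into 2^P mutually orthogonal terms (f_0 included)
Y = f(X_1,\dots,X_P)
  = f_0 + \sum_{i} f_i(X_i) + \sum_{i<j} f_{ij}(X_i,X_j) + \dots + f_{1\dots P}(X_1,\dots,X_P)

% Orthogonality turns it into a decomposition of the variance
\operatorname{Var}(Y) = \sum_{i} V_i + \sum_{i<j} V_{ij} + \dots + V_{1\dots P},
\qquad V_u = \operatorname{Var}(f_u)

% First-order and total Sobol' indices of parameter X_i
S_i = \frac{V_i}{\operatorname{Var}(Y)}
    = \frac{\operatorname{Var}\bigl(\mathbb{E}[Y \mid X_i]\bigr)}{\operatorname{Var}(Y)},
\qquad
S_{T_i} = 1 - \frac{\operatorname{Var}\bigl(\mathbb{E}[Y \mid X_{\sim i}]\bigr)}{\operatorname{Var}(Y)}
```

Here X_{\sim i} denotes all the parameters except X_i, so S_{T_i} gathers every term of the decomposition in which X_i appears.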

The attached file contains a quite generic procedure to compute exact Sobol' indices and total Sobol' indices for a function whose parameters have any arbitrary statistical distribution.
Let's immediately put this into perspective by saying that these calculations are only possible if Maple is able to find closed form expressions of some integrals, which is of course not always the case.
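To give an idea of what such an exact computation looks like, here is a minimal sketch on a toy model. It is not the procedure of the attached file: the model Y = X1 + X1*X2, the uniform distributions on [0, 1] and all the names below are assumptions chosen so that every integral has a closed form.

```maple
# Toy model: X1 and X2 independent, both uniform on [0, 1]
Y := x1 + x1*x2:

# Mean and variance of Y
EY := int(int(Y, x1 = 0 .. 1), x2 = 0 .. 1):
VY := int(int((Y - EY)^2, x1 = 0 .. 1), x2 = 0 .. 1):

# Conditional expectations E[Y | X1 = x1] and E[Y | X2 = x2]
E1 := int(Y, x2 = 0 .. 1):
E2 := int(Y, x1 = 0 .. 1):

# Variances of the conditional expectations
V1 := int((E1 - EY)^2, x1 = 0 .. 1):
V2 := int((E2 - EY)^2, x2 = 0 .. 1):

# First-order indices, interaction index and total indices
S1  := V1/VY;
S2  := V2/VY;
S12 := 1 - S1 - S2;
ST1 := S1 + S12;
ST2 := S2 + S12;
```

For this toy model the indices are S1 = 27/31, S2 = 3/31 and S12 = 1/31: X2 matters little on its own, and a small part of the variance is only explained by the interaction of the two parameters.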

A few examples are also provided, including the one corresponding to @cq's original question.
Finally, two numerical estimation methods are presented.

SOBOL.mw

## Using Markov-like chains to solve a non ...

Maple

Why this post
This work was intended to be a simple reply to a question asked a few days ago.
At some point, I realised that the approach I was using could have a more general interest which, in my opinion, was worth a post.
In a few words, this post is about solving an algebra problem using a method originally designed to tackle statistical problems.

The Context
Recently @raj2018 submitted a question I'm going to summarize this way:

Let S(phi ;  beta, f) be a function of phi parameterized by beta and f.
Here is the graph of S(phi ;  0.449, 0.19)  @raj2018 provided

@raj2018 then asked how we can find other values (A, B) for (beta, f) such that the graph of S(phi, A, B) has the same shape as the graph above.
More precisely, let phi_0 be the largest strictly negative value of phi such that  S(phi_0, A, B) = 0.
Then  S(phi, A, B) must be negative (strictly negative?) in the open interval (phi_0, 0), and must have exactly 3 extrema in this range.
I will say the point  (A, B) is admissible if S(phi, A, B) verifies these conditions.

The expression of S(phi, A, B) is so complex that it is likely impossible to find one (several? all?) admissible point(s) using analytic developments.

The approach

When I began thinking about this problem I soon thought of finding the entire domain of admissible points: was it possible, at least with some reasonable accuracy?

Quite rapidly I drew an analogy with another type of problem whose solution is part of my job: the approximate construction of the probability density function (PDF) of multivariate random variables (obviously this implies that no analytical expression of this PDF is available). This is a very classical problem in Bayesian Statistics, for instance when we have to construct an approximation of a posterior PDF.

To stick with this example and put aside the rare situations where this PDF can be derived analytically, building a posterior PDF is largely based on specific numerical methods.
The iconic one is known under the generic name MCMC  which stands for Markov Chain Monte Carlo.

Why am I speaking about MCMC or PDF or even random variables?
Let us consider some multivariate random variable R whose PDF is constant on some bounded domain D and is equal to 0 elsewhere. R is then a uniform random variable with support supp(R) = D.
Assuming the domain Adm of admissible (beta, f) is bounded, we may think of it as the support of some uniform random variable. Following this analogy we may expect to use some MCMC method to "build the PDF of the bivariate random variable (beta, f)", in other words "to capture the boundary of Adm".

The simplest MCMC method is the Metropolis-Hastings algorithm (MH).
In a few simple words MH builds a Markov chain this way:

1. Let us assume that the chain already contains elements e1, ..., en.
Let  f  be some suitable "fitness" function (whose nature is of no importance right now).
2. A potential new element c is randomly picked in some neighborhood of en.
3. If the ratio f(c) / f(en) is larger than 1, we decide to put c into the chain (thus en+1 = c); otherwise we leave it to chance to decide whether or not c is put into the chain.
If chance decides against it, then en is duplicated (thus en+1 = en).

MH is not the most efficient MCMC algorithm but it is efficient enough for what we want to achieve.
The main difficulty here is that there is no natural way to build the fitness function  f , mainly because the equivalent random variable I talked about is a purely abstract construction.

A preliminary observation is that if S(phi, beta, f) < 0 for all phi in (phi_0, 0), then S has an odd number of extrema in (phi_0, 0). The simplest way to find these extrema is to search for the zeros of the derivative S' of S with respect to phi, while discarding the "false" extrema, that is the zeros of S' where the second derivative S'' is also null (I emphasize this last point because I didn't account for it in the attached file).
The algorithm designed in this file probably misses a few points for not checking if S''=0, but it is important to keep in mind that we don't want a complete identification of  Adm but just the capture of its boundary.
Unless we are extremely unlucky there is only a very small chance that omitting to check if S''=0 will significantly modify this boundary.
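The check on S'' can be sketched as follows. The function S below is a toy stand-in (an assumption made for illustration, the real S(phi, beta, f) is not reproduced here), chosen because S' has a double zero at phi = 0 where S'' vanishes too; the names candidates, tol, extrema and M are also illustrative.

```maple
# Toy stand-in for S: zero at phi = -2 and phi = 0, negative in between
S   := phi -> phi^4 + 2*phi^3:
dS  := D(S):      # first derivative
d2S := D(dS):     # second derivative

# Candidate extrema: real zeros of S' in the range [-2, 0]
candidates := {fsolve(dS(phi) = 0, phi = -2 .. 0)}:

# Discard the "false" extrema, i.e. the zeros of S' where S'' vanishes as well
tol := 1e-8:
extrema := select(r -> abs(d2S(r)) > tol, candidates);

# Number of extrema kept
M := nops(extrema);
```

With this toy function only the zero near phi = -1.5 survives; the zero of S' at phi = 0 is rejected because S'' is null there.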

How to define the function f?
What we want is that  f(c) / f(en) represents the probability of deciding whether c is an admissible point or not. In a Markov chain this ratio represents how much better or worse c is relative to en, and this is essential for the chain to be a true Markov chain.
But as our aim is not to build a true Markov chain but simply a chain which looks like one, we can take some liberties and replace  f(c) / f(en) by some function  g(c) which quantifies the probability for c to be an admissible couple. So we want  g(c) = 1 if  S(phi, c) has exactly M=3 negative extrema and  g(c) < 1 if M <> 3.
The previous algorithm transforms into:

1. Let us assume that the chain already contains elements e1, ..., en.
Let  g  be a function which gives the probability that an element is admissible.
2. A potential new element c is randomly picked in some neighborhood of en.
3. If g(c) is equal to 1, we decide to put c into the chain (thus en+1 = c); otherwise we leave it to chance to decide whether or not c is put into the chain.
If chance decides against it, then en is duplicated (thus en+1 = en).

This algorithm can also be seen as a kind of genetic algorithm.

A possible choice is  g(c)= exp(-|3-M|).
In the attached file I use instead the expression g(c) = (M + 1) / 4 for several reasons:

• It is less sharp at M=3 and thus puts c into the chain more often, which increases its exploratory capabilities.
• The case M > 3, which no preliminary investigation was able to uncover, is by construction eliminated in the procedure Extrema, which uses an early stopping strategy (as soon as more than M=3 negative extrema are found, the procedure stops).
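The Markov-like chain described above can be sketched as follows. This is not the code of the attached file: CountExtrema is a hypothetical stand-in for the procedure that counts the negative extrema of S(phi, c) (here the admissible domain is simply the unit disc), and MarkovLikeChain, step and the chain length are illustrative choices.

```maple
# Hypothetical stand-in: M = 3 inside the unit disc, fewer extrema farther away
CountExtrema := proc(c::list)
  local r;
  r := evalf(sqrt(c[1]^2 + c[2]^2));
  if r <= 1 then 3 else max(0, 3 - ceil(r - 1)) end if
end proc:

# Probability-like score: equal to 1 when M = 3, smaller otherwise
g := c -> (CountExtrema(c) + 1)/4:

MarkovLikeChain := proc(start::list, n::posint, step::positive)
  local U, chain, e, c, k;
  # Uniform generator on [-1, 1]
  U := RandomTools:-Generate(float(range = -1 .. 1, method = uniform), makeproc = true);
  e := start;
  chain := Array(1 .. n);
  chain[1] := e;
  for k from 2 to n do
    # Candidate picked in a square neighborhood of the current element
    c := [e[1] + step*U(), e[2] + step*U()];
    # Always accept if g(c) = 1, otherwise accept with probability g(c)
    if g(c) >= 1 or (U() + 1)/2 < g(c) then
      e := c
    end if;
    chain[k] := e      # e is duplicated when c is rejected
  end do;
  convert(chain, list)
end proc:

randomize(1234):       # fixed seed, for reproducibility
chain := MarkovLikeChain([0, 0], 2000, 0.3):

# Admissible points visited by the chain
admissible := select(c -> CountExtrema(c) = 3, chain):
```

Because candidates with g(c) < 1 are sometimes accepted, the chain regularly steps outside the admissible domain and comes back, which is what lets the admissible points it visits outline the boundary.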

The algorithm I designed basically relies upon two stages:

1. The first one aims to construct a "long" Markov-like chain ("long" and not long because Markov chains are usually much longer than those I use).
There are two goals here:
   1. Check whether Adm is simply connected or not (whether it has holes or not).
   2. Find a first set of admissible points that can be used as starting points for subsequent chains.

2. Run several independent Markov-like chains from a reduced set of admissible points.
The way this reduced set is constructed depends on the goal to achieve:
   1. One may think of adding points among those already known in order to assess the connectivity of Adm,
   2. or of refining the boundary of Adm.

These two competing objectives are mixed in an ad hoc way depending on the observation of the results already in hand.

We point here an important feature of MCMC methods: behind their apparent algorithmic simplicity, it is common that high-quality results can only be obtained efficiently at the cost of problem-dependent tuning.

A last word to say that after several trials and failures I found it simpler to reparameterize the problem in terms of (phi_0, f) instead of (beta, f).

Codes and results

Choice g(c) = (M + 1) / 4
The code : Extrema_and_MCMC.mw

To access the full results I got, load this m file (do not mind its extension, Mapleprimes doesn't allow uploading m files): MCMC_20231209_160249.mw (save it and change its extension to m instead of mw)

EDITED: choice  g(c)= exp(-|3-M|)
Here are the files containing the code and the results:
Extrema_and_MCMC_g2.mw
MCMC_20231211_084053.mw

To ease the comparison of the two sets of results I used the same random seeds in both codes.
Comparing the results obtained around the first admissible point is straightforward.
It's more complex for @raj2018's solution because the first step of the algorithm (drawing of a single chain of length 1000) finds six times more admissible points with g(c)= exp(-|3-M|) than with g(c) = (M + 1) / 4.

## How to build a multivariate random varia...

Maple

For years I've been angry that Maple isn't capable of formally manipulating random vectors (aka multivariate random variables).
For the record Mathematica does.

The problem I'm concerned with is to create a vector W such that

`type(W, RandomVariable)`

will return true.
Of course defining W from its components w1, .., wN, where each w is a random variable, is easy, even if these components are correlated or, more generally, dependent (the two concepts being equivalent iff all the w are gaussian random variables).
But one loses the property that W is a (multivariate) random variable.
See a simple example here: NoRandomVectorsInMaple.mw

This is the reason why I've developed over the years several pieces of code to build a few multivariate random variables (multinormal, Dirichlet, Logistic-Normal, Skew Multivariate Normal, ...).

In the framework of my activities, they are of great interest and the purpose of this post is to share what I have done on this subject by presenting the most classic example: the multivariate gaussian random variable.

My leading idea was (is) to build a package named MVStatistics modelled on the Statistics package but devoted to Multi Variate random variables.
I have already constructed such a package aggregating about fifty different procedures. But it doesn't merit the appellation of "Maple package" because I'm not qualified to write something like this which would be at the same time perennial, robust, documented, open and conflict-free with the  Statistics package.
In case any of you are interested in pursuing this work (because I'm about to change jobs), I can provide all the different procedures I built to construct and manipulate multivariate random variables.

To help you understand the principles I used, here is the most iconic example of a multivariate gaussian random variable.
The attached file contains the following procedures

```
MVNormal
Constructs a gaussian random vector whose components can be mutually correlated.
The statistics defined in Distribution are (this list could be extended to other
statistics, provided they are "recognized" statistics, see the end of this post):
PDF
Mode
Mean
Variance
RandomSample

DispersionEllipse
Builds and draws the dispersion ellipses of a bivariate gaussian random vector

DispersionEllipsoid
Builds and draws the dispersion ellipsoids of a trivariate gaussian random vector

MVstat
Computes several statistics of a random vector (Mean, Variance, ...)

Iserlis
Computes the moments of any order of a gaussian random vector

MVCentralMoment
Computes the central moments of a gaussian random vector

Conditional
Builds the conditional random vector of a gaussian random vector wrt some of its components.
Note: the result has type RandomVariable.

MarginalizeAgainst
Builds the marginal random vector of a gaussian random vector wrt some of its components.
Note: the result has type RandomVariable.

MardiaNormalityTest
The multi-dimensional analogue of the Shapiro-Wilk normality test

HZNormalityTest
Henze-Zirkler test for Multivariate Normality

MVWaldWolfowitzTest
A multivariate version of the non-parametric Wald-Wolfowitz test
```

Do not hesitate to ask me any questions that might come to mind.
In particular, as Maple introduces limitations on the type of some attributes (for instance Mean  must be of algebraic type), I've been forced to trick it by transforming vector or matrix quantities into algebraic ones.
An example is

`Mean = add(m[k]*x[k], k=1..K)`

where m[k] is the expectation of the kth component of this random vector.
This implies using the procedure MVstat to "decode", for instance, what Mean returns and write it as a vector.
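The encoding and decoding steps can be illustrated as follows. This is only a sketch of the idea, not the way MVstat is actually written: the values in m and the names encoded and decoded are hypothetical, and x is assumed to be an unassigned name.

```maple
K := 3:
m := [1, -2, 5]:      # hypothetical expectations of the K components

# Encoding: the mean vector is stored as an algebraic expression in x[1], ..., x[K]
encoded := add(m[k]*x[k], k = 1 .. K);

# Decoding: the k-th component is the coefficient of x[k]
decoded := Vector(K, k -> coeff(encoded, x[k]));
```

The same trick extends to a matrix quantity by using doubly indexed names, at the price of one more decoding step.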

MultivariateNormal.mw

About the statistics the Statistics:-Distribution constructor recognizes:
To get them one can do this (the Normal distribution seems to be the continuous one with the most exhaustive list of statistics):

```
restart:
with(Statistics):
X := RandomVariable(Normal(a, b)):
attributes(X);
protected, RandomVariable, _ProbabilityDistribution

map(e -> printf("%a\n", e), [exports(attributes(X)[3])]):
Conditions
ParentName
Parameters
CharacteristicFunction
CDF
CGF
HodgesLehmann
Mean
Median
MGF
Mode
PDF
RousseeuwCrouxSn
StandardDeviation
Support
Variance
CDFNumeric
QuantileNumeric
RandomSample
RandomSampleSetup
RandomVariate
MaximumLikelihoodEstimate
```

Unfortunately, for some unknown reason a few statistics cannot be set by the user.
This is for instance the case of Parameters, which has serious consequences in certain situations.
Among the other statistics that cannot be set by the user one finds:

• ParentName,
• QuantileNumeric, whose role is not very clear, at least to me, but which I suspect is a procedure which "inverts" the CDF to give a numerical estimation of a quantile given its probability.
If so, accessing  QuantileNumeric would be of great interest for distributions whose quantiles have no closed form expressions.
• CDFNumeric  (same remark as above)

Finally, the statistic Conditions, which enables defining the conditions the elements of Parameters must verify, is not at all suited for multivariate random variables.
It is for instance impossible to declare that the variance matrix (or the correlation matrix) is a square symmetric positive definite matrix.

## An existential question...

MaplePrimes

What do you think is the acceptable limit to the effort required to answer a question?

At what point does the question-and-answer game between two contributors become unreasonable?

How do you, the most highly ranked, deal with situations that last for days?

## Solving a least-squares problem with QR ...

Maple

This post is motivated by a question asked by @vs140580  ( The program is making intercept zero even though There is a intercept in regression Fit (A toy code showing the error attached) ).

The problem met by @vs140580 comes from the large magnitudes of the (two) regressors and the failure of Fit/LinearFit to find the correct solution unless an unreasonably large value of Digits is used.
This problem has been answered by @dharr after scaling the data (which is always, when possible, a good practice) and by myself using explicitly the method called "Normal Equations" (see https://en.wikipedia.org/wiki/Least_squares).

The method of "Normal Equations" relies upon the inversion of a symmetric square matrix H whose dimension is equal to the number of coefficients of the model to fit.
It's well known that this method can potentially lead to extremely ill-conditioned matrices H, and thus to severe numerical problems (the most common situation being the fit of a high degree polynomial).
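In formulas, writing Z for the design matrix (regressors plus a column of ones), Y for the observations and A for the vector of coefficients, as in the worksheet reproduced further down, the method reads as follows. The notation \kappa_2 for the 2-norm condition number is an assumption introduced here.

```latex
% Least-squares problem and the associated Normal Equations
A = \arg\min_{a} \lVert Y - Z a \rVert_2^2
\quad\Longleftrightarrow\quad
H A = Z^{T} Y, \qquad H = Z^{T} Z

% Conditioning: forming H squares the condition number of Z
\kappa_2(H) = \kappa_2(Z)^2
```

This squaring is the reason why regressors of very different magnitudes, which already make Z badly conditioned, can make H numerically singular in finite precision.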

• In English: http://www.math.kent.edu/~reichel/courses/intr.num.comp.1/fall09/lecture4/lecture4.pdf
• In French: https://moodle.utc.fr/pluginfile.php/24407/mod_resource/content/5/MT09-ch3_ecran.pdf

The attached file illustrates how the QR decomposition method works.
The test case is @vs140580's.

Maybe the development team could enhance Fit/LinearFit in future versions by adding an option which specifies what method is to be used?

 > restart:
 > with(Statistics):
 > interface(version)
 (1)
 > Data := Matrix([[4.74593554708566, 11385427.62, 2735660038000], [4.58252830679671, 25469809.77, 12833885700000], [4.29311160501838, 1079325200, 11411813200000000], [4.24176959154225, 1428647556, 18918585950000000], [5.17263072694618, 1428647556, 18918585950000000], [4.39351114955735, 1877950416, 30746202150000000], [4.39599006758777, 1428647556, 18918585950000000], [5.79317412396815, 2448320309, 49065217290000000], [4.48293612651735, 2448320309, 49065217290000000], [4.19990181982522, 2448320309, 49065217290000000], [5.73518217699046, 1856333905, 30648714900000000], [4.67943831980476, 3071210420, 75995866910000000], [4.215240105336, 2390089264, 48670072110000000], [4.41566877563247, 3049877383, 75854074610000000], [4.77780395369828, 2910469403, 74061327950000000], [4.96617430604669, 1416936352, 18891734280000000], [4.36131111330988, 1416936352, 18891734280000000], [5.17783192063198, 1079325200, 11411813200000000], [4.998266287191, 1067513353, 11402362980000000], [4.23366152474871, 2389517120, 48661380410000000], [4.58252830679671, 758079709.3, 5636151969000000], [6.82390874094432, 1304393838, 14240754750000000], [4.24176959154225, 912963601.2, 8621914602000000], [4.52432881167557, 573965555.4, 3535351888000000], [4.84133601918601, 573965555.4, 3535351888000000], [6.88605664769316, 732571773.2, 5558875538000000], [4.35575841415627, 1203944381, 13430693320000000], [4.42527441640593, 955277678, 8795128298000000], [6.82390874094432, 997591754.9, 8968341995000000], [4.35144484433733, 143039477.1, 305355143300000]]):
 > # Direct use of LinearFit.
   #
   # As far as I know LinearFit is based on the resolution of the "Normal Equations"
   # (see further down), a system of equations that is known to be ill-conditioned
   # when regressors have large values (in particular when polynomial regression
   # is used).
   X := Data[.., [2, 3]]:
   Y := Data[.., 1]:
   LinearFit(C1+C2*v+C3*w, X, Y, [v, w]);
 (2)
 > # Because of roundoff issues the 3-by-3 matrix involved in the "Normal Equations" (NE)
   # appears to be of rank < 3.
   # The rank of this matrix is equal to 1+rank(X) and one can easily verify that
   # the 2 columns of X are linearly independent:
   LinearAlgebra:-LinearSolve(X, Vector(numelems(Y), 0));
   LinearAlgebra:-Rank(X);
 (3)
 > # Solve the least squares problem by using explicitly the NE.
   #
   # To account for an intercept we augment X by a column vector of "1"
   # traditionally put in column one.
   Z := `<|>`(Vector(numelems(Y), 1), X):
   A := (Z^+ . Z)^(-1) . Z^+ . Y;          # Normal Equations
 (4)
 > # What is the rank of Z?
   # Due to the scale of X compared to "1", Rank fails to return the correct value
   # of rank(Z), which is obviously equal to rank(X)+1.
   LinearAlgebra:-LinearSolve(Z, Vector(numelems(Y), 0));
   LinearAlgebra:-Rank(Z);
 (5)

A WORKAROUND : SCALING THE DATA

 > model := unapply( LinearFit(C1+C2*v+C3*w, Scale(X), Scale(Y), [v, w]), [v, w] );
 (6)
 > mX, sX := (Mean, StandardDeviation)(X); mY, sY := (Mean, StandardDeviation)(Y);
 (7)
 > MODEL := model((x1-mX[1])/sX[1], (x2-mX[2])/sX[2]) * sY + mY
 (8)
 > # Check that the vector of regression coefficients is almost equal to A found above
   # (relative error less than 10^(-14))
   A_from_scaling       := < coeffs(MODEL) >:
   Relative_Discrepancy := (A_from_scaling - A) /~ A
 (9)

THE QR DECOMPOSITION  (applied on raw data)

The QR decomposition, as well as the Givens rotation method, are two alternatives to the NE method to find the vector of regression coefficients.
Both of them are known to be less sensitive to the magnitudes of the regressors and do not (always) require a scaling of the data (which can be quite complex with polynomial regression or when some transformation is used to linearize the statistical model, for instance Y=a*exp(b*X) --> log(Y)=log(a)+b*X).
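The reasoning behind the QR approach can be summarized as follows, for a full QR decomposition of the N-by-P matrix Z of full column rank. The block names R_1, C and D are notational assumptions; C corresponds to the first P entries of Q^T Y.

```latex
% Full QR decomposition: Q is N-by-N orthogonal, R_1 is P-by-P upper triangular
Z = Q R, \qquad R = \begin{pmatrix} R_1 \\ 0 \end{pmatrix}, \qquad
Q^{T} Y = \begin{pmatrix} C \\ D \end{pmatrix}

% Orthogonal transformations preserve the 2-norm
\lVert Y - Z A \rVert_2^2 = \lVert Q^{T} Y - R A \rVert_2^2
                          = \lVert C - R_1 A \rVert_2^2 + \lVert D \rVert_2^2

% The minimum is reached when the first term vanishes
A = R_1^{-1} C

% The matrix of the Normal Equations is recovered without forming Z^T Z
H = Z^{T} Z = R^{T} Q^{T} Q R = R^{T} R = R_1^{T} R_1
```

Only the triangular system R_1 A = C has to be solved, and the condition number of R_1 is that of Z rather than its square.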

 > N := numelems(Y); P := numelems(Z[1]);
 (10)
 > # Perform the QR decomposition of Z.
   Q, R := LinearAlgebra:-QRDecomposition(Z, fullspan);
 (11)
 > # Let C be the column vector of length P defined by:
   C := (Q^+ . Y)[1..P];
 (12)
 > # Then the vector of regression coefficients is given by:
   A_QR                 := (R[1..P, 1..P])^(-1) . C;
   Relative_Discrepancy := (A_QR - A) /~ A
 (13)
 > # The matrix H = Z^+ . Z can be written as
   H                    := Z^+ . Z:
   H_QR                 := R^+ . Q^+ . Q . R:
   Relative_Discrepancy := (H_QR - H) /~ H
 (14)
 > # H_QR expression is required to obtain the covariance matrix of the regression coefficients.
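As a minimal sketch of this last step, here is how the covariance matrix of the coefficients could be obtained from R under the usual ordinary least squares assumptions (independent errors with the same variance). The toy data and the names res, s2 and CovA are hypothetical and independent of the worksheet above; the reduced form of QRDecomposition is used instead of fullspan.

```maple
with(LinearAlgebra):

# Hypothetical toy data, roughly following y = 1 + 2*t
t := Vector([0., 1., 2., 3., 4.]):
Y := Vector([1.1, 2.9, 5.2, 6.8, 9.1]):
N := numelems(Y):

# Design matrix with a column of ones for the intercept
Z := <Vector(N, fill = 1.) | t>:
P := ColumnDimension(Z):

# Reduced QR decomposition: Q is N-by-P, R is P-by-P upper triangular
Q, R := QRDecomposition(Z):

# Regression coefficients from the triangular system R . A = Q^+ . Y
A_QR := LinearSolve(R, Q^+ . Y);

# Unbiased estimate of the error variance from the residuals
res := Y - Z . A_QR:
s2  := (res . res)/(N - P):

# Covariance matrix of the coefficients: s2 * (Z^+ . Z)^(-1) = s2 * (R^+ . R)^(-1)
CovA := s2 * MatrixInverse(R^+ . R);
```

The square roots of the diagonal entries of CovA are the standard errors of the intercept and of the slope.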