mmcdara

6563 Reputation

18 Badges

8 years, 101 days

MaplePrimes Activity


These are Posts that have been published by mmcdara


This post is inspired by minhthien2016's question.

The problem, denoted 2/N/1, for reasons that will appear clearly further on, is to pack N disks into the unit square in such a way that the sum of their radii is maximum.

I replied this problem using Optimization-NLPSolve for N from 1 (obvious solution) to 16, which motivated a few questions, in particular:

  • @Carl Love: "Can we confirm that the maxima are global (NLPSolve tends to return local optima)?
    Using NLPSolve indeed does not guarantee that the solution found is the (a?) global maximum. In fact packing problems are generaly tackled by using evolutionnary algorithms, greedy algorithms, or specific heuristic strategies.
    Nevertheless, running NLPSolve a large number of times from different initial points may provide several different optima whose the largest one can be reasonably considered as the global maximum you are looking for.
    Of course this may come to a large price in term of computational time.

     
  • @acer: "How tight are [some constraints], at different N? Are they always equality?"
    The fact some inequality constraints type always end to equality constraints (which means that in an optimal packing each disk touches at least one other annd, possibly the boundary of the box) seems hard to prove mathematically, but I gave here a sketch of informal proof.



I found 2/N/1 funny enough to spend some time digging into it and looking to some generalizations I will refer to as D/N/M:  How to pack N D-hypersheres into the unit D-hypercube such that the sum of the M-th power of their radii is maximum?
For the sake of simplicity I will say ball instead of disk/sphere/hypersphere and box instead of square/cube/hypercube.

The first point is that problems D/N/1 do not have a unique solution as soon as N > 1 , indeed any solution can be transformed into another one using symmetries with respect to medians and diagonals of the box. Hereafter I use this convention:

Two solutions and s' are fundamental solutions if:

  1. the ordered lists of radii and s'  contain are identical but there is no composition of symmetries from to s',
  2. or, the ordered lists of radii and s'  contain are not identical.
     

It is easy to prove that 2/2/1 and 3/2/1, and likely D/2/1, have an infinity of fundamental solutions: see directory FOCUS__2|2|1_and_3|2|1 in the attached zip file..
At the same time 2/N/2, 3/N/3, and likely D/N/D, have only one fundamental solution (see directory FOCUS__2|N|2 for more details and a simple way to characterize these solutions

 (Indeed the strategy ito find the solution of D/N/D  in placing the biggest possible ball in the largest void D/N-1/D contains. Unfortunately this characterization is not algorithmically constructive in the sense that findind this biggest void is a very complex geometrical and combinatorial problem.
 it requires finding the largest void  in a pack of balls)


Let Md, 1(N)  the maximum value of the sum of balls radii for problem d/N/1.
The first question I asked myself is: How does Md, 1(N) grows with N?

 

(Md, 1(N) is obviously a strictly increasing function of N: indeed the solution of problem d/N/1 contains several voids where a ball of strictly positive radius can be placed, then  Md+1, 1(N) > Md, 1(N) )


The answer seems amazing as intensive numerical computations suggest that
                                      

See D|N|M__Growth_law.mw in the attached sip file.
This formula fits very well the set of points  { [n, Sd, 1(n) , n=1..48) } for d=2..6.
I have the feeling that this conjecture might be proven (rejected?) by rigourous mathematical arguments.


Fundamental solutions raise several open problems:

  • Are D/2/1 problems the only one with more than one fundamental solutions?

    The truth is that I have not been capable to find any other example (which does not mean they do not exist).
    A quite strange thing is the behaviour of NLPSolve: as all the solutions of D/2/1 are equally likely, the histogram of the solutions provided by a large number of NLPSolve runs from different initial points is far from being uniform.
    F
    or more detail refer ro directory FOCUS__2|2|1_and_3|2|1
     in the attached zip file
    I do not understand where this bias comes from: is it due to the implementation of SQP in NLPSolve, or to SQP itself?

     
  • For some couples (D, N) the solution of D/N/1 is made of balls of same radius.
    For N from 1 to 48 this is (numerically)
     the case for 2/1/1 and2/2/1, but the three dimensional case is reacher as it contains  3/1/13/2/1,  3/3/1,  3/4/1 and 3/14/1 (this latter being quite surprising).
    Is there only a finite number of values 
    N such that D/N/1 is made of balls with identical radii?
    If it is so, is this number increasing with
     D?
    It is worth noting that those values of
    N mean that the solution of problems D/N/1 are identical to those of a more classic packing problem: "What is the largest radius N identical balls packed in a unit bow may have?".
    For an exhaustive survey of this latter problem see
    Packomania.

     
  • A related question is "How does the number of different radii evolves as N increases dor given values of D and M?
    Displays of 2D and 3D packings show that the set of radii has significantly less elements than
    N... at least for values of N not too large. So might we expect that solution of, let us say, 2/100/1 can contain 100 balls of 10 different radii, or it is more reasonable to expect it contains 100 balls of 100 different radii?

     
  • At the opposite numerical investigations of  2/N/1 and  3/N/1 suggest that the number of different radii a fundamental solution contains increases with N (more a trend than a continuous growth).
    So, is it true that very large values of N correspond to solutions where the number of different radii is also very large?

    Or could it be that the growth of the number of different radii I observed is simply the consequence of partially converged results?
     
  • Numerical investigations show that for a given dimension d and a given number of balls n,  solutions of d/n/1 and d/n/M (1 < M < d) problems are rather often the same. Is this a rule with a few exceptions or a false impression due to the fact that I did not pushed the simulations to values of n large enough to draw a more solid picture)?


It is likely that some of the questions above could be adressed by using a more powerful machine than mine.


All the codes and results are gathered in  a zip file you can download from OneDrive Google  (link at the end of this post, 262 Mb, 570 Mb when unzipped, 1119 files).
Install this zip file in the directory of your choice and unzip it to get a directory named
PACKING
Within it:

  • README.mw contains a description of the different codes and directories
  • Repository.rtf must contain a string repesenting the absolute path of directory PACKING


Follow this link OneDrive Google



(EDITED 2024/03/11  GMT 17H)

In a recent Question@cq mentionned in its last reply "In fact, I wanted to do parameter sensitivity analysis and get the functional relationship between the [...] function and [parameters]. Later, i will study how the uncertainty of [the parameters] affects the [...] function".
I did not keep exchanging further on with @cq, simply replying that I could provide it more help if needed.

In a few words the initial problem was this one:

  • Let X_1 and X_2 two random variables and G the random variable defined by  G = 1 - (X_1 - 1)^2/9 - (X_2 - 1)^3/16.
     
  •  X_1 and X_2 are assumed to be gaussian random variables with respective mean and standard deviation equal to (theta_1, theta_3) and (theta_2, theta_4).
     
  • The four theta parameters are themselves assumed to be realizations of four mutually independent uniform random variables Theta_1, ..., Theta_4 whose parameters are constants.
     
  • Let QOI  (Quantity Of Interest) denote some scalar statistic of G (for instance its Mean, Variance, Skweness, ...).
    For instance, if QOI = Mean(G), then  QOI expresses itself as a function of the four parameters theta_1, ..., theta_4.
    The goal of @cq is to understand which of those parameters have the greatest influence on QOI.


For a quick survey of Sensitivity Analysys (SA) and a presentation of some of the most common strategies see Wiki-Overview


The simplest SA is the Local SA (LSA) we are all taught at school: having chosen some reference point P in the [theta_1, ..., theta_4] space the 1st order partial derivatives d[n] = diff(QOI, theta_n) expressed at point P give a "measure" (maybe after some normalization) of the sensitivity, at point P, of QOI regardibg each parameter theta_n.


A more interesting situation occurs when the parameters can take values in a neighorhood of  P which is not infinitesimal, or more generally in some domain without reference to any specific point P.
That is where Global SA (GSA) comes into the picture.
While the notion of local variation at some point P is well established, GSA raises the fundamental question of how to define how to measure the variation of a function over an arbitrary domain?
Let us take a very simple example while trying to answer this question "What is the variation of sin(x) over [0, 2*Pi]?"

  1. If we focus on the global trend of sin(x)  mean there is no variation at all.
  2. If we consider peak-to-peak amplitude the variation is equal to 2.
  3. At last, if we consider L2 norm the variation is equal to Pi.
    (but the constant function x -> A/sqrt(2) has the same L2 norm but it is flat, and in some sense les fluctuating). 


Statisticians are accustomed to use the concept of variance as a measure to quantify the dispersion of a random variable. At the end of the sixties  one of them, Ilya Meyerovich Sobol’,  introduced the notion of Variance-Based GSA as the key tool to define the global variation of a function. This notion naturally led to that of Sobol' indices as a measure of the sensitivity of a funcion regarding one of its parameters or, which most important, regarding any combination (on says interaction) of its parameters.

The aim of this post is to show how Sobol' indices can be computed when the function under study has an analytic expression.
 

The Sobol' analysis is based on an additive decomposition of this function in terms of 2^P mutually orthogonal fiunctions where P is the number of its random parameters.
This decomposition and the ensuing integrations whose values will represent the Sobol' indices can be done analytically in some situation. When it is no longer the case specific numerical estimation methods have to be used;


The attached file contains a quite generic procedure to compute exact Sobol' indices and total Sobol' indices for a function whose parameters have any arbitrary statistical distribution.
Let's immediately put this into perspective by saying that these calculations are only possible if Maple is capable to find closed form expressions of some integrals, which is of course not always the case.

A few examples are also provided, including the one corresponding to @cq's original question.
At last two numerical estimation methods are presented.

SOBOL.mw

 

Why this post
This work was intended to be a simple reply to a question asked a few days ago.
At some point, I realised that the approach I was using could have a more general interest which, in my opinion, was worth a post.
In a few words, this post is about solving an algebra problem using a method originally designed to tackle statistical problems.

The Context
Recently @raj2018 submitted a question I'm going to resume this way:

Let S(phi ;  beta, f) a function of phi parameterized by beta and f.
Here is the graph of S(phi ;  0.449, 0.19)  @raj2018 provided

@raj2018 then asked how we can find other values (A, B)  of values for (beta, f) such that the graph of S(phi, A, B) has the same aspect of the graph above.
More precisely, let phi_0 the largest strictly negative value of phi such that  S(phi_0, A, B) = 0.
Then  S(phi, A, B) must be negative (strictly negative?) in the open interval (phi_0, 0), and must have exactly 3 extrema in this range.
I will said the point  (A, B) is admissible is S(phi, A, B) verifies thess conditions

The expression of S(phi, A, B) is that complex that it is likely impossible to find an (several?, all?) admissible point using analytic developments.

The approach

When I began thinking to this problem I early thought to find the entire domain of admissible points: was it something possible, at least with some reasonable accuracy? 

Quite rapidly I draw an analogy with an other type of problems whose solution is part of my job: the approximate construction of the probability density function (PDF) of multivariate random variables (obviously this implies that no analytical expression of this PDF is available). This is a very classical problem in Bayesian Statistics, for instance when we have to construt an approximation of a posterior PDF.

To stick with this example and put aside the rare situations where this PDF can be derived analytically, building a posterior PDF is largely based on specific numerical methods. 
The iconic one is known under the generic name MCMC  which stands for Markov Chain Monte Carlo.

Why am I speaking about MCMC or PDF or even random variables?
Let us consider some multivariate random variable R whose PDF as a constant on some bounded domain D and is equal to 0 elsewhere. R is then a uniform random variable with support supp(R) = D.
Assuming the domain Adm of admissible (beta, f) is bounded, we may  think of it as the support of some uniform random variable. Following this analogy we may expect to use some MCMC method to "build the PDF of the bivariate random variable (beta, f)", otherwise stated "to capture​​​​​​ the boundary of​ Adm".

The simplest MCMC method is the Metropolis-Hastings algorithm (MH).
In a few simple words MH builds a Markov chain this way:

  1. Let us assume that the chain already contains elements e1, ..., en.
    Let  f  some suitable "fitness" function (whose nature is of no importance right now).
  2. A potential new element c is randomly picked in some neighborhood or en.
  3. If the ratio (c) / (en) is larger than 1, we decide to put c into the chain (thus en+1 = c) otherwise we leave it to chance to decide whether or not c iis put into the chain.
    If chance decides the contrary,  then en is duclicated (thus en+1 = en).


MH is not the most efficient MCMC algorithm but it is efficient enough for what we want to achieve.
The main difficulty here is that there is no natural way to build the fitness function  f , mainly because the equivalent random variable I talked about is a purely abstract construction.

A preliminary observation is that if S(phi, beta, f) < 0 whatever phi in (phi_0, 0), then S has an odd number of extrema in (phi_0, 0). The simplest way to find these extrema is to search for the zeros of the derivative S' of S with respect to phi, while discardinq those where the second derivative can reveal "false" extrema where both S'' of S is null (I emphasize this last point because I didn't account for it in attached file).
The algorithm designed in this file probably misses a few points for not checking if S''=0, but it is important to keep in mind that we don't want a complete identification of  Adm but just the capture of its boundary.
Unless we are extremely unlucky there is only a very small chance that omitting to check if S''=0 will deeply modify this boundary.


How to define function f  ?
What we want is that  f (c) / (en) represents the probability to decide wether c is an admissible point or not. In a Markov chain this  ratio represents how better or worse c is relatively to en, and this is essential for the chain to be a true Markov chain.
But as our aim is not to build a true Markov chain but simply a chain which looks like a Markov chain, we we can take some liberties and replace  f (c) / (en) by some function  g(c) which quantifies the propability for c to be an admissible couple. So we want that  g(c) = 1 if  S(phi, c) has exactly M=3 negative extrema and  g(c) < 1 if M <> 3.
The previous algorihm transforms into:

  1. Let us assume that the chain already contains elements e1, ..., en.
    Let  g  a function which the propability that element is admissible
  2. A potential new element c is randomly picked in some neighborhood or en.
  3. If the ratio g(c) is larger than 1, we decide to put c into the chain (thus en+1 = c) otherwise we leave it to chance to decide whether or not c iis put into the chain.
    If chance decides the contrary,  then en is duclicated (thus en+1 = en).

This algorithm can also be seen as a kind of genetic algorithm.

A possible choice is  g(c)= exp(-|3-M|).
In the attached file I use instead the expression g(c) = (M + 1) / 4 fo several reasons:

  • It is less sharp at M=3 and thus enables more often to put c into the chain, which increases its exploratory capabilities.
  • The case M > 3, which no preliminary investigation was able to uncover, is by construction eliminated in the procedure Extrema which use an early stopping strategy (if as soon as more than M=3 negative extrema are found the procedure stops).


The algorithm I designed basically relies upon two stages:

  1. The first one is aimed to construct a "long" Markov-like chain ("long" and not long because Markov chains are usually much longer than those I use).
    There are two goals here:
    1. Check if Adm is or not simply-connected or not (if it has holes or not).
    2. Find a first set of admissible points that can be used as starting points for subsequent chains.
       
  2. Run several independent Markov-like chains from a reduced set of admissible points.
    The way this reduced set is constructed depends on the goal to achieve:
    1. One may think of adding points among those already known in order to assess the connectivity of Adm,
    2. or refinining the boundary of Adm.

These two concurent objectives are mixed in an ad hoc way depending on the observation of the results already in hand.


We point here an important feature of MCMC methods: behind their apparent algorithmic simplicity, it is common that high-quality results can only be obtained efficiently at the cost of problem-dependent tuning.

A last word to say that after several trials and failures I found it simpler to reparameterize the problems in terms of (phi_0, f) instead of (beta, f).

Codes and results

Choice g(c) = (M + 1) / 4 
The code : Extrema_and_MCMC.mw

To access the full results I got load this m file (do not bother its extension, Mapleprimes doesn't enable uploading m files) MCMC_20231209_160249.mw (save it and change it's extension in to m instead mw)

EDITED: choice  g(c)= exp(-|3-M|)
Here are the files contzining the code and the results:
Extrema_and_MCMC_g2.mw
MCMC_20231211_084053.mw

To ease the comparison of the two sets of results I used the same random seeds inn both codes.
Comparing the results got around the first admissible point is straightforward.
It's more complex for @raj2018's solution because the first step of the algorithim (drawing of a sibgle chain of length 1000) finds six times more admissible point with g(c)= exp(-|3-M|) than with g(c) = (M + 1) / 4.                                 

For years I've been angry that Maple isn't capable of formally manipulating random vectors (aka multivariate random variables).
For the record Mathematica does.

The problem I'm concerned with is to create a vector W such that

type(W, RandomVariable)

will return true.
Of course defining W from its components w1, .., wN, where each w is a random variable is easy, even if these components are correlated or, more generally dependent ( the two concepts being equivalent iif all the w are gaussian random variables).
But one looses the property that W is no longer a (multivariate) random variable.
See a simple example here: NoRandomVectorsInMaple.mw

This is the reason why I've developped among years several pieces of code to build a few multivariate random variable (multinormal, Dirichlet, Logistic-Normal, Skew Multivariate Normal, ...).

In the framework of my activities, they are of great interest and the purpose of this post is to share what I have done on this subject by presenting the most classic example: the multivariate gaussian random variable.

My leading idea was (is) to build a package named MVStatistics on the image of the Statistics package but devoted to Multi Variate random variables.
I have already construct such a package aggregating about fifty different procedures. But this latter doesn't merit the appellation of "Maple package" because I'm not qualified to write something like this which would be at the same time perennial, robust, documented, open and conflict-free with the  Statistics package.
In case any of you are interested in pursuing this work (because I'm about to change jobs), I can provide it all the different procedures I built to construct and manipulate multivariate random variables.

To help you understand the principles I used, here is the most iconic example of a multivariate gaussian random variable.
The attached file contains the following procedures

MVNormal
  Constructs a gaussian random vector whose components can be mutually correlated
  The statistics defined in Distribution are: (this list could be extended to other
  statistics, provided they are "recognized" statitics, see at the end of this 
  post):
      PDF
      Mode
      Mean
      Variance
      StandardDeviation = add(s[k]*x[k], k=1..K)
      RandomSample

DispersionEllipse
  Builds and draws the dispersion ellipses of a bivariate gaussia, random vector

DispersionEllipsoid
  Builds and draws the dispersion ellipsoids of a trivariate gaussia, random vector

MVstat
  Computes several statistics of a random vector (Mean, Variance, ...)

Iserlis
  Computes the moments of any order of a gaussian random vector

MVCentralMoment
  Computes the central moments of a gaussian random vector

Conditional
  Builds the conditional random vector of a gaussian random vector wrt some of its components 
  the moments of any order of a gaussian random vector.
  Note: the result has type RandomVariable.

MarginalizeAgainst
  Builds the marginal random vector of a gaussian random vector wrt some of its components 
  the moments of any order of a gaussian random vector.
  Note: the result has type RandomVariable.

MardiaNormalityTest
  The multi-dimensional analogue of the Shapiro-Wilks normality test

HZNormalityTest
  Henze-Zirkler test for Multivariate Normality

MVWaldWolfowitzTest
  A multivariate version of the non-parametrix Wald-Folfowitz test

Do not hesitate to ask me any questions that might come to mind.
In particular, as Maple introduces limitations on the type of some attributes (for instance Mean  must be of algebraic type), I've been forced to lure it by transforming vector or matrix quantities into algebraic ones.
An example is

Mean = add(m[k]*x[k], k=1..K)

where m[k] is the expectation of the kth component of this random vector.
This implies using the procedure MVstat to "decode", for instance, what Mean returns and write it as a vector.

MultivariateNormal.mw

About the  statistics ths Statistics:-Distribution constructor recognizes:
To get them one can do this (the Normal distribution seems to be the continuous one with the most exhaustive list os statistics):

restart
with(Statistics):
X := RandomVariable(Normal(a, b)):
attributes(X);
      protected, RandomVariable, _ProbabilityDistribution

map(e -> printf("%a\n", e), [exports(attributes(X)[3])]):
Conditions
ParentName
Parameters
CharacteristicFunction
CDF
CGF
HodgesLehmann
Mean
Median
MGF
Mode
PDF
RousseeuwCrouxSn
StandardDeviation
Support
Variance
CDFNumeric
QuantileNumeric
RandomSample
RandomSampleSetup
RandomVariate
MaximumLikelihoodEstimate

Unfortunately it happens that for some unknown reason a few statistics cannot be set by the user.
This is for instance the case of Parameters serious consequences in certain situations.
Among the other statistics that cannot be set by the user one finds:

  • ParentName,
  • QuantileNumeric  whose role is not very clear, at least for me, but which I suspect is a procedure which "inverts" the CDF to give a numerical estimation of a quantile given its probability.
    If it is so accessing  QuantileNumeric would be of great interest for distributions whose the quantiles have no closed form expressions.
  • CDFNumeric  (same remark as above)

Finally, the statistics Conditions, which enables defining the conditions the elements of Parameters must verify are not at all suited for multivariate random variables.
It is for instance impossible to declare that the variance matrix (or the correlation matrix) is a square symmetric positive definite matrix).

What do you think is the acceptable limit to the effort required to answer a question?

At what point does the question-and-answer game between two contributors become unreasonable?

How do you, the most highly ranked, deal with situations that last for days?

1 2 3 4 5 6 Page 1 of 6