8 years, 97 days

## Representing a hierarchical table as a t...

Maple

In the applications I am working on, the information are often represented by hierarchical tables (that is tables where some entries can also be tables, and so on).
To help people to understand how this information is organized, I have thought to representent this hierarchical table as a tree graph.
Once this graph built, it becomes very simple to find where a "terminal leaf", that is en entry which is no longer a table, is located in the original table (by location I mean the sequence of indices for which the entry is this "terminal leaf".

The code provided here is pretension free and I do not doubt a single second  that people here will be able to improve it.
I published it for i thought other people could face the same kind of problems that I do.

 > restart
 > with(GraphTheory): interface(version);
 (1)
 > gh := proc(T)   global s, counter, types:   local  i:   if type(T, table) then     for i in [indices(T, nolist)] do       if type(T[i], table) then          s := s, op(map(u -> [i, u], [indices(T[i], nolist)] ));       else          counter := counter+1:          types   := types, _Z_||counter = whattype(T[i]);          s       := s, [i, _Z_||counter];       end if:       thisproc(T[i]):     end do:   else     return s   end if: end proc:
 > t := table([a1=[alpha=1, beta=2], a2=table([a21=2, a22=table([a221=x, a222=table([a2221={1, 2, 3}, a2222=Matrix(2, 2), a2223=u3, a2224=u4])])]), a3=table([a31=u, a32=v])]); global s, counter, types: s       := NULL: counter := 0: types   := NULL: ghres := gh(t): types := [types]:
 (2)

These 3 lines determine the set of edges of the form ['t', v], that are not been captured by procedure h.
They correspond to "first level" indices of table t (v in {a1, a2, a3} in the example above)

 > L := convert(op~(1, [ghres]), set):      R := convert(op~(2, [ghres]), set): FirstLevelEdges := map(u -> ['t', u], L union R minus R):

Complete the set of the edges, build the graph representation TG of table t and draw TG.

 > edges := convert~({ghres, FirstLevelEdges[]}, set): TG := Graph(edges): HighlightVertex(TG, Vertices(TG), white): p := DrawGraph(TG, style=tree, root='t'):

The first line is used to change the the "terminal leaves" of names  _Z_n by their type.

 > eval(t); p       := subs(types, p): enlarge := plottools:-transform((x,y) -> [3*x, y]): plots:-display(enlarge(p), size=[1000, 400]);

This procedure is used to find the "indices path" to a terminal leaf.
FindLeaf is then applied to all the terminal leaves.

 > FindLeaf := proc(TG, leaf)    local here:    here := GraphTheory:-ShortestPath(TG, 't', leaf)[1..-2]:    here := cat(convert(here[1], string), convert(here[2..-1], string)):    here := StringTools:-SubstituteAll(here, ",", "]["):    here := parse(here); end proc: # where is a2221 printf("%a\n", FindLeaf(TG, a2221));
 t[a2][a22][a222]
 >

## An improved approximation of the Inverse...

Maple

Seeking for fast approximate formulas to compute (a huge number of) quantiles of a Gaussian random variable (here the standard one, but its extension to any Gaussian RV is straightforward), I found a few of them in the Abramowitz and Stegun book, page 933, relations 26.2.22 and 26.2.23.
Each approximation model is expressed as a rational fraction, the second one being the more accurate.
Each model depends on (respectively 4 and 6) parameters that are estimated (I guess it was done this way) through a least-square-like method.

See here for an online access http://people.math.sfu.ca/~cbm/aands/page_933.htm.

These approximation, and specially the most accurate one (formula 26.2.23) seem to be still widely used today(1) (see for instance ).

As an amusement I decided to compute the best fit by using the Statistics:-NonLinearFit procedure and a sample of (probability, quantile) points where probability ranges in [0.5, 1-1/1000] (the range used in formulas 26.2.22 and 26.2.23 is (0, 0.5] but this is not a point).
Surprisingly Statistics:-NonLinearFit returned, for the two formulas, parameter estimations substantially different from the one given in the Abramowitz & Stegun's book. A reason could be that the points I used when I did the fits weren't the one they used (unfortunately they give no informations about this).

More interesting, whatever the formula I refitted,  NonLinearFit produced an approximation whose the absolute error was smaller by about two orders of magnitude to the onesprovided by Abramowitz and Stegun.
For instance they wrote that the most accurate formula (26.2.23) had an absolute approximation error less than 4.5*10-4 as I obtained a value around 10-6!

(1) To get an idea of the persistence of the use of the formula 26.2.23, just type the value 2.515517 of its parameter c[0] in any search engine.

In the plots below the gray rectangle refers to the region where the approximate ICDF is used for extrapolation.

 > restart:
 > with(Statistics):
 > cdf := unapply(evalf(CDF(Normal(0, 1), x)), x): X   := [seq(0..5, 0.1)]: A   := cdf~(X): T  := alpha -> sqrt(-2*log(1-alpha)): q  := Quantile~(Normal(0, 1), A): Aq := convert([A,q], Matrix)^+:
 > r := 1: J := z -> z - add(a__||k*z^k, k=0..r)/(1+add(b__||k*z^k, k=1..r+1)): model  := J(T(alpha)): NL_fit := unapply(NonlinearFit(model, Aq, alpha), alpha); # these lines are for estimating the performances B  := Sample(Uniform(0.5, 1), 10^4): CodeTools:-Usage(Quantile~(Normal(0, 1), B)): CodeTools:-Usage(Quantile~(Normal(0, 1), B, numeric)): CodeTools:-Usage(NL_fit~(B)): #----------------------------------------------------- Y  := [seq(0..6, 0.01)]: B  := cdf~(Y): R1 := Quantile~(Normal(0, 1), B, numeric): R2 := NL_fit~(B): plots:-display(   ScatterPlot(R1, log[10]~(abs~(R2-~R1)), legend=mmcdara, color=red, gridlines=true, size=[700, 400]),   plottools:-rectangle([max(X), log[10]~(min(abs~(R2-~R1)))], [max(Y), log[10]~(max(abs~(R2-~R1)))], color=gray, transparency=0.6) );
 memory used=170.31MiB, alloc change=76.01MiB, cpu time=3.06s, real time=3.05s, gc time=54.87ms memory used=171.59MiB, alloc change=256.00MiB, cpu time=3.12s, real time=3.03s, gc time=154.77ms memory used=8.24MiB, alloc change=0 bytes, cpu time=95.00ms, real time=95.00ms, gc time=0ns
 > r := 2:   J := z -> z - add(a__||k*z^k, k=0..r)/(1+add(b__||k*z^k, k=1..r+1)): model  := J(T(alpha)): NL_fit := unapply(NonlinearFit(model, Aq, alpha), alpha); # these lines are for estimating the performances B  := Sample(Uniform(0.5, 1), 10^4): CodeTools:-Usage(Quantile~(Normal(0, 1), B)): CodeTools:-Usage(Quantile~(Normal(0, 1), B, numeric)): CodeTools:-Usage(NL_fit~(B)): #----------------------------------------------------- Y  := [seq(0..6, 0.01)]: B  := cdf~(Y): R1 := Quantile~(Normal(0, 1), B, numeric): R2 := NL_fit~(B): plots:-display(   ScatterPlot(R1, log[10]~(abs~(R2-~R1)), legend=mmcdara, color=red, gridlines=true, size=[700, 400]),   plottools:-rectangle([max(X), log[10]~(min(abs~(R2-~R1)))], [max(Y), log[10]~(max(abs~(R2-~R1)))], color=gray, transparency=0.6) );
 memory used=170.09MiB, alloc change=32.00MiB, cpu time=3.29s, real time=3.11s, gc time=268.60ms memory used=170.85MiB, alloc change=0 bytes, cpu time=3.23s, real time=3.10s, gc time=201.52ms memory used=10.76MiB, alloc change=0 bytes, cpu time=127.00ms, real time=127.00ms, gc time=0ns
 > # Optimized "r=2" computation z_fit := simplify(subs(alpha=-exp(-(1/2)*z^2)+1, NL_fit(alpha))) assuming z > 0: z_fit := unapply(convert~(%, horner), z); p := proc(alpha)   local z:   z := sqrt(-2*log(1-alpha)):   z_fit(z): end proc: R3 := CodeTools:-Usage(p~(B)): plots:-display(   ScatterPlot(R1, log[10]~(abs~(R2-~R1)), legend=mmcdara, color=red, gridlines=true, size=[700, 400]),   plottools:-rectangle([max(X), log[10]~(min(abs~(R2-~R1)))], [max(Y), log[10]~(max(abs~(R2-~R1)))], color=gray, transparency=0.6) );
 memory used=1.67MiB, alloc change=0 bytes, cpu time=14.00ms, real time=15.00ms, gc time=0ns

AS stands for Abramowith & Stegun

 > J_AS := unapply(normal(eval(J(t), [a__0=2.515517, a__1=0.802853, a__2=0.010328, b__1=1.432788, b__2=0.189269, b__3=0.001308])), t): J_AS(t); # for comparison: print(): z_fit := simplify(subs(alpha=-exp(-(1/2)*z^2)+1, NL_fit(alpha))) assuming z > 0: map(sort, %, z); plot([z_fit(z), J_AS(z)], z=0.5..1, color=[blue, red], legend=[mmcdara, Abramowitz_Stegun], gridlines=true); print(): R2_AS := CodeTools:-Usage(J_AS~(T~(B))): print(): plots:-display(   ScatterPlot(R1, log[10]~(abs~(R2_AS-~R1)), legend=Abramowitz_Stegun, gridlines=true, size=[700, 400]),   ScatterPlot(R1, log[10]~(abs~(R2-~R1)), legend=mmcdara, color=red),   plottools:-rectangle([max(X), log[10]~(min(abs~(R2-~R1)))], [max(Y), log[10]~(max(abs~(R2-~R1)))], color=gray, transparency=0.6) );
 memory used=2.92MiB, alloc change=0 bytes, cpu time=25.00ms, real time=25.00ms, gc time=0ns
 >