
An example of model selection on a test ...

@vs140580

As I told you before, I don't know how familiar you are with these questions.
Anyway, here is an illustration for a limited number (10) of regressors.

(larger numbers require specific methods, such as "stepwise regression", to efficiently explore a family of potential models; a rough sketch of the idea follows)
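For the curious, here is a minimal sketch of the forward stepwise idea in Maple. This is my own illustration, not part of the worksheet below, and it greedily minimizes the training RSS only (a real implementation would rather use a penalized criterion like the AIC discussed further down):

```
# Sketch of forward stepwise selection (illustration only).
# X: an N x P Matrix of regressors, Y: a length-N response Vector,
# kmax: the number of regressors to select (assumed <= P).
ForwardSelect := proc(X::Matrix, Y::Vector, kmax::posint)
  local P, S, remaining, best, bestRSS, j, trial, RSS, score:
  # training RSS of the least-squares fit restricted to the columns in cols
  score := cols -> LinearAlgebra:-Norm(
             X[.., cols] . LinearAlgebra:-LeastSquares(X[.., cols], Y) - Y, 2)^2:
  P := LinearAlgebra:-ColumnDimension(X):
  S := []:
  remaining := [$1..P]:
  while numelems(S) < kmax do
    bestRSS := infinity:
    best    := 0:
    # add the single column that most reduces the training RSS
    for j in remaining do
      trial := [S[], j]:
      RSS   := evalf(score(trial)):
      if RSS < bestRSS then bestRSS := RSS: best := j: end if:
    end do:
    S := sort([S[], best]):
    remaining := subs(best = NULL, remaining):
  end do:
  return S   # indices of the selected regressors
end proc:

# hypothetical usage on the simulated data of the worksheet below:
# ForwardSelect(X, Y, 4);
```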

Analyse this worksheet, try to understand what I did, and if you face any difficulty do not hesitate to contact me.

The best model I obtain results from a balance between the quality of prediction (assessed on the test base) and parsimony (the number of terms, or parameters, it contains).
As a rule of thumb the quality of prediction varies inversely with the parsimony (quite intuitive, isn't it?)

TrainTest_and_ModelSelection_2.mw

 > restart:
 > with(Statistics):

Skip this if you read XY from some data file

 > # Simulation of the data you could read with
   # XY := ExcelTools:-Import(...)
   #
   # Part one: regressors (aka "inputs")
   N := 200:
   P := 10:
   X := LinearAlgebra:-RandomMatrix(N, P, generator = -1.0 .. 1.0):
 > # Part two: response (aka "output", assumed to be scalar)
   #
   # Step 1: choose a model X --> Y (I will "forget" it later)
   regressors := [seq(V__||p, p = 1..P)]:
   TrueModel  := randpoly(regressors, degree = 1, homogeneous,
                          coeffs = proc() combinat:-randperm([0, 1, 10])[1] end proc)
                 +
                 randpoly(regressors, degree = 2, homogeneous,
                          coeffs = proc() combinat:-randperm([0$3, 0.1$2])[1] end proc);
   TrueModel_coeff := coeffs(TrueModel);
   TrueModel_base  := [op(TrueModel)] /~ TrueModel_coeff;
   TrueModel       := unapply(TrueModel, regressors);
 (1)
 > # Step 2: build Y = TrueModel(X)
   Y := Vector(N, i -> TrueModel(entries(X[i], nolist))):
 > # Part three: XY = <X | Y>
   XY := <X | Y>;
 (2)

Real stuff begins here

 > Train_fraction := 0.4:
   Train_N        := round(N*Train_fraction);
   mixing         := combinat:-randperm([$1..N]):
   Train_num      := mixing[1..Train_N]:
   Test_num       := mixing[Train_N+1..N]:
   Train_XY       := XY[Train_num];
   Test_XY        := XY[Test_num];
 (3)
 > # Define a family of models the true (but now assumed unknown) model is
   # supposed to belong to.
   #
   # For the sake of simplicity one can assume this family is made of order-1
   # polynomials in 1, 2, ... or P = 10 regressors.
   # This family has 2^P = 1024 elements.
   #
   # Scanning all models is still possible, but if P is larger (you talked about
   # values around 50) or if the family is more complex (polynomials of total
   # degree 2 for instance), the combinatorics becomes huge and very specific
   # methods Maple doesn't possess are required to make the model selection
   # efficient.
   #
   # Family_code codes for its members: the nth member is a first-order polynomial
   # whose indeterminates are such that Family_code[n, i] = 1 for i = 1..P.
   Family_code := convert(
                    map(
                      n -> parse~(
                             StringTools:-Explode(sprintf(cat("%.", P, "d"), convert(n, binary)))
                           )
                      , [$1..2^P-1]
                    )
                    , Matrix
                  ):
 > # Constructor of bases of regressors
   # (this line was truncated in the original post; reconstructed here: the dot
   #  product of the 0/1 code row with the regressor vector gives the sum of the
   #  selected regressors, whose operands form the basis)
   Member_base := n -> [op(Family_code[n] . Vector(regressors))]:
   # Two examples
   Family_code[1]  , Member_base(1);
   Family_code[123], Member_base(123);
 (4)
 > # Column extractor
   Extractor := n -> [ ListTools:-SearchAll(1, convert(Family_code[n], list)) ]:
   # Two examples
   Family_code[1]  , Extractor(1);
   Family_code[123], Extractor(123);
 (5)
 > # One example of regression
   #
   # Step 1: build the vector of coefficients
   Member_res := LinearFit(
                   Member_base(123)
                   , Train_XY[.., Extractor(123)]
                   , Train_XY[.., P+1]
                   , Member_base(123)
                   , output = [parametervector, residualstandarddeviation]
                 ):
   Member_coeff       := Member_res[1];
   Member_Train_error := Member_res[2];
   # Step 2: use Member_coeff to predict over the test base
   Member_pred := Test_XY[.., Extractor(123)] . Member_coeff;
   # Step 3: compare the test base output to the test predictions
   ScatterPlot(Test_XY[.., P+1], Member_pred)

Find the model within the family which minimizes... what ???

Obviously the model which minimizes the residual sum of squares (RSS) between the test base
responses and the predictions is the full model, with all the regressors accounted for.

This model has "complexity" P.
Generally (it would take too much time to explain why if you are not familiar with the topic),
one considers as the "best model" the one which realizes some balance between the RSS and the complexity.

I choose here the Akaike criterion (AIC):

AIC = 2*NI - 2*ln(L)

where:
NI = number of indeterminates (parameters) of the model,
L = the maximized likelihood.
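As an aside, here is a minimal sketch (mine, not part of the worksheet) of this gaussian-likelihood AIC for a single model, where `res` stands for the residual vector and `k` for the model complexity:

```
# AIC for gaussian residuals, up to an additive constant common to all
# models: AIC = 2*k + n*ln(RSS/n), where RSS is the residual sum of squares
AIC_gaussian := proc(res::Vector, k::posint)
  local n, RSS;
  n   := LinearAlgebra:-Dimension(res);
  RSS := add(res[i]^2, i = 1..n);
  return evalf(2*k + n*ln(RSS/n));
end proc:

# e.g., for model number n of the family (names as in the worksheet below):
# AIC_gaussian(Test_XY[.., P+1] - Test_XY[.., Extractor(n)] . Member_coeff, Family_res[n, 1]);
```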

 > Family_res := Matrix(2^P-1, 3):
   for n from 1 to 2^P-1 do
     Member_coeff := LinearFit(
                       Member_base(n)
                       , Train_XY[.., Extractor(n)]
                       , Train_XY[.., P+1]
                       , Member_base(n)
                       , output = parametervector
                     ):
     Member_RSS := Variance(Test_XY[.., P+1] - Test_XY[.., Extractor(n)] . Member_coeff);
     Family_res[n, 1] := numelems(Member_base(n));
     Family_res[n, 2] := Member_RSS;
     Family_res[n, 3] := Variance(Train_XY[.., P+1] - Train_XY[.., Extractor(n)] . Member_coeff);
     if n mod 10 = 0 then printf("%d  %a\n", n, Member_RSS) end if;
   end do:
 > # Assuming a gaussian likelihood L, the term ln(L) simplifies to Family_res[.., 2]
   Family_AIC := 0*Family_res[.., 1] + Family_res[.., 2] /~ Family_res[.., 3]:
   ScatterPlot(
     Family_res[.., 1]
     , Family_res[.., 2] /~ Family_res[.., 3]
     , symbol = box
     , symbolsize = 10
     , labels = ["Complexity", "RSS"]
   );
   print(): print():
   ScatterPlot(
     Vector(2^P-1, i -> i)
     , Family_AIC
     , symbol = solidcircle
     , symbolsize = 5
     , size = [1200, 300]
     , labels = ["Model num", "AIC"]
   )
 > Family_5_best_num   := sort(Family_AIC, output = permutation)[1..5];
   Family_5_best_model := Member_base~(Family_5_best_num)
 (6)
 > # Observe Family_5_best_model contains the 4 most important terms of TrueModel
   TrueModel(op(regressors))
 (7)
 > # The "best model" q   := Family_5_best_num[1]; MOD := Family_5_best_models[1]; Member_coeff := LinearFit(                   Member_base(q)                   , Train_XY[.., Extractor(q)]                   , Train_XY[.., P+1]                   , Member_base(q)                   , output=parametervector                 ): `Best Model` = add(MOD[k]*%[k], k=1..numelems(MOD)); Member_pred := Test_XY[.., Extractor(q)] . Member_coeff: `Residual variance` = Variance(Test_XY[.., P+1] - Test_XY[.., Extractor(q)] . Member_coeff); `Residual std dev` = StandardDeviation(Test_XY[.., P+1] - Test_XY[.., Extractor(q)] . Member_coeff); `Akaike criterion` = Family_AIC[q]; plots:-display(   ScatterPlot(     Test_XY[.., P+1]     , Member_pred     , labels=["Y (test base)", "Y (best prediction)"]     , labeldirections=[default, vertical]   )   , plot([[min(Test_XY[.., P+1])\$2], [max(Test_XY[.., P+1])\$2]], color=red) )
 > # Let's compare the previous results to the performances of the most complex
   # model (number 2^P-1).
   #
   # As said previously this model has the lowest RSS, but given its complexity it
   # appears not to be better than the one which minimizes the Akaike criterion.
   q := 2^P-1:
   Member_coeff := LinearFit(
                     Member_base(q)
                     , Train_XY[.., Extractor(q)]
                     , Train_XY[.., P+1]
                     , Member_base(q)
                     , output = parametervector
                   ):
   `Best Model` = add(Member_base(q)[k]*Member_coeff[k], k = 1..numelems(Member_base(q)));
   Member_pred := Test_XY[.., Extractor(q)] . Member_coeff:
   `Residual variance` = Variance(Test_XY[.., P+1] - Member_pred);
   `Residual std dev`  = StandardDeviation(Test_XY[.., P+1] - Member_pred);
   `Akaike criterion`  = Family_AIC[q];
   plots:-display(
     ScatterPlot(
       Test_XY[.., P+1]
       , Member_pred
       , labels = ["Y (test base)", "Y (full-model prediction)"]
       , labeldirections = [default, vertical]
     )
     , plot([[min(Test_XY[.., P+1])$2], [max(Test_XY[.., P+1])$2]], color = red)
   )
 >

@vs140580 So this is not Train and ...

So it is not a Train and Test base that you need but something else, generically named "model competition".

If I understand correctly, you have a set X of scalar regressors, a scalar response Y, and a family F = {F[i] : X -> Y, i in some (countable or not) set} of models.
You want to find the best (in what sense?) model F* in F, such that || Ytest - F*(Xtest) || is minimum for a suitable norm ||.||.

If I'm mistaken, explain yourself more clearly.
If I'm right, provide a set of data with explanatory material.

The problem as a whole is so vast that there are many strategies I could discuss for hours, so be specific.

An illustration is provided in my answer

Details...

As I don't have Excel data, the first part of the worksheet aims to emulate them.

Great Job.
I had thought about it for a while but finally gave up

@pauldaas In case you would be interested...

@pauldaas

In case you would be interested in the plot alone, this procedure accounts for "undefined" points, that is, values of beta for which eq(beta) is of the form 0=0:

 > restart:
 > # This procedure doesn't handle the case of complex roots
   SOLplot := proc(f, R, {n::positive := 100})
     local r, e, s, pts, disc, plo:
     pts  := NULL:
     disc := NULL:
     plo  := NULL:
     for r from op(1, R) to op(2, R) by (op(2, R)-op(1, R))/(n-1) do
       e := eval(f, beta = r):
       if has(e, xi) then
         s   := select(is, [solve(e, xi)], RealRange(Open(0), Open(1))):
         pts := pts, seq([r, i], i in s):
       else
         plo  := plo, plot([pts], style = line, view = [R, default]):
         pts  := NULL:
         disc := disc, r:
       end if:
     end do:
     plo := plo, plot([pts], style = line, view = [R, default]):
     return [pts], [disc], plots:-display(plo)
   end proc:
 > eq := [
     (1/6)*beta^3 + (1/2)*xi^2*beta - (1/2)*beta^2 - (1/2)*xi^2 + (1/3)*beta = 0,
     (1/6)*beta*(beta^2 + 3*xi^2 - 6*xi + 2) = 0
   ];
 (1)
 > pts, disc, p := SOLplot(eq[1], 0..2, n=101):
 > # "discontinuity" points disc
 (2)
 > # "discontinuity" points plots:-display(p, gridlines=true)
 > pts, disc, p := SOLplot(eq[2], -1..1, n=101):
 > # "discontinuity" points disc
 (3)
 > # "discontinuity" points plots:-display(p, gridlines=true)
 >

@sursumCorda  Thanks for this info...

@sursumCorda

Thanks for this information.
My version of Maple is too old to run this code correctly. I'll look at it when back at the office.

I have just completed my previous commen...

Here is a procedure aimed at finding the tangency points between the graph of a curve of equation f(x, y) = 0 and circles of equation
x^2 + y^2 - R^2 = 0.
This procedure returns a list of three elements:

1. the triples (x, y, R)
2. the minimum value of R (tangency points closest to the origin)
3. the coordinates of the tangency points closest to the origin
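In other words (my reading of the procedure below), a tangency point (x, y) at radius R is obtained by requiring the unit normals of the two curves to coincide,

∇f/|∇f| = ∇g/|∇g|,  together with f(x, y) = g(x, y),  where g = x^2 + y^2 - R^2,

the contact points with a vertical tangent (diff(f, y) = 0) being handled separately.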

The procedure needs to be improved, for it gives a wrong result with the function
f(x, y) = x^3 + y^2 - x + 2/(3*sqrt(3))
(an isolated point [x = 1/sqrt(3), y = 0] is detected which is obviously not on the graph of f(x, y)).

 > restart;
 > interface(version)
 (1)
 > with(plots):
 > ClosestPoint := proc(F)
     local f, g, Df, Dg, sys, sol, xR, Y, xyR, i, j:
     local Rmin, MPP:
     if F::`=` then
       f := (lhs-rhs)(F)
     else
       f := F:
     end if:
     # unit normal to the curve f = 0
     Df := diff~(f, [x, y]);
     Df := Df /~ sqrt(Df[1]^2 + Df[2]^2);
     # unit normal to the circle g = 0
     g  := x^2 + y^2 - R^2;
     Dg := diff~(g, [x, y]);
     Dg := Dg /~ sqrt(Dg[1]^2 + Dg[2]^2);
     # tangency: equal unit normals plus a common point
     sys := {(Df =~ Dg)[], f = g};
     sol := solve(sys, [x, y, R])[];
     # contact points with vertical tangent
     {solve({Df[2] = 0}, [x, y])[]};
     allvalues(%);
     map(s -> solve(eval(f, s), x), %);
     remove(has, simplify~(%), I);
     sol := sol, map(s -> [x = s, R = abs(s), y = 0], %)[];
     sol := map(allvalues, [sol]):
     xR  := map(s -> if (`not`@has)(eval(R, s), x) then remove(has, s, y) end if, sol);
     Y   := map(s -> [solve(eval(g, s), y)], xR);
     xyR := NULL:
     for i from 1 to numelems(xR) do
       for j from 1 to numelems(Y[i]) do
         xyR := xyR, [ xR[i][], y = Y[i][j] ]:
       end do:
     end do:
     xyR  := remove(has, {xyR}, I);
     xyR  := map(s -> if evalf(eval(R, s)) > 0 then s end if, xyR);
     Rmin := min(eval~(R, xyR));
     MPP  := select(has, xyR, (R = Rmin));
     return [ xyR, Rmin, MPP ]
   end proc:
 > C := ClosestPoint(x^3+y^2-x=1/3)
 (2)
 > display(
     implicitplot(x^3+y^2-x = 1/3, x = -2..2, y = -2..2, grid = [100, 100], color = red)
     , seq(plottools:-circle([0, 0], eval(R, c), color = blue), c in C[1])
     , seq(plot([eval([x, y], c)], style = point, symbol = box, symbolsize = 20, color = magenta), c in C[1])
     , seq(plot([eval([x, y], c)], style = point, symbol = diagonalcross, symbolsize = 20, color = magenta), c in C[1])
     , seq(plot([eval([x, y], c)], style = point, symbol = box, symbolsize = 20, color = green), c in C[3])
     , seq(plot([eval([x, y], c)], style = point, symbol = diagonalcross, symbolsize = 20, color = green), c in C[3])
     , scaling = constrained
     , view = [-1.1..1.5, -1..1]
     , title = "Green points are those closest to the origin"
   );
 > C := ClosestPoint(x^3+y^2-x = -2/(3*sqrt(3)))
 (3)
 > display(
     implicitplot(x^3+y^2-x = -2/(3*sqrt(3)), x = -2..2, y = -2..2, grid = [100, 100])
     , seq(plottools:-circle([0, 0], eval(R, c), color = blue), c in C[1])
     , seq(plot([eval([x, y], c)], style = point, symbol = box, symbolsize = 20, color = magenta), c in C[1])
     , seq(plot([eval([x, y], c)], style = point, symbol = diagonalcross, symbolsize = 20, color = magenta), c in C[1])
     , seq(plot([eval([x, y], c)], style = point, symbol = box, symbolsize = 20, color = green), c in C[3])
     , seq(plot([eval([x, y], c)], style = point, symbol = diagonalcross, symbolsize = 20, color = green), c in C[3])
     , scaling = constrained
     , view = [-2..2, -2..2]
     , title = "Green points are those closest to the origin"
   );
 > # Why is the result wrong?
   sol := NULL:
   f   := x^3 + y^2 - x + 2/(3*sqrt(3));
   Df  := diff~(f, [x, y]);
   # contact points with vertical tangent
   {solve({Df[2] = 0}, [x, y])[]};
   allvalues(%);
   map(s -> solve(eval(f, s), x), %);
   remove(has, simplify~(%), I);
   sol := map(s -> [x = s, R = abs(s), y = 0], %)[];
 (4)
 > # Point [x = (1/3)*sqrt(3), R = (1/3)*sqrt(3), y = 0] is the wrong one:
 > eval(f, [x = (1/3)*sqrt(3), R = (1/3)*sqrt(3), y = 0])
 (5)
 >

As the second function f(x, y) has genus zero, it admits a parameterized representation [x(t), y(t)].
It appears that this approach returns the correct answer:

```
f  := x^3 + y^2 - x + 2/(3*sqrt(3));
#                x^3 + y^2 - x + (2/9)*3^(1/2)

Gf := algcurves:-genus(f, x, y);
#                0

if Gf = 0 then
  A   := algcurves:-parametrization(f, x, y, t);
  C   := R *~ [(1-s^2)/(1+s^2), 2*t/(1+s^2)];
  sys := [(diff(A, t) =~ diff(C, t))[], (A =~ C)[]]:
  sol := solve(sys, [s, t, R]);
  MPP := [x, y] =~ eval(A, sol[]);
end if:

MPP;
#   [x = -(1/207)*(2*3^(1/2)+9)*(-12+18*3^(1/2)), y = 0]

expand(simplify(%));
#   [x = -(2/3)*sqrt(3), y = 0]

evalf(%);
#   [x = -1.154700539, y = 0.]
```

This observation could be used to improve the robustness of the ClosestPoint procedure.

No revolution here...

just a simpler version of your code (I was completely unaware of Draghilev's method)
Draghilev_mmcdara.mw

Couldn't Mapleprimes offer a kind of "log file" to trace such movements?
We never know with certainty whether the disappearance of a question comes from a duplicate of another one, a conversion into a post, or a deletion by the author himself.

@sursumCorda Hi, what happened to y...

Did you delete it or move it elsewhere, or did someone else take the initiative?

It always f.....g sucks to spend time preparing an answer and then realize you would have been better off taking a leak.
(My apologies to those who felt shocked by these words)

It works functionally well with Maple 20...

It works functionally well with Maple 2015.2... but the result is wrong whatever the version.

For your information, your problem is a classic of structural reliability analysis.
The standard algorithms used to solve this kind of problem are:

• The Abdo-Rackwitz (AR) algorithm.
T. ABDO, R. RACKWITZ, A new beta-point algorithm for large time-invariant and time-variant reliability problems, [in:] A. DER KIUREGHIAN and P. THOFT-CHRISTENSEN [Eds.], Reliability and Optimization of Structural Systems '90, Proc. 3rd WG 7.5 IFIP Conf., Berkeley, 26–28 March 1990, 1–12, Berlin 1991.
(I didn't find an open-access version.)
• The Hasofer-Lind-Rackwitz-Fiessler (HLRF) algorithm, a variant of SQP; a rough sketch is given below the link.
https://www.researchgate.net/publication/283997508_New_optimization_algorithms_for_structural_reliability
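To give a rough idea of HLRF, here is a minimal sketch of the classical iteration u <- ((grad(g).u - g(u))/|grad(g)|^2)*grad(g) applied to a limit-state function g expressed in standard normal space. This is my own illustration, not code from the references above, and the limit-state function in the usage line is hypothetical:

```
# Minimal HLRF sketch: design point and reliability index of a limit state g = 0
HLRF := proc(g, u0::list, {maxiter::posint := 50, tol::positive := 1e-8})
  local vars, Dg, u, grad, gval, unew, k:
  vars := sort([indets(g, name)[]]):          # e.g. [u1, u2]
  Dg   := unapply(diff~(g, vars), vars):      # gradient of g
  u    := Vector(evalf(u0)):
  for k from 1 to maxiter do
    grad := Vector(evalf(Dg(seq(u[i], i = 1..numelems(vars))))):
    gval := evalf(eval(g, vars =~ convert(u, list))):
    # HLRF update
    unew := ((grad . u - gval) / (grad . grad)) * grad:
    if LinearAlgebra:-Norm(unew - u, 2) < tol then u := unew: break end if:
    u := unew:
  end do:
  return convert(u, list), LinearAlgebra:-Norm(u, 2)   # design point, beta index
end proc:

# hypothetical limit-state function
HLRF(3 - u1 - u2^2, [1, 1]);
```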

@dharr Taking up your original idea...

Taking up your original idea (I voted up), here is a slightly faster way to get the solution

```
with(inttrans):

# Laplace-transform the system and inject the initial conditions
Lsys := eval(laplace(sys, t, s), ics):
# the unknown time-domain functions (derivative terms removed)
F    := convert(remove(has, indets(sys, function), diff), list):
# their counterparts in the Laplace domain
F2L  := map(u -> op(0, u)(s), F):
L    := laplace~(F, t, s):
Asys := eval(Lsys, L =~ F2L):
# solve the now-algebraic system and transform back
Asol := solve(Asys, F2L)[]:
Fsol := F =~ invlaplace~(rhs~(Asol), s, t):
```
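For a self-contained check, a toy system such as this one (hypothetical, not from the original question) can be fed to the snippet above:

```
# toy linear ODE system with initial conditions, just to exercise the snippet
sys := {diff(x(t), t) = -x(t) + y(t), diff(y(t), t) = -y(t)}:
ics := {x(0) = 1, y(0) = 1}:
```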

@dharr Hi, A passing remark: h...

Hi,
A passing remark: have you noticed how slow combinat:-cartprod is compared to your simple double loop?
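A quick way to see it for yourself (my own sketch, with arbitrary list sizes):

```
# Illustrative timing: combinat:-cartprod iterator vs a plain double seq
L1 := [$1..300]:
L2 := [$1..300]:

t0 := time():
T  := combinat:-cartprod([L1, L2]):
while not T[finished] do T[nextvalue]() end do:
'cartprod_time' = time() - t0;

t0 := time():
seq(seq([a, b], b in L2), a in L1):
'double_seq_time' = time() - t0;
```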

@sursumCorda I don't think your...

I don't think your comparison is very reliable.
To enhance the credibility of CodeTools:-Usage outputs it is advisable to use the option iterations.
For a single iteration the cpu times vary from 8 ms to 12 ms (note this is much smaller than the 47 ms you got).
For 100 iterations the same times vary from 9.88 ms to 10.44 ms.
One can say the mean cpu time is a little below 10 ms.

It also appears that i||j is faster than cat(i, j) (third test).

It's also interesting to notice that the time used to construct the cartesian product of the two lists is even substantially smaller (fourth test)

 > restart;
 > interface(version); interface(displayprecision=4):
 (1)
 > T := NULL:
   for i from 1 to 1000 do
     T := T, CodeTools:-Usage(
               [seq](seq(cat(i, j), j in [$("a".."z"), $("A".."Z"), $("A".."Z"), $("a".."z"), $("A".."Z")])
                       , i in [$("A".."Z"), $("a".."z"), $("a".."z"), $("A".."Z"), $("a".."z")])
               , iterations = 1, output = cputime, quiet):
   end do:
   Statistics:-Histogram([T], discrete = true, thickness = 10)
 > T := NULL:
   for i from 1 to 1000 do
     T := T, CodeTools:-Usage(
               [seq](seq(cat(i, j), j in [$("a".."z"), $("A".."Z"), $("A".."Z"), $("a".."z"), $("A".."Z")])
                       , i in [$("A".."Z"), $("a".."z"), $("a".."z"), $("A".."Z"), $("a".."z")])
               , iterations = 10, output = cputime, quiet):
   end do:
   Statistics:-Histogram([T], discrete = true, thickness = 8)
 > T := NULL:
   for i from 1 to 1000 do
     T := T, CodeTools:-Usage(
               [seq](seq(i||j, j in [$("a".."z"), $("A".."Z"), $("A".."Z"), $("a".."z"), $("A".."Z")])
                       , i in [$("A".."Z"), $("a".."z"), $("a".."z"), $("A".."Z"), $("a".."z")])
               , iterations = 10, output = cputime, quiet):
   end do:
   Statistics:-Histogram([T], discrete = true, thickness = 8)
 > L1 := [$("a".."z"), $("A".."Z"), $("A".."Z"), $("a".."z"), $("A".."Z")]:
   L2 := [$("A".."Z"), $("a".."z"), $("a".."z"), $("A".."Z"), $("a".."z")]:
   T  := NULL:
   for i from 1 to 1000 do
     T := T, CodeTools:-Usage([seq](seq(i||j, j in L1), i in L2), iterations = 10, output = cputime, quiet):
   end do:
   Statistics:-Histogram([T], discrete = true, thickness = 8)
 > N1 := numelems(L1):
   N2 := numelems(L2):
   A1 := Matrix(N1, N2, (i, j) -> L1[i]):
   A2 := Matrix(N1, N2, (i, j) -> L2[j]):
   T  := NULL:
   for i from 1 to 1000 do
     T := T, CodeTools:-Usage(cat~(A1, A2), iterations = 10, output = cputime, quiet):
   end do:
   Statistics:-Histogram([T], discrete = true, thickness = 8)
 >