acer

32490 Reputation

29 Badges

20 years, 8 days
Ontario, Canada

Social Networks and Content at Maplesoft.com

MaplePrimes Activity


These are replies submitted by acer

Do you mean that you want to plot V(t) in terms of both U0 and t, as a 3D plot? If not, and you somehow want a 2D plot of V(t) in terms of U0, then what would be the value for t?

What is the value of A, which appears in your equations?

Are you 100% sure that the initial conditions are correct? Would you accept, say, D(r1)(0)=10^(-6) or greater?

@Kitonum aha. I was paying attention to the posted C code, and you at the title. It's interesting that his C code doesn't do what the title asks, for all negative real r.

@Kitonum Is that going to act the same as the C cast, for negative r?

As a side note, you should stop constructing expressions that involve both a symbol such as RK as well as the indexed name RK[1].  (Or K alongside K[1], say) It will lead to problems, down the road.

The OP uses a table Rr with a custom indexing function `index/xxx`. The only purpose of this seems to be to allow the typeset 2D math to have (subscripted) indexed name reference rather than function calls to RrProc.

That un-memoized use of the custom-indexing function inhibits performance considerably, which is magnified by having the references be done inside a six-fold nested add.

Let's assume that the procedure RrProc is to be kept. (It is inefficient. In the Comment/Reply above, I replaced it with a simple formula, for even better performance. But let's imagine that it really is needed, and that there is no possible improvement to the add-indexing i=m+1..II since RrProc just returns zero when i<=m.)

I use the randomize command to produce the same random data for each worksheet. Sorting the polynomial result shows that the results are all effectively the same. It's just performance which differs. I also bumped up the parameters II=8, JJ=8, and M=6 to make the example take longer.

I passed the PI-bar to UP1 instead of PI. I don't know what the OP intends. Otherwise the PI-bar is unused in his worksheet. The results differ from using PI, but the performance concerns are basically the same.

The best performance here entails:
- Use a non-custom-indexing table when calling UP1, which is created up front (outside all the add calls) from the custom-indexed table. This is important.
- Pass that plain table to the procedure UP1 which does the main job.
- Use Grid:-Set to tell the kernel to pass the plain table along, regardless.
This is the first attachment below.

Almost as good is too do as above, but without passing the plain table to the procedure UP1.

Also not bad, but not as good as the last, is to put option remember on the procedure RrProc which is called by the table custom-indexing-function.

Worst of all is to use the custom-indexing function table, without option rememeber on RrProc. That is like the OP's original.

The best improvement is from the original seq use and 21sec to the best variant with Grid:-Seq of 1.3sec.

soal-1_no_customindex_pass.mw

soal-1_no_customindex_no_pass.mw

soal-1_customindex_pass_remember.mw

soal-1_customindex_no_pass_remember.mw

soal-1_customindex_pass.mw

soal-1_customindex_no_pass.mw

Further editing two of the add ranges to be i=m+1..II and k=m+1..II brings the best performance here down to 0.4sec. That's a 50-times speedup over the original, even while retaining the inefficient RrProc.

@JSalisbury Since solve doesn't produce a useful result when solving s for t, you can instead solve s for thetabn and reflect the plot.

Or you can produce an implicit plot. (This is less efficient, and can sometimes require additional options to get a smooth result. It's usually preferable to produce an explicit plot if you can.)

restart;

v := 145000:
thetavn := (1/6)*Pi:
omegac := 1/10:

s := v*cos(thetavn)*t*(cos(2*thetabn)*tan(thetabn)
     +sin(2*thetabn)*sin(omegac*t)/(omegac*t));

72500*3^(1/2)*t*(cos(2*thetabn)*tan(thetabn)+10*sin(2*thetabn)*sin((1/10)*t)/t)

form := simplify([solve(s = 0, thetabn)]) assuming t>0:
form := select(u->is(u>=0 and u<=Pi), form) assuming t>0;

[0, Pi, arctan((20*sin((1/10)*t)+t)^(1/2)/t^(1/2)), -arctan((20*sin((1/10)*t)+t)^(1/2)/t^(1/2))+Pi]

P2D:=plottools:-reflect(plot(form, t=0..200, color="Burgundy",
                           thickness=3), [[0,0],[1,1]]):
plots:-display(P2D, view=[0..Pi, 0..200],
               labels=[thetabn,t], gridlines=false);

# The curve along thetabn=Pi/2 is spurious.
plots:-implicitplot(s, thetabn = 0..Pi, t=0..200,
                    gridrefine=1, thickness=3,
                    gridlines=false);

P3D:=plot3d(proc(TH,T) local S;
          S:=eval(s,[thetabn=TH,t=T]);
          if S>1e7 or S<-1e7 then Float(undefined);
          else S; end if; end proc, 0..Pi, 0..200,
       grid=[201,201], view=-0.5e7 .. 0.5e7):

P2Dto3D:=plottools:-transform((x,y)->[x,y,eval(s,[thetabn=x,t=y])])(P2D):

plots:-display( P3D, P2Dto3D, labels=["thetabn","t","s"]);

 

Download implicit_plot_3.mw

@mehdibaghaee I don't think thay you should be very concerned about floating-point discrepencies between approaches if you can establish that it is small, say on the order of 10^(-Digits+1) or so, and decreases in magnitude as the working precision is increased. And that seems to be the case here. You can still use fnormal as a convenient way to gauge whether the differences are acceptable.

It is interesting that in at least Maple 2016.2 (your version) the kernel is able to pass along Rr -- as well as its indexing function -- to the distributed kernels automatically. It's also interesting that using Grid:-Set on `index/xxx` and RrProc decreases the benefit. Once those Grid:-Set calls are made both the version where Rr is passed as an extra argument, and not, see the decrease is benefit. The speedup obtained through use of Grid:-Seq is almost a factor of four over the use of serial seq. or about a factor of two if the Grid:-Set use is retained here. The kernel magic which figures out what assigned values in the parent kernel session need to be passed along to the child kernel sessions is subtle I believe. I've seen it change from release to release, mostly for the better. In the worksheet below I've commented out the version using Grid:-Set.

But I'd like to focus on what I've previously written about optimizing the serial code first, before parallizing.

In the following attachment I looked at your RrProc procedure. It returns 0 (zero) if its first parameter i is larger than its second parameter m. So quite a few calls to RrProc can be avoided simply by making the add loops for i and k go from m+1..II instead of 0..II . That is done in variant UP1a in the attachment, and improves the timing by about a factor of three.

Now procedure RrProc is not efficient. It creates and populates a Vector, flips it, etc, only to return only one of its entries. It could be made more efficient. But the values that the procedure returns are just 0 (zero) if i-m is even, and just 2*m+1 otherwise. So that whole procedure can be eliminated, and a simple formula used instead. Doing so brings about another factor of three speedup.

So, in summary, that's about a factor of 4 by using Grid:-Seq, and about another factor of 9 by eliminating the overhead of calling RrProc inside the loop.

nb. It's difficult to know whether this is a partially concocted computation for illustration, or part of your actual, broader goal. For example, you call procedure UP with two different values of s. But the first result for s=1 is simply 4/3 the result for s=2. So, as it stands, there no real need for calling the UP procedure for both values of s.

``

restart

 

#Digits:=15;

II := 6

6

(1)

JJ := 6

6

(2)

N := 2:

q := max(II+1, JJ+1):

M := 5:

seq(seq(seq(seq(assign(PI[i, j, r, m], a*`#mover(mi("&Pi;",fontstyle = "normal"),mo("&uminus0;"))`[i, j, r, m]), i = 0 .. q), j = 0 .. q), r = 1 .. N), m = 1 .. M):

a := .2:

RrProc := proc (i, m) local K, j, Q; if i <= m then 0 else K := 1; Q := Matrix(i, 1); for j by 2 to i do Q(j) := 2*i-K; K := 4+K end do; Q := FlipDimension(Q, 1); Q(m+1) end if end proc:
NULL

`#mover(mi("&Pi;",fontstyle = "normal"),mo("&uminus0;"))` := Array(0 .. II, 0 .. JJ, 1 .. 6, 1 .. M):

f1 := RandomArray(II+1, JJ+1):

for m to M do `&Pi;m`[1, m] := f1; `&Pi;m`[2, m] := f2; `&Pi;m`[3, m] := f3; `&Pi;m`[4, m] := f4; `&Pi;m`[5, m] := f5; `&Pi;m`[6, m] := f6 end do:

for m to M do `#mover(mi("&Pi;",fontstyle = "normal"),mo("&uminus0;"))`[0 .. II, 0 .. JJ, 1, m] := ArrayTools:-Alias(`&Pi;m`[1, m], [0 .. II, 0 .. JJ]); `#mover(mi("&Pi;",fontstyle = "normal"),mo("&uminus0;"))`[0 .. II, 0 .. JJ, 2, m] := ArrayTools:-Alias(`&Pi;m`[2, m], [0 .. II, 0 .. JJ]); `#mover(mi("&Pi;",fontstyle = "normal"),mo("&uminus0;"))`[0 .. II, 0 .. JJ, 3, m] := ArrayTools:-Alias(`&Pi;m`[3, m], [0 .. II, 0 .. JJ]); `#mover(mi("&Pi;",fontstyle = "normal"),mo("&uminus0;"))`[0 .. II, 0 .. JJ, 4, m] := ArrayTools:-Alias(`&Pi;m`[4, m], [0 .. II, 0 .. JJ]); `#mover(mi("&Pi;",fontstyle = "normal"),mo("&uminus0;"))`[0 .. II, 0 .. JJ, 5, m] := ArrayTools:-Alias(`&Pi;m`[5, m], [0 .. II, 0 .. JJ]); `#mover(mi("&Pi;",fontstyle = "normal"),mo("&uminus0;"))`[0 .. II, 0 .. JJ, 6, m] := ArrayTools:-Alias(`&Pi;m`[6, m], [0 .. II, 0 .. JJ]) end do:

UP1 := proc (s, PI, N, M, a, b, II, JJ) local k; i, j, r, p, m, q, n, l; Digits := 15; if s = 1 then add(add(add(add(add(add((2/3)*Rr[i, m]*Rr[k, m]*b*add(PI[i, j, q, p]*PI[k, j, q, r]*tau[p](t)*tau[r](t), q = 1 .. N)/((2*m+1)*(2*j+1)*a), i = 0 .. II), k = 0 .. II), m = 0 .. II), j = 0 .. JJ), p = 1 .. M), r = 1 .. M) elif s = 2 then add(add(add(add(add(add((1/2)*Rr[i, m]*Rr[k, m]*b*add(PI[i, j, q, p]*PI[k, j, q, r]*tau[p](t)*tau[r](t), q = 1 .. N)/((2*m+1)*(2*j+1)*a), i = 0 .. II), k = 0 .. II), m = 0 .. II), j = 0 .. JJ), p = 1 .. M), r = 1 .. M) end if end proc:

y1 := CodeTools:-Usage([Grid:-Seq(UP1(s, PI, N, M, a, b, II, JJ), s = 1 .. 2)]):

memory used=12.05MiB, alloc change=42.94MiB, cpu time=156.00ms, real time=1.35s, gc time=56.80ms

 

UP1a := proc (s, PI, N, M, a, b, II, JJ) local k; i, j, r, p, m, q, n, l; Digits := 15; if s = 1 then add(add(add(add(add(add((2/3)*Rr[i, m]*Rr[k, m]*b*add(PI[i, j, q, p]*PI[k, j, q, r]*tau[p](t)*tau[r](t), q = 1 .. N)/((2*m+1)*(2*j+1)*a), i = m+1 .. II), k = m+1 .. II), m = 0 .. II), j = 0 .. JJ), p = 1 .. M), r = 1 .. M) elif s = 2 then add(add(add(add(add(add((1/2)*Rr[i, m]*Rr[k, m]*b*add(PI[i, j, q, p]*PI[k, j, q, r]*tau[p](t)*tau[r](t), q = 1 .. N)/((2*m+1)*(2*j+1)*a), i = m+1 .. II), k = m+1 .. II), m = 0 .. II), j = 0 .. JJ), p = 1 .. M), r = 1 .. M) end if end proc:

y1a := CodeTools:-Usage([Grid:-Seq(UP1a(s, PI, N, M, a, b, II, JJ), s = 1 .. 2)]):

memory used=6.27MiB, alloc change=6.56MiB, cpu time=90.00ms, real time=357.00ms, gc time=36.02ms

 

UP1b := proc (s, PI, N, M, a, b, II, JJ) local k; i, j, r, p, m, q, n, l; Digits := 15; if s = 1 then add(add(add(`if`((i-m)::even or (k-m)::even, 0, add(add(add((2/3)*(2*m+1)*b*add(PI[i, j, q, p]*PI[k, j, q, r]*tau[p](t)*tau[r](t), q = 1 .. N)/((2*j+1)*a), j = 0 .. JJ), p = 1 .. M), r = 1 .. M)), i = m+1 .. II), k = m+1 .. II), m = 0 .. II) elif s = 2 then add(add(add(`if`((i-m)::even or (k-m)::even, 0, add(add(add((1/2)*(2*m+1)*b*add(PI[i, j, q, p]*PI[k, j, q, r]*tau[p](t)*tau[r](t), q = 1 .. N)/((2*j+1)*a), j = 0 .. JJ), p = 1 .. M), r = 1 .. M)), i = m+1 .. II), k = m+1 .. II), m = 0 .. II) end if end proc:

y1b := CodeTools:-Usage([Grid:-Seq(UP1b(s, PI, N, M, a, b, II, JJ), s = 1 .. 2)]):

memory used=1.58MiB, alloc change=2.19MiB, cpu time=35.00ms, real time=146.00ms, gc time=0ns

 

UP2 := proc (s, PI, N, M, a, b, II, JJ, Rr) local k; i, j, r, p, m, q, n, l; Digits := 15; if s = 1 then add(add(add(add(add(add((2/3)*Rr[i, m]*Rr[k, m]*b*add(PI[i, j, q, p]*PI[k, j, q, r]*tau[p](t)*tau[r](t), q = 1 .. N)/((2*m+1)*(2*j+1)*a), i = 0 .. II), k = 0 .. II), m = 0 .. II), j = 0 .. JJ), p = 1 .. M), r = 1 .. M) elif s = 2 then add(add(add(add(add(add((1/2)*Rr[i, m]*Rr[k, m]*b*add(PI[i, j, q, p]*PI[k, j, q, r]*tau[p](t)*tau[r](t), q = 1 .. N)/((2*m+1)*(2*j+1)*a), i = 0 .. II), k = 0 .. II), m = 0 .. II), j = 0 .. JJ), p = 1 .. M), r = 1 .. M) end if end proc:

y3 := CodeTools:-Usage([seq(UP1(s, PI, N, M, a, b, II, JJ), s = 1 .. 2)]):

memory used=0.88GiB, alloc change=256.00MiB, cpu time=7.14s, real time=6.58s, gc time=877.89ms

 

fnormal(y1b-y1, Digits-1);

[0., -0.]

(3)

fnormal(y1a-y1, Digits-1);

[0., 0.]

(4)

fnormal(y3-y1, Digits-1);

[0., -0.]

(5)

fnormal(y3[1]-(4/3)*y3[2], Digits-1);

-0.

(6)

``

Download soal-1_ac.mw

 

As exciting as a 2-fold speedup for a sequence of 2 operations goes, this might be a good time to mention that it's often much more important to optimize the serial code first.

For example, if the computation amounts to just adding up pairwise produces of multiples of tau[i](t)*tau[j](t) then perhaps the coefficients could be abstracted away from the symbolic factors, and a purely floating-point scheme devised (which has the potential to be much faster).

What are you trying to accomplish (in words, please).  What do you mean to express by A1 = b where A1 is a 3x3 Matrix and b is a Vector?

@Cavemaann Your latest attachment had the expression f which simplified to -sinh(alpha*L)*sin(alpha*L) = 0 . Are you saying you disagree with roots of that occuring at alpha*L=n*Pi?

Or are you saying that you want some other kind of plot for f=0? And if so, then what kind of plot?

Your example works for me.

display3d_ac.mw

What is inside your custom initialization of Maple? What happens if you launch without it?

@Cavemaann Aren't the roots just multiples of Pi?

K4square_equal_K2square_ac.mw

@tomleslie This is perhaps interesting to someone generally concerned with low- vs high-level parallelization in Maple.

But I fear that it risks muddying the waters for the OP, since the addition of many numeric values, or other arithmetic calls, is not the primary concern. (AFAICT the member vv cooked up the addition of many sin calls just because it is a black box which he could easily make as expensive as wanted.) 

I suspect that there could also be non-negligible differences in comparison of such a variety of approaches according to whether the calculation is exact rational, software floating-point, or hardware floating-point.

The OP (as per usual, and unhelpfully) has not disclosed details of the actual computations performed by his function calls f(x1,x2,x3,...) and g(x1,x2,x3,...). So we cannot tell if they would be susceptible to improvement via low-level multi-threading. We also cannot tell if they are even thread-safe, or call anything which is thread-blocking (like an external, compiled, numeric integration function). In the worse cases, a lack of thread-safety could result in invalid results or even a crash.

I think that your reponse can remind us of an aspect that is sometimes overlooked: In Maple it is often the case that the performance of a serial calculation can be significantly improved, and often it is beneficial to focus on optimizing the serial calculation foremost, before trying to parallelize.

(Hence my suggestion above, to use the Grid package.)

 

@vv You may compare with Grid:-Seq.

You could also wrap the f(i) calls in evalhf and remove the evalf. Remember table of evalf might play some part.

Also possible is Threads:-Task model

nb. I suspect you deliberately use :-add instead of Threads:-Add because you want an expensive serial `f`. Makes sense.

General control of other worksheets (from within a given worksheet) is not possible.

There are limited tools for querying particular formulas from another Worksheet/Document (if the formula is attached to a GUI Equation Label). And there are limited tools for getting a "return value" from another Worksheet.

But you cannot generally set the state of another worksheet.

I am going to stress that it is unhelpful for you to not describe carefully what your goal is here.

If you simply want to be able to share/get procedures/operators/expressions amongst sessions then Library archives might suffice. Its difficult to know whether you are asking for significantly more than that, because you've given so few details.

First 237 238 239 240 241 242 243 Last Page 239 of 594