Mac Dude

MaplePrimes Activity


These are questions asked by Mac Dude


I recently got access to a 12-core Intel Mac Pro with 64 GB of main memory. This motivated me to try parallel programming again, even though in prior years I was never successful at it. The project I am trying to do is particle tracking through a circular accelerator, an embarrassingly parallel problem in the sense that you can track n particles in parallel through your machine for many turns and then gather up the results for analysis. The function describing the tracking is a 6-component polynomial function acting on a 6-vector and yielding a 6-vector as the result. Each accelerator component (magnet, drift section, rf, ...) is described by such a function. I am simplifying this a bit here, but in the problem at hand this is what I am doing. The point is that each particle gets treated independently of the others, hence parallelization should be trivial.
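For concreteness, a single element map has roughly the shape sketched here. The matrix A, the tensor T and the coordinate names are made-up placeholders (the real maps come from the Lattice package); this is only meant to illustrate what "6-component polynomial function" means:

z := Vector(6, symbol = x):      # the 6 phase-space coordinates, (x, px, y, py, t, delta) say
zout := Vector(6, i -> add(A[i,j]*z[j], j = 1..6)
                     + add(add(T[i,j,k]*z[j]*z[k], k = j..6), j = 1..6)):
# zout is again a 6-vector, polynomial in the entries of z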

Using an existing package (Lattice, which I published with Maple) as a framework, I set this up such that the tracking proc for n turns of one particle in the accelerator is a member of a module. This module sits in the body of a proc and gets returned when the proc is called, essentially instantiating the tracking object, which is then assigned to an element of a Vector with as many elements as I have particles to track. The tracking function returns a 6-vector with the coordinates after all turns are complete.
A separate proc does the instantiation of all tracking objects, gives each one its particle number (taken from a Beam object it is given) and sends it off using Threads:-Create. It then waits until every task is done (using Threads:-Wait) and assembles the results into another Beam object, which it returns. Please refer to the enclosed Maple worksheet for how this is done.
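In outline, the dispatch looks like the sketch below. TrackAll, TrackOneParticle and nTurns are placeholder names for the actual Lattice-based code in the worksheet, and for brevity the sketch uses Threads:-Map where the worksheet does the Threads:-Create/Threads:-Wait bookkeeping explicitly; the structure (one independent task per particle, results gathered at the end) is the same.

# Simplified stand-in for the dispatch proc in the worksheet.
# TrackOneParticle(beam, k, nTurns) is a placeholder: instantiate the
# tracking object for particle k and return its final 6-vector.
TrackAll := proc(beam, nParticles::posint, nTurns::posint)
  local results;
  # one independent task per particle; the kernel farms the calls out
  # over its thread pool and returns the results in particle order
  results := Threads:-Map( k -> TrackOneParticle(beam, k, nTurns),
                           [seq(1 .. nParticles)] );
  return results;   # the real code assembles these into a Beam object
end proc: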

This all actually appears to work. macOS is 15.7.2 (Sequoia); Maple is 2023.2. The parallel and serial results are identical.

The "interesting" result however, is that usage of the available CPU cores saturates at about 4. In the graph shown below, the green line shows the CPU usage of the mserver process, and it saturates between 400 & 500%, actually going down to 360% as more particles get added. 100% is one core, so I am never getting more than about 4 cores to work for me. Correspondingly, the no. of seconds per particle goes up from about 3 s (particle 1 to 4) up to about 15 s/particle, settling at about 10 s/particle as 12 particles are approached. Below is a graph against no. of particles (n) of running time (red), CPU time (dark blue), CPU usage (yellow) and # or mkernel threads (green). 

Bottom line: I am only getting 4 cores out of the 12. The process limits of macOS (ulimit -a) do not indicate any limit that would cause this (and I have had build jobs that would merrily use all 12 cores).

Is there a limit in Maple that prevents using all available cores? Or am I doing something inefficient that could cause this? This is the first time I have actually gotten parallel operations in Maple to work, so I am happy about that, but my happiness is tempered by not getting it to work at the level I was aiming for. I did google around a bit and found some prior conversations on MaplePrimes (mostly involving @acer and @Carl Love) about parallel threads, which indicated that (a) the environment variable OMP_NUM_THREADS should be set and (b) kernelopts(numcpus) can only be set at the very beginning of a Maple session (which I interpret as "right after firing up Maple"). I did both (and verified the settings were in effect), but there was no change in the behaviour of this code; I still only get four CPU cores to work.
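For the record, the checks at the very start of a fresh session (before any use of Threads) amount to the following; the value 12 is of course specific to this machine:

getenv("OMP_NUM_THREADS");   # set in the shell before launching Maple
kernelopts(numcpus);         # number of CPUs the kernel thinks it has
kernelopts(numcpus = 12);    # per those older threads, only honoured this early in a session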

Thanks,

Mac Dude

Parallel_tracking_attempt.mw

Edit: Added graph, fixed up graph.

I just installed Maple 2023 on a Mac Pro running macOS Sequoia 15.7.2. Updating to the latest point release and activation both worked without a hitch.

When I try to run it from the Finder it immediately puts up a dialog saying something like "Java not found". There is a webpage by Maplesoft addressing this, but it is completely uninformative and unhelpful.

I can get Maple to run from the CLI (in Terminal) by opening the Maple 2023 .app file, at which point Maple (running the standard GUI) works just as one would expect. So it is not a crisis, but I'd like to be able to open it through the Finder as well. The Maple .app folder appears to have all the Java stuff in it, so clearly Java is there somewhere.

Anyone seen and solved this before?

Thanks,

Mac Dude

I have the following expression (result of a calculation):

(1/1296)*cBooP0-(1/1296)*cSRP0-(1/1296)*tStartRamp*f__SR/N

Rather obviously the common factor 1/1296 can be factored out, except that I cannot get Maple to factor out the 1/1296 without also factoring out N, which I do not want. My desired end result is this:

(1/1296)*(cBooP0-cSRP0-tStartRamp*f__SR/N)

I don't seem to be able to coerce Maple into doing this. I can freeze the tStartRamp*f__SR/N term (leaving the 1/1296 unfrozen), but then I cannot get Maple to pull the 1/1296 factor out at all.
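For concreteness, here is a minimal version of what I am seeing (names exactly as above; the output shown in the comment is approximately what factor returns):

ex := (1/1296)*cBooP0 - (1/1296)*cSRP0 - (1/1296)*tStartRamp*f__SR/N:
# factor() does pull out the 1/1296, but it also puts everything over
# the common denominator 1296*N, i.e. it drags N into the result:
factor(ex);
#   (N*cBooP0 - N*cSRP0 - tStartRamp*f__SR)/(1296*N)    -- not what I want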

Any hint would be appreciated. I am doing this in Maple 2015. It is really a bit cosmetic, but sometimes I use Maple to write "smart" documentation, and then I'd like to end up with a somewhat polished result.

Mac Dude

I am working on a modelling (simulation) program for an rf system involving DDSs, rf mixers and other components. Each component is programmed as a submodule with methods (procs) and other exports, and the whole shebang is wrapped into a main package loaded using with(). A loop simulates a number of scenarios with differing starting conditions. Most calculations are numeric in nature.

The code runs but is too slow, and in addition it seems to keep allocating memory as the main loop runs. I suspect these two are related. I am using profile() to get an idea of where the time is spent and where the memory gets eaten up.

I am doing the following:

profile(DDS__Boo:-SetTW,rf__Boo:-cycles,rf__SR:-cycles); # profile these three methods (procs):
.
.
.
# the main loop:
# Loop over injection cycles

for cycle from 1 to Cycles do
  tStartRamp:=(cycle-1)/2;
  c__B0:=cB0f(tStartRamp,tStartRamp);
  cB0List[cycle]:=c__B0;
  WBsum:=0;
  DDS__Boo:-ResetTW(tStartRamp);
#
  targetBucket:=targetList[cycle];
  delta__r:=trunc((r__end-0)/(Points-1));

  for ii from 0 to Points-1 do
    r:=ii*delta__r;
    DDS__Boo:-SetTW((r-delta__r/2)/f__clk,eval(W__Boo+ramp));
    WBsum:=evalf(WBsum+eval(ramp)*delta__r);
  end do:


  BooRfPeriods:=evalf(rf__Boo:-cycles((c__ex+targetBucket)/f__SR));
  SRRfPeriodsList[cycle]:=evalf(rf__SR:-cycles((c__ex+targetBucket)/f__SR));
  BooRfPeriodsList[cycle]:=BooRfPeriods-evalf(rf__Boo:-cycles(c__B0/f__SR)); # rf periods cB0 to extr.
  evalf(BooRfPeriodsList[cycle])/432; # Booster turns c__B0 to extr  
  targetErrors[cycle]:=(%-round(%))*432; # targeting error in bucket width
  DocumentTools:-SetProperty(RotaryGauge0,'value',cycle,'refresh');
end do:
# end loop

showprofile();
function           depth    calls     time    time%         bytes   bytes%
---------------------------------------------------------------------------
SetTW                  1    10240  230.877    51.52   22851828384    72.80
cycles                 1      110  108.607    24.24    4269176160    13.60
cycles                 1      110  108.607    24.24    4269176160    13.60
---------------------------------------------------------------------------
total:                 3    10460  448.091   100.00   31390180704   100.00

My question is: how do I interpret the table of results?

"time" is in seconds? (this should really be written in the Help files). Also, SetTW() and cycles() call other routines from the package, are they included in the times reported or not?

"bytes" is my biggest concern. "SetTW" has a whopping 22 GB against it. Since the total allocation goes up to about 290 MB in this particular run (per Maple's info line) this cannot be the total memory used. Can this be the total allocation (and most of it ending up as garbage and cleaned out)?? Even the 4 GB against cycles seems out-of-line.

The two lines for "cycles" are exactly the same, even though the routines differ (they belong to different instances of the module and operate on different parameters). Is profile aware of that, or not?

I realize that people like to see the whole program or worksheet, but given that this depends on a large-ish package, and the sheet itself is of some size and difficult to penetrate for the uninitiated, I want to spare you from trying to read it. Right now I need to understand what showprofile() is reporting. Once I understand profiling better, and if I need more help, I will try to make a MWE, but I am not that far yet. I did check the Programming Guide but did not find the information I am looking for.
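For reference, the sort of self-contained toy case I could use to calibrate my reading of the table would be something like this (churn is a made-up proc that just generates lots of short-lived floats):

churn := proc(n::posint)
  local i, s;
  s := 0.0;
  for i to n do
    s := s + evalf(sin(i));   # each iteration allocates fresh floats
  end do;
  s;
end proc:

profile(churn):
churn(10^5):
showprofile();
unprofile(churn):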

Thanks much for any insight you can share.

Mac Dude.

I need to declare a whole set of variables as local. The variable names are generated algorithmically using assign, like so:

seq(seq(assign(cat(S,i,j)=Vector(datatype=float)),i=1..9),j=1..9);

Stand-alone, this works and creates all these Vectors for later use. But this:

local seq(seq(assign(cat(S,i,j)=Vector(datatype=float)),i=1..9),j=1..9);

does not work; I get the error "( unexpected".

I really do not want to type all of these by hand. On the other hand, if I do not declare them as local, I get 99 warnings about implicit local declaration; not nice.

Is there a way to do this?
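A fallback I have been considering (just a sketch, with a made-up proc name) is a single local table indexed by i and j, which needs only one declaration instead of one generated name per index pair:

MakeAll := proc()   # made-up name, for illustration only
  local S, i, j;
  S := table():
  for i to 9 do
    for j to 9 do
      S[i, j] := Vector(datatype = float);
    end do;
  end do;
  eval(S);   # eval so the table itself, not the name S, is returned
end proc: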

Thanks,

M.D.

PS: I am not uploading a worksheet, as the one line above really is all that is needed. At the lowest level one does not get the implicit-declaration warning, but with "local" it still fails.
