Question: Guidelines on how to use Grid:-Seq

Hello

I wonder if someone on the list can give me some guidelines on how to use Grid:-Seq.

I have applied Grid:-Set and Grid:-Seq to one of my problems where parallelization is a possibility.   In this problem, I have a massive list of lists that needs to be processed in chunks and then a new list is returned.

What I've found so far:

1) Grid-Seq won't work if I use a procedure from a private module that calls local (or exported) functions of the same module.  Solution:  I need to use savelib (there is a bug in Grid:-Seq)

2) Grid-Seqs seems to be working fine up to a point where it simply stops.  For example:   Starting with 431895 sets, the procedure divides them into 864 sets of 500 elements (the last one does not usually contain 500 elements).  Then I use res:=Grid:-Seq['tasksize'=9](myFunction(newSets[i],...,....),i=1..864).   The messages sent out by Grid:-Seq are 

Seq: using tasksize=9; number of elements= 864, number of partitions = 96

......

Seq: done sending all partitions

Seq: received results from all nodes; collating results

After two hours running and using 36 cores, the result is available in res.  The next step is to use the results in res for another cycle of calculations.  1503462 sets = 3007 sets of 500 elements.  The messages sent out by Grid:Seq are 

Seq: using tasksize=9; number of elements= 3007, number of partitions = 335

......

Seq: done sending all partitions

The last message showed up on the screen after an hour. During the process, I saw the current tasks finishing and new ones starting.  However I could not see "Seq: received results from all nodes; collating result" after a day running.  I am using timelimit to be sure that all calculations will finish no matter what.  

I have checked the system information and found that: 1) all 72 cores are running but they are jumping from 0% to 100% (in the example that works I could see all of them in 100% solid), 2) Memory is at 156 GB (200 Gb is the limit) and no swap to disk.

I have also noticed:

  1. If I use Maple 2017, not even the first example works.
  2. tasksize=9 somehow helps with similar problems.  If the size is left for Maple to decide, it seems that the problem happens for sets of smaller size.
  3. On linux I had to use "kill -9" to remove maple from every single core. My impression is that Grid:-Seq does not clean up the processes after they finished.  
  4.  The problem seems to be on "results from all nodes; collating result".

 

I know that the explanation above in rather vague, but if someone has any clue or guidelines on how to solve this problem, please share it with me.

 

Many thanks

 

Ed

 

 

 

Please Wait...