## Diminishing Returns from Parallel Processing:...

by: Maple 2016

This post is about the relationship between the number of processors used in parallel processing with the Threads package and the resultant real times and cpu times for a computation.

In the worksheet below, I perform the same computation using each possible number of processors on my machine, one through eight. The computation is adding a list of 32 million pre-selected random integers. The real times and cpu times are collected from each run, and these are analyzed with a variety of metrics that I devised. Note that garbage-collection (gc) time is not an issue in the timings; as you can see below, the gc times are zero throughout.
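The file ThreadsTest.mpl that the worksheet reads is not reproduced in this post. A minimal sketch of such a test follows, assuming the sum is computed with Threads:-Add and timed with CodeTools:-Usage (the statistics lines shown below have the format that Usage prints). The list construction, the reduced list size, and the names `n`, `roll`, `L`, and `S` are hypothetical, chosen only for illustration.

```maple
# Hypothetical sketch of a test file; NOT the original ThreadsTest.mpl.
# Assumption: the list is built with rand and summed with Threads:-Add.
n:= 2^20:                      # illustrative size, far smaller than the 32 million in the post
roll:= rand(1..10^6):          # procedure returning random integers in 1..10^6
L:= [seq(roll(), k= 1..n)]:    # the pre-selected random integers

# Time the parallel sum twice; Usage prints memory, cpu time, and real time
# for each run and returns the value of the sum.
to 2 do
    S:= CodeTools:-Usage(Threads:-Add(x, x= L))
end do:

# Sanity check against the sequential sum.
evalb(S = add(x, x= L));
```

Running the timed statement twice mirrors the two statistics lines reported for each processor count; the actual file must also write the timings to a data file for the analysis step, which this sketch leaves out.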

My conclusion is that there are severely diminishing returns as the number of processors increases. There is a major benefit in going from one processor to two; there is a not-as-great-but-still-substantial benefit in going from two processors to four. But the real-time reduction in going from four processors to eight is very small compared to the substantial increase in resource consumption.

Please discuss the relevance of my six metrics, the soundness of my test technique, and how the presentation could be better. If you have a computer capable of running more than eight threads, please modify and run my worksheet on it.

Diminishing Returns from Parallel Processing: Is it worth using more than four processors with Threads?

Author: Carl J Love, 2016-July-30

Run the tests

restart:

kernelopts(numcpus= 1):
currentdir(kernelopts(homedir)):
read "ThreadsTest.mpl":

memory used=0.79MiB, alloc change=0 bytes, cpu time=2.66s, real time=2.66s, gc time=0ns

memory used=0.78MiB, alloc change=0 bytes, cpu time=2.26s, real time=2.26s, gc time=0ns

Repeat the above test using numcpus= 2..8.

restart:

kernelopts(numcpus= 2):
currentdir(kernelopts(homedir)):
read "ThreadsTest.mpl":

memory used=0.79MiB, alloc change=2.19MiB, cpu time=2.73s, real time=1.65s, gc time=0ns

memory used=0.78MiB, alloc change=0 bytes, cpu time=2.37s, real time=1.28s, gc time=0ns

restart:

kernelopts(numcpus= 3):
currentdir(kernelopts(homedir)):
read "ThreadsTest.mpl":

memory used=0.79MiB, alloc change=4.38MiB, cpu time=2.98s, real time=1.38s, gc time=0ns

memory used=0.78MiB, alloc change=0 bytes, cpu time=2.75s, real time=1.05s, gc time=0ns

restart:

kernelopts(numcpus= 4):
currentdir(kernelopts(homedir)):
read "ThreadsTest.mpl":

memory used=0.80MiB, alloc change=6.56MiB, cpu time=3.76s, real time=1.38s, gc time=0ns

memory used=0.78MiB, alloc change=0 bytes, cpu time=3.26s, real time=959.75ms, gc time=0ns

restart:

kernelopts(numcpus= 5):
currentdir(kernelopts(homedir)):
read "ThreadsTest.mpl":

memory used=0.80MiB, alloc change=8.75MiB, cpu time=4.12s, real time=1.30s, gc time=0ns

memory used=0.78MiB, alloc change=0 bytes, cpu time=3.74s, real time=910.88ms, gc time=0ns

restart:

kernelopts(numcpus= 6):
currentdir(kernelopts(homedir)):
read "ThreadsTest.mpl":

memory used=0.81MiB, alloc change=10.94MiB, cpu time=4.59s, real time=1.26s, gc time=0ns

memory used=0.78MiB, alloc change=0 bytes, cpu time=4.29s, real time=894.00ms, gc time=0ns

restart:

kernelopts(numcpus= 7):
currentdir(kernelopts(homedir)):
read "ThreadsTest.mpl":

memory used=0.81MiB, alloc change=13.12MiB, cpu time=5.08s, real time=1.26s, gc time=0ns

memory used=0.78MiB, alloc change=0 bytes, cpu time=4.63s, real time=879.00ms, gc time=0ns

restart:

kernelopts(numcpus= 8):
currentdir(kernelopts(homedir)):
read "ThreadsTest.mpl":

memory used=0.82MiB, alloc change=15.31MiB, cpu time=5.08s, real time=1.25s, gc time=0ns

memory used=0.78MiB, alloc change=0 bytes, cpu time=4.69s, real time=845.75ms, gc time=0ns

Analyze the data

restart:

currentdir(kernelopts(homedir)):

(R,C):= 'Vector(kernelopts(numcpus))' $ 2:
N:= Vector(kernelopts(numcpus), i-> i):

while not feof(fd) do
(n,Tr,Tc):= fscanf(fd, "%m%m%m\n")[];
(R[n],C[n]):= (Tr,Tc)
end do:

fclose(fd):

plot(
(V-> <N | 100*~V>)~([R /~ max(R), C /~ max(C)]),
title= "Raw timing data (normalized)",
legend= ["real", "CPU"],
labels= [`number of processors\n`, `%  of  max`],
labeldirections= [HORIZONTAL,VERTICAL],
view= [DEFAULT, 0..100]
);

The metrics:

R[1] /~ R /~ N:          Gain: The gain from parallelism expressed as a percentage of the theoretical maximum gain given the number of processors

C /~ R /~ N:               Evenness: How evenly the task is distributed among the processors

1 -~ C[1] /~ C:           Overhead: The percentage of extra resource consumption due to parallelism

R /~ R[1]:                   Reduction: Real time as a percentage of the one-processor real time (a lower value means a greater reduction)

1 -~ R[2..] /~ R[..-2]:  Marginal Reduction: Percentage reduction in real time by using one more processor

C[2..] /~ C[..-2] -~ 1:  Marginal Consumption: Percentage increase in resource consumption by using one more processor
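As a concrete illustration, the six formulas can be applied directly to a set of timings. The sketch below uses the second of the two timings printed for each processor count in the test output above, converted to seconds. Which of the two runs feeds the actual analysis is not stated, so that choice is an assumption, as are the variable names on the left-hand sides.

```maple
# Assumption: second-run timings from the test output, in seconds.
R:= <2.26, 1.28, 1.05, 0.95975, 0.91088, 0.894, 0.879, 0.84575>:  # real times
C:= <2.26, 2.37, 2.75, 3.26, 3.74, 4.29, 4.63, 4.69>:              # CPU times
N:= Vector(8, i-> i):                                              # processor counts

Gain:=       R[1] /~ R /~ N:            # fraction of the ideal n-fold speedup achieved
Evenness:=   C /~ R /~ N:               # average fraction of the n processors kept busy
Overhead:=   1 -~ C[1] /~ C:            # share of CPU time attributable to parallelism
Reduction:=  R /~ R[1]:                 # real time relative to one processor
MargReduc:=  1 -~ R[2..] /~ R[..-2]:    # real-time saving from one more processor
MargConsum:= C[2..] /~ C[..-2] -~ 1:    # CPU-time increase from one more processor

# Display each metric as a row of percentages.
for M in [Gain, Evenness, Overhead, Reduction, MargReduc, MargConsum] do
    print(evalf[3](100 *~ M^%T))
end do:
```

With these particular numbers, the gain at eight processors is roughly 33% of the theoretical maximum, compared with roughly 59% at four.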

plot(
[
(V-> <N | 100*~V>)~([
R[1]/~R/~N,             #gain from parallelism
C/~R/~N,                #how evenly distributed
1 -~ C[1]/~C,           #overhead
R/~R[1]                 #reduction
])[],
(V-> <N[2..] -~ .5 | 100*~V>)~([
1 -~ R[2..]/~R[..-2],   #marginal reduction rate
C[2..]/~C[..-2] -~ 1    #marginal consumption rate
])[]
],
legend= typeset~([
'r[1]/r/n',
'c/r/n',
'1 - c[1]/c',
'r/r[1]',
'1 - `Delta__%`(r)',
'`Delta__%`(c) - 1'
]),
linestyle= ["solid"$4, "dash"$2], thickness= 2,
title= "Efficiency metrics\n", titlefont= [HELVETICA,BOLD,16],
labels= [`number of processors\n`, `% change`], labelfont= [TIMES,ITALIC,14],
labeldirections= [HORIZONTAL,VERTICAL],
caption= "\nr = real time,  c = CPU time,  n = # of processors",
size= combinat:-fibonacci~([16,15]),
gridlines
);

## How to compute CPU time in RK45 dsolve command?...

Dear Sir,

Can anyone help me compute the CPU time of a dsolve command?


## Mapleprimes timing is off

For reference, this question was posted at exactly 9:05.

## Timing seems longer on 16 than 12...

A simple timing program

st:=time():
for i to 35 do
i;
end do;
time()-st;

On M16, the output seems to display after about 7 seconds, yet the reported time is 0.010.
On M12, it displays in roughly 2 seconds, with a reported time of 0.170.

The actual wait is noticeably longer with M16. Do others get the same results?
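Because time() reports CPU time used by the kernel, it would not reflect time spent elsewhere, such as displaying output in the interface. A sketch that records elapsed wall-clock time alongside CPU time follows, assuming time[real] is available in the version being tested (not confirmed here for M12). The variable names are illustrative only.

```maple
st_cpu:= time():           # CPU time used so far by the kernel
st_real:= time[real]():    # wall-clock time
for i to 35 do
    i;
end do;
cpu_elapsed:= time() - st_cpu;           # CPU seconds spent in the loop
real_elapsed:= time[real]() - st_real;   # elapsed seconds as seen by the kernel
```

If both values are small while the display still takes seconds, the delay is probably outside the kernel computation itself, though this sketch alone cannot confirm where.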