This blog post is a response to a post on MaplePrimes. MaplePrimes user wkehowski asked how the Task Model could be used to filter combinations. The basic problem is formulated like this: we want a function, which I'll call FilterComb, that accepts positive integers n and k, and a boolean-valued function (I'll refer to it as a filter function) that takes a combination as an argument. FilterComb returns a list of the combinations of size k selected from 1..n for which the filter function returns true. Here is a sequential implementation of FilterComb:

FilterComb := proc( n::posint, k::posint, filter::appliable, $ )
    local matching, perm;

    perm := combinat:-firstcomb( n, k );
    matching := table();

    while ( perm <> FAIL )
    do
        if ( filter( perm ) ) then
            matching[ perm ] := 1;
        end;
        perm := combinat:-nextcomb( perm, n );
    end;

    [ indices( matching, nolist ) ];
end;
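For readers who want to experiment outside Maple, here is a Python sketch of the same sequential behavior (this is my own stand-in, not the author's code): itertools.combinations plays the role of combinat:-firstcomb/nextcomb, and evenFilter below is a hypothetical example filter.

```python
# Sequential combination filtering, sketched in Python.
# itertools.combinations enumerates the size-k combinations of 1..n in order,
# just as firstcomb/nextcomb does in the Maple version above.
from itertools import combinations

def filter_comb(n, k, filter_fn):
    """Return the size-k combinations of 1..n accepted by filter_fn."""
    return [c for c in combinations(range(1, n + 1), k) if filter_fn(c)]

# Hypothetical example filter: keep only combinations whose sum is even.
even_sum = lambda c: sum(c) % 2 == 0
```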

The commenters recognized the basic problem: the combinat:-nextcomb function generates each combination *sequentially*. However, generating the combinations sequentially is going to be problematic for a parallel implementation.

One commenter provided a Threads:-Create based solution that uses the producer/consumer pattern. In this case, one thread produces a block of combinations and other threads consume those blocks by applying the filter to each element. This is a fairly classic way to parallelize a problem like this; however, it has some limitations. In particular, the time to produce the combinations must be much less than the time to consume them; otherwise your consumer threads won't have enough work to stay busy, that is, they'll consume the work faster than the producer can generate it. For our problem it comes down to the complexity of the filter function: if the filter function is fast, then the parallelism is going to suffer. This model can also be problematic with respect to scaling: as you add more consumer threads, the producer thread will eventually not be able to keep up. Finally, the implementation of the producer/consumer pattern can be quite complex. You can see the code here.
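To illustrate the pattern being described (this is a minimal Python sketch of producer/consumer, not the MaplePrimes commenter's code; block size and thread count are arbitrary illustrative choices):

```python
# Producer/consumer sketch: one producer generates blocks of combinations,
# several consumers apply the filter to each block. If filter_fn is cheap,
# the consumers drain the queue faster than the producer can fill it.
from itertools import combinations, islice
from queue import Queue
from threading import Thread

def filter_comb_pc(n, k, filter_fn, consumers=4, block=64):
    q, matches = Queue(maxsize=8), []

    def producer():
        it = iter(combinations(range(1, n + 1), k))
        while True:
            blk = list(islice(it, block))
            if not blk:
                break
            q.put(blk)
        for _ in range(consumers):
            q.put(None)  # one stop sentinel per consumer

    def consumer():
        while True:
            blk = q.get()
            if blk is None:
                break
            matches.extend(c for c in blk if filter_fn(c))

    threads = [Thread(target=producer)] + [Thread(target=consumer) for _ in range(consumers)]
    for t in threads: t.start()
    for t in threads: t.join()
    return matches
```

Note how much bookkeeping (queue, sentinels, thread lifetimes) even this toy version needs, which is the complexity objection raised above.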

For my implementation I am going to attack the basic problem directly: we need to parallelize the combination generation. To do this we split the set of elements into two sets. One set, the **prefix**, holds the elements that are fixed in the combinations to be generated, and the other set, the **rest**, holds the elements that still need to be combined with the **prefix**. Initially **prefix** is the empty set and **rest** contains all the elements. A task generates and filters the combinations built from a **prefix** set and a **rest** set. We can make this division recursive by creating new **prefix** sets, each combining one element from **rest** with the current **prefix**. Each of these new **prefix**/**rest** pairs corresponds to a new task. We need to be a bit careful here: when an element **i** is moved from **rest** to **prefix**, we must remove all the elements less than or equal to **i** from **rest**. This makes sure we don't create duplicate combinations. For example, if we have (**prefix**, **rest**) = ({1,2}, {3,4,5,6}), then we can create the new pairs ({1,2,3}, {4,5,6}), ({1,2,4}, {5,6}) and ({1,2,5}, {6}).
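The splitting rule above can be checked with a small sequential Python sketch (my own illustration, not the Maple code that follows): because moving element i into the prefix drops everything less than or equal to i from the rest, each combination is generated exactly once.

```python
# Recursive prefix/rest splitting, sequentially. Each recursive call fixes
# one more element in the prefix; dropping elements <= i from rest prevents
# the same combination from being produced along two different paths.
def split_combs(prefix, rest, k, filter_fn):
    if len(rest) < k:          # not enough elements left: dead branch
        return []
    if k == 1:                 # base case: complete each combination
        return [prefix | {i} for i in rest if filter_fn(prefix | {i})]
    out = []
    for i in rest:             # one new (prefix, rest) pair per element
        out += split_combs(prefix | {i}, {j for j in rest if j > i},
                           k - 1, filter_fn)
    return out
```

In the Maple version below, each of these recursive calls becomes a parallel task instead of a sequential loop iteration.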

The **makeArgs** function implements the combination splitting, with **splitCombs** implementing the high-level parallelism. In this case I use the Threads:-Map function instead of using the Task Model explicitly, as the splitting works nicely on top of the Map functionality.

FilterCombPara := module( )
    local ModuleApply, makeArgs, splitCombs, filterCombs;

    makeArgs := proc( i::integer, prefix::set(posint), rest::set(posint),
                      k::posint, filter::appliable, $ )
        ( prefix union {i}, remove( x->x<=i, rest ), k-1, filter );
    end;

    splitCombs := proc( prefix::set(posint), rest::set(posint), k::posint,
                        filter::appliable, $ )
        if ( numelems( rest ) < k ) then
            NULL;
        elif ( k = 1 ) then
            filterCombs( prefix, rest, filter );
        else
            op( Threads:-Map( x->splitCombs( makeArgs( x, prefix, rest, k, filter ) ), rest ) );
        end;
    end;

    filterCombs := proc( prefix::set(posint), rest::set(posint), filter::appliable, $ )
        local matching, current, i;

        matching := table();
        for i in rest
        do
            current := prefix union { i };
            if ( filter( current ) ) then
                matching[ current ] := 1;
            end;
        end;

        indices( matching, nolist );
    end;

    ModuleApply := proc( n::posint, k::posint, filter::appliable, $ )
        [ splitCombs( {}, {seq( i, i=1..n )}, k, filter ) ];
    end;
end:
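For comparison, here is a Python stand-in for the parallel structure (again my own sketch, not the Maple module): the top-level split is farmed out with concurrent.futures as a rough analogue of Threads:-Map. Note that CPython's global interpreter lock limits the speedup for a CPU-bound filter, a restriction Maple's threads do not share, so this only illustrates the shape of the decomposition.

```python
# Parallel prefix/rest splitting: one task per choice of first element,
# mirroring how splitCombs hands each new (prefix, rest) pair to Threads:-Map.
from concurrent.futures import ThreadPoolExecutor

def filter_comb_para(n, k, filter_fn):
    def split(prefix, rest, kk):
        if kk == 0:            # combination complete: apply the filter
            return [prefix] if filter_fn(prefix) else []
        if len(rest) < kk:     # dead branch: too few elements remain
            return []
        out = []
        for i in rest:
            out += split(prefix | {i}, {j for j in rest if j > i}, kk - 1)
        return out

    rest = set(range(1, n + 1))
    with ThreadPoolExecutor() as ex:
        parts = list(ex.map(
            lambda i: split({i}, {j for j in rest if j > i}, k - 1), rest))
    return [c for part in parts for c in part]
```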

I've attached a worksheet (FilterComb.mw) containing this code so that you can run it yourself. Running it on a quad-core i7, I get the following results:

> r1 := CodeTools:-Usage( FilterComb( 60, 5, evenFilter ) ):

memory used=9.29GiB, alloc change=0 bytes, cpu time=110.64s, real time=104.39s

> r2 := CodeTools:-Usage( FilterCombPara( 60, 5, evenFilter ) ):

memory used=4.33GiB, alloc change=1.09MiB, cpu time=60.08s, real time=22.29s

> evalb( { op( r1 ) } = { op( r2 ) } );

true

The parallel version is actually running more than 4 times faster on 4 cores. I suspect this superlinear speedup is due to the parallel version being more memory efficient.

Darin

-- Kernel Developer, Maplesoft