:

## Using Tasks to filter combinations

Maple 17

This blog post is a response to a post on MaplePrimes.  MaplePrimes user wkehowski asked how the Task Model could be used to filter combinations.  The basic problem is formalated like this:  We want a function, I'll call it FilterComb, that accepts positive integers n and k, and a boolean valued function (I'll refer to it as a filter function) that takes a combination as an argument.  FilterComb returns a list of the combinations of size k selected from 1..n such that the filter function returns true. Here is a sequential implementation of FilterComb:

`FilterComb := proc( n::posint, k::posint, filter::appliable, \$ )    local matching, perm;    perm := combinat:-firstcomb( n, k );    matching := table();    while ( perm <> FAIL )    do        if ( filter( perm ) ) then            matching[ perm ] := 1;        end;        perm := combinat:-nextcomb( perm, n );    end;    [ indices( matching, nolist ) ];end;`

The commenters recognized the basic problem, the combinat:-nextcomb function allows one to generate each combination sequentially.  However generating the combinations sequentially is going to be problematic for a parallel implementation.

One commentor provided a Threads:-Create based solution that uses the producer/consumer pattern.  In this case, one thread produces a block of combinations and other threads consume those blocks by applying the filter to each element.  This is a fairly classic way to parallelize a problem like this, however it has some limitations.  In particular, the time to produce the combinations must be much less than the time to consume them, otherwise your consumer threads won't have enough work to stay busy, that is, they'll consume the work faster than the producer can generate it.  For our problem it comes down to the complexity of the filter function, if the filter function is fast, then the parallelism is going to suffer.  This model can also be problematic with respect to scaling.  As you add more consumer threads, the producer thread will eventually not be able to keep up.  Finally, the implementation of the producer/consumer can be quite complex.  You can see the code here.

For my implementation I am going to attack the basic problem directly, we need to parallelize the combination generation.  To do this we split the set of elements into two sets.  One set, the prefix, is elements that are fixed in the combinations to be generated and the other set, the rest, are the elements that need to be combined with prefix.  Initially prefix will be the empty set and rest will be all elements.  A task will generate and filter the combinations using a prefix set and a  rest set.  We can make this division recursive by creating new prefix sets by combining one element from rest with the current prefix.  Each of these new prefix/rest pairs corresponds to a new task.  We need to be a bit careful here, when an element i is moved from rest to prefix, we need to make sure that we remove all the elements less that or equal to i from rest.  This makes sure we don't create duplicate combinations.  For example, if we have (prefix,rest) =  ({1,2},{3,4,5,6}), then we can create new pairs ({1,2,3},{4,5,6}), (1,2,4),{5,6}) and ({1,2,5},{6}).

The makeArgs function implements the combination splitting, with the splitCombs implementing the high level parallelism.  In this case I use the Threads:-Map function instead of explicitly using the Task model as the splitting works nicely on top of the Map functionality.

`FilterCombPara := module( )    local ModuleApply,            makeArgs,            splitCombs,            filterCombs;    makeArgs := proc( i::integer, prefix::set(posint), rest::set(posint),                                                k::posint, filter::appliable, \$ )        ( prefix union {i}, remove( x->x<=i, rest ), k-1, filter );    end;    splitCombs := proc( prefix::set(posint), rest::set(posint), k::posint,                                                                filter::appliable, \$ )        if ( numelems( rest ) < k ) then            NULL;        elif ( k = 1 ) then            filterCombs( prefix, rest, filter );        else            op( Threads:-Map( x->splitCombs( makeArgs( x, prefix, rest, k, filter ) ), rest ) );        end;    end;    filterCombs := proc( prefix::set(posint), rest::set(posint), filter::appliable, \$ )        local matching, comb, n, current, i;                n := numelems( rest );        matching := table();        for i in rest        do            current := prefix union { i };            if ( filter( current ) ) then                matching[ current ] := 1;            end;        end;        indices( matching, nolist );    end;    ModuleApply := proc( n::posint, k::posint, filter::appliable, \$ )        [ splitCombs( {}, {seq( i,i=1..n )}, k, filter ) ];    end;end:`

I've attached a worksheet (FilterComb.mw) containing this code so that you can run it yourself.  Run on a quad core i7 I get the following results:

`> r1 := CodeTools:-Usage( FilterComb( 60, 5, evenFilter ) ):memory used=9.29GiB, alloc change=0 bytes, cpu time=110.64s, real time=104.39s> r2 := CodeTools:-Usage( FilterCombPara( 60, 5, evenFilter ) ):memory used=4.33GiB, alloc change=1.09MiB, cpu time=60.08s, real time=22.29s> evalb( { op( r1 ) } = { op( r2 ) } );                                              true`

The parallel version is actually running more that 4 times faster.  I suspect this is due to the parallel version being more memory effecient.

Darin

-- Kernel Developer Maplesoft ﻿