Consider the following three problems:
1) given a list [a,b,c,d,e], return the list [a=1,b=2,c=3,d=4,e=5]
2) given a nested list [a, [b, [c, [d, [e]]]]] return the list [a,b,c,d,e]
3) given an integer in base 10, compute it's base b representation
First I will show you what not to do:
L := [a,b,c,d,e]; M := []; for i from 1 to nops(L) do M := [op(M), L[i]=i]; end do;
Building up lists (and sets) incrementally is quadratic time, because each iteration of the loop allocates linear storage to hold the new list. The standard solution is a loop with a temporary variable, assigning to a table:
M := table(): for i to nops(L) do M[i] := L[i]=i; end do: [seq(M[i], i=1..nops(L))];
However for large lists this is slow, because you are accessing the elements of L incrementally.
If you time it, you will see that seq(L[i], i=1..nops(L)) is much slower than seq(i, i=L) when L is a large list. Thus we need to do one linear pass through the list L. We can modify the loop as follows to acheive the desired result:
M := table(): j := 1: for i in L do M[j] := i=j; j := j+1; end do: [seq(M[i], i=1..nops(L))];
The code above is asymptotically faster than all previous versions, however it still requires a table and random accesses to construct the result. Can't we just somehow use seq ?
The answer is "yes", and the trick is to use the assign command: assign('x', v) assigns v to x and returns NULL. Here is the loop above, using one seq and two local variables i and j:
L := [a,b,c,d,e];
j := 1;
[seq(i=`+`(j, assign('j', j+1)), i=L)];
Notice that I had to add the NULL to j in order to make it all fit in a seq. If this fails, for example if j is a modp1 polynomial, then you can do op([j, assign('j', j+1)]) instead which is slower but more general.
Now lets look at the second problem. Nested lists are interesting because they are one way to implement stacks in Maple. The underlying lists [a, [...]] store a pointer to an object and a pointer to a substack. I don't know whether you actually want to use nested lists for stacks (you certainly don't want to display them), but the problems we face here pop up in other settings, for example you may need to extract linked list data from external code. Again the standard solution is a loop and table:
L := [a, [b, [c, [d, [e]]]]]; M := table(): i := 1: for i while nops(L) > 1 do M[i] := L[1]; L := L[2]; end do: M[i] := L[1]: [seq(M[j], j=1..i)];
The point is that after we extract the leading element we assign the sublist to L. Here it is with a seq:
L := [a, [b, [c, [d, [e]]]]];
L := [seq(`+`(L[1], assign('L', L[2])), i=1..4), L[1]];
Of course we cheated - how does one know ahead of time how many elements are in L ? The range for seq must be pre-specified and can not be changed, so the only solution for now is to guess and overshoot. Perhaps the Maple kernel developers will consider adding a seqwhile command in the future :) We will use an `if` statement to detect the end of the stack and return NULL.
L := [a, [b, [c, [d, [e]]]]];
L := [seq(`if`(L=[], NULL, `+`(L[1], `if`(nops(L)=1, assign('L',[]), assign('L', L[2])))), i=1..10)];
This does too many comparisons and is slow. You can speed it up by using an explicit delimiter, where the last list element is [e, []]:
L := [a, [b, [c, [d, [e, []]]]]];
L := [seq(`if`(L=[], NULL, `+`(L[1], assign('L', L[2]))), i=1..10)];
While we're on the topic, here's how to build that list with a seq. Note that the order must be reversed because we are building a stack.
L := [e,d,c,b,a]; # input reversed
j := []:
seq(assign('j', [i, j]), i=L): # outputs nothing
j; # result
Now for the final problem, which was the original motivation. I want to compute the base b representation of an integer with the sign as the first element. Normally one would use `convert/base` but in my case it was too slow. I had thousands of integers to convert. My solution was based on the following:
b := 11:
L := [4387324, -9879862, 4237654, -123489321];
# bound the number of base b digits required
k := trunc(evalf(ln(max(seq(abs(i), i=L)))/ln(b)))+1;
[seq([sign(i), assign('t',abs(i)), seq(irem(t, b, 't'), j=1..k)], i=L)];
# here is the code I replaced:
[seq([sign(i), op(convert(abs(i), base, b))], i=L)];
Note that the ceil function is very slow and should be avoided, along with floor, frac, and round. The irem trick is used by `convert/base` as well, however if you time the two methods on large lists you should find that my method is faster because there is much less overhead. Also, if your ultimate goal is to get a matrix of integers, you can use the (faster) rtable constructor instead of the Matrix command because the sublists are all the same size. In my case I was even able to compute the bound k must faster (using ilog2 and iquo) because my base was 2^32.
There are many other applications of this "assign in a sequence" technique, and I would be interested to hear what people might have tried.
Comments
Disagreement on relative speeds ...
I'm curious as to how you ascertained the relative speed of these techniques---they do not agree with either my intuition or my timing. Actually, I've only looked at the first problem. Following is the code I used to time the four techniques, and a fifth method, which is far and away the fastest of the lot. While method 1 (incrementally building up a list) is certainly a loser, I did not expect method four, using assign, to be particularly fast, considering that assign is not a builtin. It wasn't in my timings. Here they are
num1 := proc(L) local M,i; M := []; for i from 1 to nops(L) do M := [op(M), L[i]=i]; end do; M end proc: num2 := proc(L) local M,i; M := table(): for i to nops(L) do M[i] := L[i]=i; end do: [seq(M[i], i=1..nops(L))]; end proc: num3 := proc(L) local M,i,j; M := table(): j := 1: for i in L do M[j] := i=j; j := j+1; end do: [seq(M[i], i=1..nops(L))]; end proc: num4 := proc(L) local i,j; j := 1; [seq(i=`+`(j, assign('j', j+1)), i=L)]; end proc: num5 := proc(L) local i; [seq](L[i]=i,i=1..nops(L)): end proc: Time := proc(num,L,N) local st; st := time(); to N do num(L); end do; printf("%A = %A\n", num, time() -st) end proc: L := [$1..100000]: N := 1: #Time(num1, L, N); Time(num2, L, N); Time(num3, L, N); Time(num4, L, N); Time(num5, L, N); num2 = .930 num3 = 1.210 num4 = 2.360 num5 = .260I timed it in my own code
I timed it in my own code, where I was building lists of polynomials by extracting terms one by one from external code and updating multiple pointers using external calls. The assign technique had better average-case performance, probably because it generated less garbage. It would have also helped that I was already building a product, so the `+`() call (`*` in my case) was not extraneous. These examples were really just to illustrate the technique, which can be useful when it fits your situation. I appreciate your timings however.
memory as a factor in performance
I am curious why there is interest shown here in speed performance but not in memory performance. For example, how do the various techniques differ in terms of how much extra memory they allocate? I find that Maple performance discussions tend in general to have an emphasis about speed of execution. Isn't a performance metric which combines both speed and memory allocation just as valuable, and perhaps more valuable in general?
I have seen a few different performance metrics commonly used in the past. One is time_duration*allocation_increase, and another is time_duration*(allocation_increase^2).
I prefer the technique Roman used here, to have restarts between comparisons of various methods. For, even if one is not measuring memory allocation as part of the performance metric, there can be a penalty for garbage collection which changes with ongoing differences in memory allocation. So it can happen that the order of tests affects how they perform. Putting distinct tests into distinct Maple runs can alleviate such obscuring interference. (As we know, it was not always true as it is in Maple 10 that memory would be completely freed upon restart. So multiple sessions for distinct tests is cleaner all round.)
acer
Memory allocation
Joe Riel's timings for the first example don't really change when you restart Maple in between. Your general point however is important: memory allocation can significantly affect the performance of procedures which operate on a lot of data. This is especially true because Maple garbage collects, which is O(allocated memory), and does not free memory, which is a questionable design. Routines which create a lot of garbage will, generally speaking, degrade the performance of the system.
In this case my approach, while faster, actually generates about twice as much garbage compared to the table-based approach (probably due to procedure calls). I consider this a worthwhile tradeoff because the amount of memory we are talking about is small - a few MB perhaps. In a large problem with hundreds of MB allocated, odds are that the system will reassign garbage collected memory and there will be no new allocation in either case. That equals time saved.
[seq]
Joe, as we both know,
[seq](...)is slower than[seq(...)]. In particular,num6 := proc(L) local i; [seq(L[i]=i,i=1..nops(L))]: end proc:is faster than
num5. I just wanted to mention about that for people learning High Performance technique from this book.making [a=1, b=2, c=3, and so on]
Clarification on performance
Edited to note: The post below is wrong, and Joe Riel's times are correct. Maple fully evaluates global expressions, causing the for loop method to run an order of magnitude slower in the test below than it does in a procedure. The principle of assigning to variables in a seq is a valid one, however to get any performance benefits it seems necessary to have your own procedure do it (as in the irem example below). You will probably never get good performance using the 'assign' command.
In light of Joe Riel's timings, I think I need to clarify where this technique is useful and where it is not. The method benefits from the fact that a seq is faster than an explicit loop with multiple statements, but it suffers from the overhead of a Maple procedure call in each iteration. The question is: how many variables are needed to describe the "state" of your loop ? If the answer is "more than one" then you can probably derive a benefit from this technique. The second problem (nested lists) illustrates this pretty well:
restart; L := [seq(i,i=1..5000)]: TIMER := time(): L1 := []: for i in L do L1 := [i, L1] end do: TIMER1, TIMER := time()-TIMER, time(): for i to nops(L) do M[i] := L1[1]; L1 := L1[2] end do: TIMER1, time()-TIMER; restart; L := [seq(i,i=1..5000)]: TIMER := time(): L1 := []: seq(assign('L1', [i,L1]), i=L): TIMER1, TIMER := time()-TIMER, time(): L1 := [seq([L1[1], assign('L1', L1[2])][1], i=1..nops(L))]: TIMER1, time()-TIMER;Of course you can get even more improvement by writing your own procedure instead of using Maple's assign, which as Joe pointed out, is not built-in (sadly).
ListTools
For the second problem here is the timings:
> f:=proc(k) if k=0 then return [] else return [k,f(k-1)] end if; end proc;
f := proc(k)
if k = 0 then return [] else return [k, f(k - 1)] end if
end proc
> A:=f(10000):
> time(ListTools:-Flatten(A));
1.910
> restart;
bytes used=410457848, alloc=327620, time=3.92
> f:=proc(k) if k=0 then return [] els\
> e return [k,f(k-1)] end if; end proc;
f := proc(k)
if k = 0 then return [] else return [k, f(k - 1)] end if
end proc
> A:=f(10000):
> time([seq(`if`(A=[], NULL, `+`(A[1], assign('A', A[2]))), i=1..10000)]);
18.230