Suppose you want to solve a large dense linear system AX=B over the rationals - what should you do? Well, one thing you should probably not do is directly apply Gaussian elimination. It does O(n^3) arithmetic operations, but the size of the numbers blow up, leading to an exponential bit complexity. Don't believe me? Try it:
with(LinearAlgebra):
for N from 5 to 9 do
A := RandomMatrix(2^N, 2^N+1,generator=-10^5..10^5):
TIMER := time(GaussianElimination(A));
print(2^N = TIMER);
end do:
32 = 0.312
64 = 3.640
128 = 57.855
So that algorithm's not any good. How about a modular method? We can solve the system modulo some primes and reconstruct the answer using Chinese remaindering. This is implemented in the LinearAlgebra:-Modular package and is available from LinearSolve using method=modular.
with(LinearAlgebra):
for N from 5 to 9 do
A := RandomMatrix(2^N, 2^N+1,generator=-10^5..10^5):
TIMER := time(Modular:-IntegerLinearSolve(A,2^N));
print(2^N = TIMER);
end do:
32 = 0.052
64 = 0.152
128 = 0.944
256 = 8.412
512 = 155.469
You can see we do quite a bit better this time, and the time on small systems is very good. But as the systems get larger the performance starts to trail off. Why? Each prime is an O(n^3) computation, but the result is blowing up. We require more and more primes as the systems get larger. The numbers in the result have O(n (log n + log |A|)) bits, so the total complexity is about O(n^4).
Using vintage 1982 technology, we can do considerably better. Dixon's p-adic method computes the inverse of A modulo one prime p. It then computes X0 = Ainv*B mod p, B1 = (B - A*X0)/p, X1 = Ainv*B1 mod p, B2 = (B1 - A*X1)/p, ... Then X0 + X1*p + X2*p^2 + ... Xk*p^k is the solution modulo p^k. After sufficiently many iterations, we can recover the exact solution using rational reconstruction. Each iteration is O(n^2 log|A|), so the total cost is about O(n^3).
You can download my implementation of Dixon's method here: dpadic.mpl. It uses a compiled integer dot product routine that was added to Maple 12. As you can see from the times below, the complexity is actually a bit better than O(n^3) because asymptotically fast algorithms start to come into play.
read "dpadic.mpl":
infolevel[solve] := 0: # don't print junk
with(LinearAlgebra):
for N from 5 to 10 do
A := RandomMatrix(2^N, 2^N+1,generator=-10^5..10^5):
TIMER := time(DensePadicLift(A,2^N));
print(2^N = TIMER);
end do:
32 = 0.088
64 = 0.208
128 = 0.776
256 = 4.132
512 = 26.245
1024 = 186.091
The syntax for this command is the same as LinearAlgebra:-Modular:-IntegerLinearSolve. You can actually replace IntegerLinearSolve with this routine and it will be used automatically by SolveTools:-Linear and LinearAlgebra:-LinearSolve in Maple 12.
If anyone has LinBox or Magma installed on the same machine I would welcome a benchmark comparison. I don't expect the Maple routine to win, but it might not be bad for large systems.
Most of the difficulty in writing this routine was taken care of by the RowEchelonTransform command in LinearAlgebra:-Modular. This command is called to row reduce the matrix mod p, identify the independent rows and columns, and construct the inverse of the "leading block". As a result, the solver works quite well for singular matrices where a basis for the nullspace is computed.
Comments
smaller solutions
I thought I would post a few more times for linear systems with smaller entries. The systems above have large solutions, ie: the numerators and denominators of the last solution have 20000 bits, or about 6000 decimal digits each. That's a lot of lifting steps.
These times are all from a Pentium 4 3.2 GHz, computer running 32-bit Maple 12. The matrix entries range from -5 to 5. The numerators and denominators of the last system have about 4000 decimal digits. On structured systems (where the solution is expected to be small) you could obviously get much better performance.
read "dpadic.mpl": infolevel[solve] := 0: with(LinearAlgebra): for N from 8 to 11 do A := RandomMatrix(2^N, 2^N+1,generator=-5..5): TIMER := time(DensePadicLift(A,2^N)); print(2^N = TIMER); end do: 256 = 1.276 512 = 7.772 1024 = 54.499 2048 = 496.655