Hi you two,

Axel,

- The choice of triangular storage was driven by watching the size of the private working set of the Maple code expand. Memory usage suggests that gc() is picking up the pieces; otherwise the expansion would be an explosion. If I can get my code to allocate the array storage once, the triangular stuff will come out.
- I did consider implementing Kahan's formula with floats or hfloats, but thought I would try some extended precision first. I am open to considering algorithms based on Kahan's error bounds, if they aren't too expensive to implement. I would welcome your suggestions.
- I will look into resize -- I seem to remember it wanting rectangular storage.
- I am gradually feeling your pain about 2D math, although I find reading the 2D code much easier when I'm debugging.
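For concreteness, here is a minimal Python sketch of Kahan's compensated summation mentioned above (Python rather than Maple, just to keep the arithmetic explicit; an hfloat version would follow the same pattern):

```python
def kahan_sum(values):
    """Kahan's compensated summation: carry the low-order bits lost
    in each addition along in a running correction term c."""
    total = 0.0
    c = 0.0  # compensation for lost low-order bits
    for x in values:
        y = x - c            # apply the correction to the next term
        t = total + y        # big + small: low bits of y are lost here...
        c = (t - total) - y  # ...and algebraically recovered here
        total = t
    return total
```

Summing 10^7 copies of 0.1 this way lands on 10^6 to within the representation error of 0.1 itself, while a naive left-to-right sum drifts noticeably.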

Acer,

- Your surmise that I am deciding whether summation is "necessary" is absolutely correct. This application involves summations that exhibit catastrophic cancellation, making rounding visible in significant digits. I've been manipulating the terms of the expressions to absorb the cancellation algebraically before the terms are evaluated in fp, and using logarithmic representations of intermediate variables to make better use of the mantissa space. Rounding error is part of the floating-point model, and I am not trying to escape it; I guess it would be more accurate to say that I wanted to understand the effect of summation methods so I could observe errors stemming from catastrophic cancellation, and try to distinguish them from round-off errors -- if there are no great discrepancies between the results of adding sorted and unsorted sums, then I think both sorting and additional algebraic manipulation will be unnecessary.
- I completely agree that 11K Arrays are no big deal. When I'm convinced the algorithm is right and storage management can handle it, the dimensions are likely to increase by a factor of 10^3 - 10^5 in each dimension. And there will probably be hundreds of them. Do you by chance know whether those external BLAS calls are thread-safe?
- With regard to (f/F)loat: I have a proclivity for ignoring capitalization, so your observation about Float vs. float was not at all on my radar. Parallelizability is not the only respect in which my code is embarrassing.
- Thank you for the aliasing code illustration. That might be just the thing.
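To make the sorted-versus-unsorted diagnostic concrete, here is a toy Python illustration (not the Maple code, and the magnitudes are deliberately contrived) of how a sorted-by-magnitude sum can differ from a naive one when cancellation is in play:

```python
def sorted_sum(values):
    """Sum in order of increasing magnitude, so small terms are not
    swallowed by an already-large running total."""
    return sum(sorted(values, key=abs))

# Contrived example: the 1e16 terms swamp the unit terms in a naive sum.
vals = [1e16] + [1.0] * 10000 + [-1e16]
naive = sum(vals)           # each +1.0 is absorbed by 1e16: result is 0.0
careful = sorted_sum(vals)  # the ones accumulate first: result is 10000.0
```

When the two totals agree to within the last digit, sorting (and further algebraic massaging) is buying nothing; when they split like this, cancellation is doing real damage.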

I'll spend some quality time with the code later this week, and will let you know how it goes.

With much thanks,

- Jimmy

As a workaround, I split my 1-dimensional Array into one Array for negative values and a second Array for positive values, sorted them separately, and then performed the addition in a loop by selecting the addend with the smaller absolute value from the two at the head of each Array. Not pretty, but it got the job done.
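In case the description is unclear, the workaround amounts to something like this Python sketch (a sketch of the scheme, not the actual Maple code):

```python
def split_merge_sum(values):
    """Split into negatives and positives, sort each by magnitude,
    then repeatedly add whichever head has the smaller absolute
    value, so the running total grows as gently as possible."""
    neg = sorted((x for x in values if x < 0), key=abs)
    pos = sorted((x for x in values if x >= 0), key=abs)
    total, i, j = 0.0, 0, 0
    while i < len(neg) or j < len(pos):
        # take from neg when pos is exhausted, or when its head is smaller
        if j >= len(pos) or (i < len(neg) and abs(neg[i]) <= abs(pos[j])):
            total += neg[i]
            i += 1
        else:
            total += pos[j]
            j += 1
    return total
```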

In a few-thousand entry matrix test, AddAlongDimensions seems to be doing a nice job of controlling for accumulation of rounding error. The difference between my sorted sum and AddAlongDimensions was beneath the least significant digit of the total.

I still don't understand what sort didn't like, but if AddAlongDimension performs this well I'll skip sorting and focus on memory consumption. The hfloat question remains open for now. If the cost isn't too high it would be great to use it so that I can use the Threads:-Task model or perhaps a CUDA GPU to operate on matrices in parallel -- it's truly an embarrassingly parallelizable computation.

In the process I found another curiosity: evalf(abs(float(1,infinity))) will not evaluate to float(1,infinity), although evalf(abs(infinity)) yields infinity. Does anyone know why? This wouldn't account for the sort() problem, since the example I uploaded earlier today demonstrates the same problem without any infinities in sight.

I think my memory problems will probably bite me, so I'll be back.

Trying to help my own cause,

- Jimmy

It looks like the problem was a pointer arithmetic error that found an interesting way of expressing itself. Sorry I didn't wait another day before posting. Have a great week!

- Jimmy

For the benefit of anyone else out there with the same need to run Maple 11.02 on Ubuntu 10.04: I did not find the compatibility library on the Ubuntu site (apparently many applications were broken when it was removed from Ubuntu in the Karmic Koala release, 9.10; see https://bugs.launchpad.net/ubuntu/+source/ia32-libs/+bug/431091).

However, comment #17 on that same page contains instructions for accessing a package which contains the needed library. Just remember that you want the package libstdc++5, and NOT lib32stdc++5, despite the latter's description as being configured for use on an amd64 or ia64 Debian system running a 64-bit kernel. I tried that one first, and could not get Maple to run.

I leave the AMD64 server to its computing, a tip of the hat to Roman, and a fine day or night to all of you.

- Jimmy

Okay, I got off my good intentions and experimented with the Optimization package, which did a nice job. I'll plug it into the app and see how it performs. Still curious whether there are global controls that could have reined in infnorm, though.

Sorry to bother you prematurely, wizards.

- Jimmy

Interesting. I had done the unapply to define the function in the previous line, and named the function in the evalf/Int along with the variable and range of integration. Embedding the apply in the evalf/Int did the trick.

I can't thank you enough!

- Jimmy

Hi Axel and Fred,
Sorry to be slow in replying -- I've been digesting your helpful thoughts. In parallel with analysis of the integral, I have been working on the problem definition, and have reframed the 2D integral as shown at the bottom of the page (N.B., the exponents outside the radical are not dependent on the ones inside the radical). This form appears to be easier, but I'm not at all sure it would finish if I ask for more than 12 digits. As you can see, the original integrand is now multiplied by a factor, and I have added a couple of new asymmetric terms (see below), so I had to juggle a little more to get the whole expression symmetric.
The zeros in the denominator were very troublesome; they motivated the time spent in my problem domain to see whether I can reformulate the problem so they don't occur. I'll share more about that as soon as I've published; please bear with me.
I should have answered Axel's question about dimensionality earlier: The problem is not trivial in two dimensions, so although I could certainly see extending it to higher dimensions, a 2D approach is a very nice place to start.
I like Axel's second transformation too, because the original limits of integration are bounded above and below in 0..1, so treating them as zero is not a painful approximation for large x.
Earlier tonight I generated an interpolating Geddes-Newton series for my whole integrand (which I state below, in a form prior to making it symmetric and stretching it over the unit square as recommended). Having watched this integral grind and fail in Maple's current integrators, I was quite impressed with the results:
Accuracy   # terms in series   Time (seconds)
1e-9       12                  5.3
1e-10      14                  8.5
1e-11      15                  10.141
1e-12      18                  17.437
As mentioned in Carvajal's thesis, the actual error seems to be quite a bit smaller, as there is destructive interference in the cumulative error.
To get the series to work with Axel's earlier transformation I did have to constrain the problem a bit to keep eta and xi out of the denominator. When I tried for accuracy better than 1e-12, the GN tensor generator went berserk -- probably having a hard time finding the infinity norm on the main diagonal of the symmetrized function. I also tried to generate a GN interpolating series for the original problem, with the high exponents under the radical. The near-singularity at (1,1) made it keep trying to use (1,1) as a splitting point.
I have not yet tried Axel's latest idea -- I need to digest it, and it's getting late here on the west coast.
Thanks again for your help, guys. I'll keep chewing, and let you know how things are going.
- Jimmy
Here is my current integral:
int(int(10030*(.2060666007254+.5000000000001*chi[1]^100+.5000000000001*chi[2]^1411-.2060666007254*sqrt(1.+4.8528*chi[2]^1411+4.8528*chi[1]^100+5.88741696*chi[2]^2822+5.88741696*chi[1]^200-21.48043392*chi[1]^100*chi[2]^1411))*chi[1]^1002*chi[2]^9, chi[1] = 0 .. .9999), chi[2] = 0 .. .9999)
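For anyone who wants to poke at this integrand outside Maple, here is a rough Python transliteration in plain double precision (my own hand-translation, so treat the constants as copied rather than verified). Note how the bracketed factor cancels almost completely where both variables are small -- exactly the catastrophic cancellation discussed above:

```python
import math

def integrand(chi1, chi2):
    """Double-precision transliteration of the Maple integrand above.
    The bracket is a difference of nearly equal quantities when
    chi1^100 and chi2^1411 are tiny."""
    a = chi1 ** 100
    b = chi2 ** 1411
    root = math.sqrt(1.0 + 4.8528 * b + 4.8528 * a
                     + 5.88741696 * b * b + 5.88741696 * a * a
                     - 21.48043392 * a * b)
    bracket = (0.2060666007254
               + 0.5000000000001 * a
               + 0.5000000000001 * b
               - 0.2060666007254 * root)
    return 10030.0 * bracket * chi1 ** 1002 * chi2 ** 9
```

In double precision the bracket loses most of its significant digits away from the (1,1) corner, which is where the algebraic absorption of the cancellation pays off.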