epostma

1364 Reputation

17 Badges

12 years, 305 days
Maplesoft

Social Networks and Content at Maplesoft.com

Maple Application Center
I am the manager of the Mathematical Software Group, working mostly on the Maple library. I have been working at Maplesoft since 2007, mostly on the Statistics and Units packages and on contract work with industry users. My background is in abstract algebra, in which I completed a PhD at Eindhoven University of Technology. During my studies I was always searching for interesting questions at the crossroads of math and computer science. When I came to Canada in 2007, I had nothing but a work permit, some contacts at Maplesoft (whom I had met at a computer algebra conference a year earlier), and a plan to travel around beautiful Canada for a few months. Since Maplesoft is a company that solves more interesting math and computer science related questions than just about anywhere else, I was quite eager to join them, and after about three months, I could start.

MaplePrimes Activity


These are replies submitted by epostma

@Carl Love I'll put adding an implementation of selectremove, and selectremove on my backlog. No promises about when it'll be done - it might require some kernel work to make it possible to provide implementations of this (I haven't checked).

Erik

Hi Bill,

Thanks for letting us know about this issue. We were already aware of it, and we're working with our data provider, Quandl, to get this working again as soon as possible. (The fix should be server side only, as far as we can see.)

What appears to happen is that a data cap is incorrectly applied at some point - so you may see it working at some times, and not working at other times, without any real dependency on the Maple version you're using (as Christopher2222 already pointed out).

I'll post an update when the issue is resolved.

Erik Postma
Manager of the mathematical software group

Another option would be to use the Student[MultivariateCalculus] package:

with(Student[MultivariateCalculus]):
L := Line((x-2)/3=(y-1)/4, (y-1)/4=(z-3)/3);
GetPlot(L);

This shows a point on the line and the direction vector as well, and a caption. If you prefer to not see (some of) those, you can use options to the GetPlot command to disable them.

This happens because Sample tries to use evalhf to determine the probabilities for all numbers in a reasonable range. What ends up happening is, roughly:

with(Statistics):
r := RandomVariable(NegativeBinomial(3, 0.1));
p := MakeProcedure(ProbabilityFunction, r, numeric);
numbers := Array(0 .. 169, i -> i, datatype=float);
map[inplace,evalhf](p, numbers);

This requires evaluating evalhf(p(169)), which doesn't work: there's an overflow, because GAMMA(172) cannot be stored in an IEEE double variable. In theory, you could simplify GAMMA(3+t)/t! to a polynomial and just evaluate that instead, but that won't work for cases where the first parameter is not an integer (which we allow). Instead, I'm testing a fix that tests whether the sum of the resulting probabilities in the Array 'numbers' is close to 1, and if it isn't, retry without 'evalhf'. Unfortunately, it looks like it may be a while until this fix is in a released version of the product.

I will add that another workaround is using the option 'method = custom' in the sample call: Sample(r, 10000, method=custom). (The default for the negative binomial distribution is the discrete method.) For more details, see the help page for Statistics[Sample].

Erik Postma
Math group manager, Maplesoft.

As a final note, as I was investigating this I figured that there was something fishy going on with either the documentation of the GaussMarkovProcess, or the implementation of it. Looking at the literature about Ornstein-Uhlenbeck processes, I think I understand it now. It looks like the documentation is incorrect about the SDE used for GaussMarkovProcess instances: it should be dX = (g + h*X)dt + sigma*dW, rather than dX = h*(g-X)*dt + sigma*dW. This, at least, is a change I can easily find time for!

The Euler scheme expression above should therefore be

g(t)*dt + h(t)*X*dt + sigma(t)*sqrt(dt)*W

rather than what it says above. The "unbiased scheme" expression is consistent with what's in the code.

Erik.

@itsme Hmmm... it looks like I'll need to eat my words in this case!

I was trying to add this info to our help pages, but then found there is a way you can control the integration scheme that's used, in an indirect way. Many of the process creation procedures have a 'scheme' option, which has two values: Euler and unbiased. The default value is Euler, and in that case, integration proceeds as I described above. However, the unbiased case is more complicated.

I looked at the GaussMarkovProcess case to find out what exactly is going on. I haven't completely figured out why it works yet, but here is the short version. If you select 'scheme = unbiased', then Maple uses its own integration methods (as implemented in evalf/Int)  to do as much of the integration as possible behind the scenes. Concretely, here's what happens:

The SDE for this class of process is dX = h * (g - X) * dt + sigma * dW. Define the following auxiliary functions:

lambda(t, t + dt) = exp(int(h(z), z = t .. t+dt))

mu(t, t+dt) = int(exp(int(h(u, u = z .. t + dt)) * g(z), z = t .. t + dt)

s(t, t+dt) = sqrt(int(exp(2*int(h(u), u=z..t+dt))*sigma(z)^2, z=t..t+dt))

Integration proceeds *technically* via the Euler-Maruyama scheme in all cases, with a callback from QuantLib. QuantLib asks the process what the change in X should be as time progresses from t to t+dt; this delta is added to the current value of X. If the Euler scheme is selected, then (unsurprisingly) Maple returns

h(t) * g(t) * dt - h(t) * X * dt + sigma(t) * sqrt(dt) * W,

where W is a standard normally distributed random variable. On the other hand, if the unbiased scheme is selected, Maple returns:

(lambda(t, t+dt) - 1) * X + mu(t, t+dt) + s(t, t+dt)*W.

This means that for every time step, Maple needs to compute a number of (deterministic) numerical integrals (through evalf/Int). (Note that this happens only once for each new process-time grid combination; if you create many sample paths, the integrals for each time point are computed only once.) The time steps and integration scheme used for these integrations cannot be set manually.

I haven't figured out yet why, exactly, the second expression is a valid thing to return, but (at the cost of being many times slower) it appears to give excellent results.

As for documenting this, it's now become a much larger task. I'll see if I can schedule this in somewhere...

Best

Erik.

@itsme You're right - I'll try to add it for a future version of Maple.

Erik.

@Mac Dude

Yes, in order for the code to work correctly you have to provide all inflection points. The code should not care about additional points (actually, any additional points, whether or not the 2nd derivative is 0).

I think you're correct that the code is less efficient than a simpler solution if the PDF changes for every invocation. If efficiency is very important, it might be possible to use some of the ideas there to get a tight fitting envelope in your situation, though I think it depends on being able to describe the inflection points relatively easily.

Recalling writing those posts, I think I just didn't consider non-normalized PDFs. The examples I had in mind were all easy to normalize so it just didn't come up. (In the context of the Statistics package, you would definitely need to normalize the PDF - there, everything relies on it being normalized.)

With regards to your PS: I don't know of an easy way to find such posts. I expect most of them would be "featured posts" when they are written, but there's no easy way to find them after the fact. Another favourite of mine is Joe Riel's post about Cartesian products of lists, at http://www.mapleprimes.com/posts/41838-Cartesian-Products-Of-Lists. I reread it once every year or so :)

Erik Postma
Maplesoft.

Hi Mac Dude,

First off, I'm glad to hear you enjoyed my post. Here are a couple of notes in reply to your ideas and suggestions:

  1. The process you arrived at looks sound, pretty efficient, and correct to me. That's what matters most!
  2. Most of the time spent in generating the 600 points you show, is undoubtedly spent in finding the inflection points of the PDF. This is a step that's needed to make the particular implementation of Maple's envelope generation work. This implementation is explained in a series of four mapleprimes posts I posted four years ago. Finding the inflection points is done in pure Maple code, and it can be slow for complicated PDFs; once that's done, the actual generation of points is done in external C code, it requires very few PDF evaluations, and it's quite fast.
  3. The technique from the end of the Sample help page where one leaves a parameter unassigned when calling Sample, and then sets it before calling the resulting procedure, is not efficient for use with the envelope rejection generator: the envelope needs to be recomputed for each invocation anyway, so there's no noticeable benefit to having the sampling procedure pre-computed. (For builtin distributions, it is efficient, because those have pre-written C code for generation of random numbers without substantial set up, and it's just a matter of sending the correct value in.)
  4. If you know something about the conditional PDF for y given the value of x, that tells you where one can find its inflection points (as a function of x), then you can adapt the code given in part four of the series linked above to possibly speed the random generation up a bit more. As always, the question is, how much do you need the extra speed vs. how much time would it cost to write this adapted version.

Let me know if you have more questions or remarks.

Erik Postma
Maplesoft.

Thanks for your question, @Majmaj.

The reason that Maple uses the formula that it uses, is the same reason that it divides by N-1, rather than by N, to obtain the variance: this is the formula that gives you an unbiased estimate of the distribution variance -- at least, if the distribution that your values come from is normal. This is really the only formula that makes sense for the general case: where the weights can be any positive numbers. (Consider, for example, if your weights are 1/10, 1/9, 1/8; then using sum(w[i])-1 does not make sense.)

You could argue, as @CarlLove does below, that Maple should detect the case where all weights are integers (or the weights sum to a number greater than 1, or some other condition) and in that case use a different formula - the formula that assumes that weights are just repetitions. I don't agree with this proposal - I think using formulas with different outcomes, in a manner that is controlled by the values of the arguments, and not even in a continuous manner, is just bad practice; it would make the behaviour of Maple much less predictable than it is currently.

If you want to view weights as repetitions in the Statistics package, then it's best to do the expansion yourself: if your data set and the numbers of repetitions are both given as lists, you can do this nicely as follows:

L := [3,2,2];
W := [7,5,8];
LW := zip(`$`, L, W);
Statistics:-StandardDeviation(LW); # returns 0.489360...

Hope this helps,

Erik Postma
Maplesoft.

Note that both ?copy and ?LinearAlgebra:-Copy will work. They do almost exactly the same thing if their argument is a Matrix or Vector.

Erik Postma
Maplesoft.

Note that both ?copy and ?LinearAlgebra:-Copy will work. They do almost exactly the same thing if their argument is a Matrix or Vector.

Erik Postma
Maplesoft.

Hi herclau,

If you call ?randomize with a fixed argument, you always set the random generator to the same state, so you will always get the same matrices. That's its purpose.

If you want to get different results, it's probably best to just not restart in between subsequent runs. Alternatively, you can omit the argument in the call to randomize(); that will set the random seed to something based on the current internal clock time of your computer (though I believe it only uses a granularity of seconds, so don't do it in, say, a loop - in which case you're doing something wrong anyway, because for obvious reasons you can't restart from within a loop.)

HTH,

Erik Postma
Maplesoft.

Hi herclau,

If you call ?randomize with a fixed argument, you always set the random generator to the same state, so you will always get the same matrices. That's its purpose.

If you want to get different results, it's probably best to just not restart in between subsequent runs. Alternatively, you can omit the argument in the call to randomize(); that will set the random seed to something based on the current internal clock time of your computer (though I believe it only uses a granularity of seconds, so don't do it in, say, a loop - in which case you're doing something wrong anyway, because for obvious reasons you can't restart from within a loop.)

HTH,

Erik Postma
Maplesoft.

Hi Daniel / SamuelTuvare,

Could you show us the details of what you have done so far?

Erik Postma
Maplesoft.

1 2 3 4 5 6 7 Last Page 2 of 20