This is the third post in a four-part series; the earlier posts are Generating Samples from Custom Probability Distributions (I) and Generating Samples from Custom Probability Distributions (II), and the last post is at Generating Samples from Custom Probability Distributions (IV). We ended the last post with the question of how we could further improve the algorithm demonstrated there.

The trick to doing better is simply finding a better lower bound and a better upper bound. The form we selected for Maple is a piecewise linear function for each of the two bounds. Initially, the support of the PDF is subdivided into regions where the PDF is convex and regions where it is concave; in our case, these regions can be found as follows:





On the first of these regions the PDF is convex, whereas on the second it is concave. Now on each of these intervals, we construct the two tangent lines to the graph at the left-hand and right-hand endpoints of the interval. Tangents lie below a convex function and above a concave function, so we can use the maximum of the two tangents as a lower bound on the convex interval and the minimum of the two as an upper bound on the concave interval. These maxima and minima are piecewise linear functions, with a break at the point where the two tangents intersect. For the remaining bounds (the upper bound on the convex interval and the lower bound on the concave interval), we simply take chords of the graph over the subintervals defined by those intersection points.

Let us find the intersections first.



We will consider four intervals, defined by five points: the endpoints of the support, the inflection point, and the two tangent intersections found above. As we saw, the PDF is convex on the first two of these intervals and concave on the last two.
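The inline Maple output has not survived here, so purely as an illustration: take a stand-in PDF f(x) = 12x²(1−x) on [0, 1] (a Beta(3,2) density, convex up to its inflection point at x = 1/3 and concave after it; the actual distribution from the earlier posts is not shown in this excerpt). The five breakpoints could then be computed like this in Python:

```python
# Stand-in PDF: Beta(3,2) density, f(x) = 12 x^2 (1 - x) on [0, 1].
# It is convex on [0, 1/3] and concave on [1/3, 1] (inflection at x = 1/3).

def f(x):
    return 12 * x**2 * (1 - x)

def fprime(x):
    return 24 * x - 36 * x**2

def tangent_intersection(a, b):
    """x-coordinate where the tangents to f at a and at b intersect."""
    # Solve f(a) + f'(a)(x - a) = f(b) + f'(b)(x - b) for x.
    return ((f(b) - f(a)) + fprime(a) * a - fprime(b) * b) / (fprime(a) - fprime(b))

inflection = 1 / 3
x1 = tangent_intersection(0, inflection)   # break inside the convex region
x2 = tangent_intersection(inflection, 1)   # break inside the concave region
breakpoints = [0, x1, inflection, x2, 1]
print(breakpoints)
```

For this stand-in, the tangent intersections come out at x = 1/9 and x = 7/9, so the four intervals are [0, 1/9], [1/9, 1/3], [1/3, 7/9], and [7/9, 1].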


We can now define the chords.


This allows us to define the upper and lower bounds as follows.
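The Maple definitions are missing from this excerpt; a Python sketch of the same construction for the stand-in PDF f(x) = 12x²(1−x) (with tangent intersections at 1/9 and 7/9, as computed above) might look like this:

```python
# Piecewise-linear bounds for the stand-in PDF f(x) = 12 x^2 (1 - x) on [0, 1]
# (convex on [0, 1/3], concave on [1/3, 1]; tangent intersections at 1/9, 7/9).

def f(x):
    return 12 * x**2 * (1 - x)

def fprime(x):
    return 24 * x - 36 * x**2

def tangent(a, x):
    return f(a) + fprime(a) * (x - a)

def chord(a, b, x):
    return f(a) + (f(b) - f(a)) / (b - a) * (x - a)

def upper(x):
    if x <= 1 / 9:            # convex part: chords lie above the graph
        return chord(0, 1 / 9, x)
    elif x <= 1 / 3:
        return chord(1 / 9, 1 / 3, x)
    elif x <= 7 / 9:          # concave part: tangents lie above the graph
        return tangent(1 / 3, x)
    else:
        return tangent(1, x)

def lower(x):
    if x <= 1 / 9:            # convex part: tangents lie below the graph
        return tangent(0, x)
    elif x <= 1 / 3:
        return tangent(1 / 3, x)
    elif x <= 7 / 9:          # concave part: chords lie below the graph
        return chord(1 / 3, 7 / 9, x)
    else:
        return chord(7 / 9, 1, x)

# Sanity check: lower <= f <= upper across the whole support.
for i in range(101):
    x = i / 100
    assert lower(x) - 1e-12 <= f(x) <= upper(x) + 1e-12
```

Both bounds consist of four linear pieces over the same breakpoints, which is what makes the sampling step below tractable.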



Generating values from the distribution defined by the upper bound has become somewhat more complicated now. Since the graph of the upper bound consists of four linear pieces, we can first select which of these pieces the next sample value is going to come from, and then select a value within that piece. The probability of selecting a particular piece should be proportional to the area of the trapezoid under that piece:



The areas of the four pieces are:


To turn these areas into probabilities, we scale them so that their sum is 1.
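Continuing the Python sketch with the stand-in PDF 12x²(1−x) (not the series' actual distribution; the original code uses Maple's Statistics package, and `pick_piece` is a hypothetical helper), the trapezoid areas and the piece selection could look like this:

```python
import random

# Heights of the piecewise-linear upper bound for the stand-in PDF
# f(x) = 12 x^2 (1 - x) at the breakpoints 0, 1/9, 1/3, 7/9, 1.
xs = [0, 1/9, 1/3, 7/9, 1]
heights = [0, 32/243, 8/9, 8/3, 0]

# Area of the trapezoid under each linear piece.
areas = [(heights[i] + heights[i + 1]) / 2 * (xs[i + 1] - xs[i])
         for i in range(4)]
total = sum(areas)
probs = [a / total for a in areas]   # scale so the probabilities sum to 1
print(probs)

def pick_piece():
    """Select a piece index with probability proportional to its area."""
    u = random.random() * total
    for i, a in enumerate(areas):
        if u < a:
            return i
        u -= a
    return 3
```

For this stand-in, the third piece (index 2) carries about 65% of the total area, so it is selected most often.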





We see that piece 3 (the dark blue piece in the plot above) is selected most often, as expected.

Once a piece has been selected, we need to select a value according to the linear PDF on that piece. This can be done as follows: we first divide each trapezoid into a rectangle and a triangle.



Then, again with probabilities proportional to the relative areas of the two parts, we choose whether the sample point will come from the rectangle or from the triangle.



If the value should come from the triangle, then we can use the Triangular distribution again; for the rectangle we use the Uniform distribution.
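For a single linear piece, the rectangle/triangle split can be sketched as follows. This is a hypothetical Python helper, not the post's own code (which uses Maple's Triangular and Uniform distributions); the triangular draw uses the standard inverse-CDF formula with a square root.

```python
import math
import random

def sample_linear(a, b, ha, hb):
    """Sample from the linear (unnormalized) density on [a, b] with
    heights ha at a and hb at b, by splitting the trapezoid under the
    graph into a rectangle and a right triangle."""
    rect = min(ha, hb) * (b - a)          # area of the rectangle
    tri = abs(hb - ha) * (b - a) / 2      # area of the triangle
    if random.random() * (rect + tri) < rect:
        # Point from the rectangle: uniform over [a, b].
        return random.uniform(a, b)
    # Point from the triangle: triangular density with its peak at the
    # taller side; inverse-CDF sampling gives a square-root transform.
    t = math.sqrt(random.random())
    return a + (b - a) * t if hb > ha else b - (b - a) * t
```

For example, `sample_linear(0, 1, 0, 2)` is a pure triangle with peak at 1, and `sample_linear(2, 4, 1, 1)` degenerates to a uniform draw on [2, 4].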




We now generate samples from each of the four areas, one sample point for each element of the tally, and then proceed to perform the accept/reject strategy.
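Putting the pieces together, here is an end-to-end Python sketch of the accept/reject strategy for the stand-in PDF 12x²(1−x). Both bounds are piecewise linear over the same breakpoints, so a single interpolation routine serves for both; the acceptance fractions below are specific to this stand-in and will not match the Maple run's numbers.

```python
import bisect
import math
import random

# Stand-in PDF (Beta(3,2)); breakpoints and bound heights follow from
# the tangent/chord construction described above.
def f(x):
    return 12 * x**2 * (1 - x)

xs = [0, 1/9, 1/3, 7/9, 1]
hi = [0, 32/243, 8/9, 8/3, 0]     # upper bound at the breakpoints
lo = [0, 0, 8/9, 392/243, 0]      # lower bound at the breakpoints

def interp(h, x):
    """Piecewise-linear bound through (xs[i], h[i]), evaluated at x."""
    i = min(bisect.bisect_right(xs, x), len(xs) - 1) - 1
    t = (x - xs[i]) / (xs[i + 1] - xs[i])
    return h[i] + t * (h[i + 1] - h[i])

areas = [(hi[i] + hi[i + 1]) / 2 * (xs[i + 1] - xs[i]) for i in range(4)]
total = sum(areas)

def sample_envelope():
    """Draw x from the normalized upper bound: pick a piece in proportion
    to its area, then a point within it via a rectangle/triangle split."""
    u = random.random() * total
    i = 0
    while i < 3 and u >= areas[i]:
        u -= areas[i]
        i += 1
    a, b, ha, hb = xs[i], xs[i + 1], hi[i], hi[i + 1]
    rect, tri = min(ha, hb) * (b - a), abs(hb - ha) * (b - a) / 2
    if random.random() * (rect + tri) < rect:
        return random.uniform(a, b)        # from the rectangle
    t = math.sqrt(random.random())         # from the triangle
    return a + (b - a) * t if hb > ha else b - (b - a) * t

random.seed(0)
quick = checked = rejected = 0
samples = []
while len(samples) < 10000:
    x = sample_envelope()
    y = random.random() * interp(hi, x)
    if y <= interp(lo, x):
        quick += 1                         # accepted; f never evaluated
        samples.append(x)
    elif y <= f(x):
        checked += 1                       # accepted after evaluating f
        samples.append(x)
    else:
        rejected += 1                      # rejected
print(quick, checked, rejected)
```

The `quick` counter is the payoff of the squeeze: any point falling under the lower bound is accepted without ever evaluating the true PDF.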







Now almost 90% of the points are accepted without evaluating the true PDF, and only 5% are rejected. In principle we could get an even finer approximation by subdividing the intervals further: at the two points that we found as intersections of tangents, we could draw additional tangents, then more chords, and so on. But when do we stop subdividing the intervals? And since most of the red points seem to occur in the triangle on the right-hand side, we would like to concentrate this refining process there; but how do we know this before we have the result?

The answer to these questions will come in the fourth and final post in this series; the result will be to get the fraction of accepted points above 99%.

