Hi,

This post is inspired by a recent question maple least square fit error... where the OP was simulating what appeared to be a stochastic process known as the **Drunkard's walk **(see for instance** **The_Drunkard's_Walk).

In the case of the PO, the drunk took a step forward or a step backward (say along a narrow, long corridor) with equal probabilities. In addition one assumes that the step the drunkhard takes is **independent** of all the steps he did before.

His move is what is called a (1D) Random_walk

This little application based on MAPLETS (ok, I know that some people see them as old-fashioned technology).

It draws a sequence of several drunkard's walks, all of identical number of steps, and interactively plots the current histogram of the arrival point (the point where he is at the end of his walk -which should be the door if the same pub he started from if he is an inveterate drunkhard or if he knows a little about statistics- ).

The code contains 2 procedures :

- f_step_by_step (n, Discrete=false/true)

n : number of steps

Discrete = false (default value) plot the histogram of the arrivals point as if these points were realizarions of a continuous random variable

In this simple model these arrivals can take only integer values between -n and +n included; the Discrete=true option is **recommended** but it takes more walks for it to converge to the asymptotic distribution **(see below)**.

Once launched, f_step_by_step opens a maplet containing a Plotter and 2 buttons. A first walk is displayed, clic the "Plot" button to draw another and repeat the operation as many times as you want.

- f_automatic (n, m, Discrete=false/true)

d and Discrete both have the same meaning than for f_step_by_step.

m is the number of random walk you want to draw

The code is set to draw 1000 walks of 1000 steps ; this correspong roughly to 250 Mb of memory used.

f_automatic contains a call to Threads:-Sleep to delay the display, the argument of Sleep is set to 0.25 second and must be modified within the procedure (its value could be passed to f_automatic as an argument).

In my opinion it is the more interesting of the two procedures.

The values of the current mean and standard deviation are displayed as title.

The purpose is before all educative and can be seen as an illustration of the (one of) Central Limit Theorem(s) (**CLT**)

__A little bit of theory:__

Let X[n] the position of the drunkhard after n steps; his position X[n+1] is either X[n]-1 or X[n]+1 with equal probabilities.

The displacement X[n+1]-X[n] is a discrete random variable S with outcomes -1 and +1 and it's easy to find its variance is equal to 1.

The position of the trunkard after n steps is just a realization of the n independant and identically distributed, random variables S_{1}, ...S_{n} whose distribibution is equal to the one of S.

Thus :

- Expectation (S
_{1 }+ ... +S_{n}) = 0
- Variance (S
_{1 }+ ... +S_{n}) = n

For n=1000 steps, the standard deviation of the arrivals is about 31.6)

**CLT **says that the distribution of S_{1 }+ ... +S_{n }tends to a Gaussian distribution as n tends to infinity.

**What is the exact distribution of the arrivals?**

Another way to represent S is to write S = 2*B-1 where B is a Bernoulli random variable with parameter 1/2. The random variable "Arrival" is twice the sum of N indpendent rabdom variables such like S

and thus its distribution is **2*Binomial(N, 1/2)-N**.

**What is the rate of convergence of the histogram to the true probability function?**

For a sample of size N drawn from a continuous random variables, its histogram has:

- a bias error of order 1/K (K being the number of bins)
- a L
_{infinity} error of order K*sqrt( log(K) / M ) (M number of drunkhard's walks)

see for instance Lec2_density.pdf

Using the option Discrete=true corresponds to the choice K=2*N+1, in this case the choice with the highest L_{infinity} error (the larger K the smaller the bias but the larger the L_{infinity} error).

This is the reason I introduced the possibility to graw histograms and bar (column) graphs: for the same value of M the L_{infinity} error of the histogram (for instance with the default number of bins Maple uses) is nuch slower than the one of the comumn graph.

__A few "internal" parameters.__

I already spoke about the delay to display txo succesive walks.

Other parameters could be:

- The value of minbins in the case discrete=false (default)

This value is fixed to 2*sqrt(M).

- The width in the "view" option of the plot: it's left part displays the drunkhard's walk and it's right one the histogram of the arrivals (after a rotation of -Pi/2).

This value si fixed to 5/4*M.

Note that the histogram is dynamically rescaled in order it's height is always 1/4*number_of_steps.

- The height of the view option is set to -Q..Q where Q is equal to 4 standard deviations of the theoritical distribution of the arrivals.

**The continuous envelope of this distribution is red plotted in red (its height is normalized to M/4, see above). **

One can show this standard deviation iverifies sqrt(M).

In my opinion using a full vertical scale (-M..M) doesn't give pretty drunkward's walkes because they seem to more concentrated around the value 0.

__WATCHOUT__

**Starting from 0 any walk with an odd number of steps will give an odd arrival and any walk with an even number of steps will give an even arrival. Thus the exact number of outcomes for the arrivals are:**

**2*n if n is odd**
**2*n-1 if n is even**

__Other application__

This maplet can be used as illustration of the Galton Board (also known as the Bean_machine)

__Why using maplets?__

Another solution could have been to use **animate**. But to draw the M drunkard's walks, you would have had to use M frames. An excessive task that I'm not even sure Maple would have been able to handle.

I guess that imbeded components could do the job too, but I'm not as comfortable with them as I am with maplets.

To illustrate what the code does an image of the final result is given below.

Drunkard_walk.mw