The Kolmogorov-Smirnov test is a widespread, simple, and effective tool for checking hypotheses of the form H[0]: F[xi](x)=F(x), where F[xi](x) is the CDF of the population distribution and F(x) is a given continuous function (the Kolmogorov test), and hypotheses of the form H[0]: F[1](x)=F[2](x), where F[j](x), j=1,2, are the CDFs of two population distributions, both assumed to be continuous (the Smirnov test). See the Wikipedia article ( http://en.wikipedia.org/wiki/Kolmogorov_test ) for more details.

By the way, Mathematica 8 includes this test.

We begin with the Kolmogorov test. We compute D[n]:=sqrt(n)*max{|F[n](x)-F(x)|: -infinity < x < infinity}, where F[n](x) is the empirical CDF of a sample S of size n, that is, the CDF of RandomVariable(EmpiricalDistribution(S)). Next we compare D[n] with the root t[0.95]=1.3581 of the equation K(t)=0.95:

> fsolve(K(t) = .95, t = 0 .. 2);

1.358098639

(as usual, the significance level is set to 0.05),

where K := t -> piecewise(0 < t, sum((-1)^j*exp(-2*j^2*t^2), j = -infinity .. infinity), 0) is the CDF of the Kolmogorov distribution; see the output of

> plot(K, -1 .. 2, thickness = 3);

If D[n]<=t[0.95], the null hypothesis is not rejected; if D[n]>t[0.95], it should be rejected. This is quite simple to program in Maple.
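As a cross-check outside Maple, the critical value can be reproduced with a short Python sketch. It is only an illustration under stated assumptions: the series is truncated once its terms drop below about 1e-17, the root is found by plain bisection on an assumed bracket, and the names kolmogorov_cdf and kolmogorov_quantile are made up for this example rather than taken from any library.

```python
import math

def kolmogorov_cdf(t):
    """K(t) = sum over all integers j of (-1)^j * exp(-2 j^2 t^2) for t > 0, else 0."""
    if t <= 0:
        return 0.0
    total = 1.0  # the j = 0 term
    j = 1
    while True:
        term = math.exp(-2.0 * j * j * t * t)
        if term < 1e-17:  # assumed truncation threshold
            break
        # the terms for +j and -j are equal, hence the factor 2
        total += 2.0 * (-1) ** j * term
        j += 1
    return total

def kolmogorov_quantile(p, lo=0.2, hi=3.0, tol=1e-12):
    """Solve K(t) = p by bisection on the assumed bracket [lo, hi]."""
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if kolmogorov_cdf(mid) < p:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

print(kolmogorov_quantile(0.95))
```

The printed value should agree with the fsolve result 1.358098639 to the digits shown there.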

For example, let us consider a sample of size 100 from the Normal distribution with mean a:=1 and standard deviation sigma:=4 (the second argument of Normal is the standard deviation, as the CDF below confirms):

> with(Statistics):

> X := RandomVariable(Normal(1, 4));

_R

> S := Sample(X, 10^2);

Vector(4, {(1) = ` 1 .. 100 `*Vector[row], (2) = `Data Type: `*float[8],

(3) = `Storage: `*rectangular, (4) = `Order: `*Fortran_order})

> CDF(X, t); # its CDF

1/2+ 1/2* erf( sqrt(2)*(t - 1) /8)

> S[20];

-2.79600701124852424

Then we sort S by rank:

> R := Rank(S):

> B := OrderByRank(S, R);

Vector(4, {(1) = ` 1 .. 100 `*Vector[row], (2) = `Data Type: `*float[8],

(3) = `Storage: `*rectangular, (4) = `Order: `*Fortran_order})

> B[20];

-2.85106767863987986

We define the hypothesized CDF F(t) as follows (it differs slightly from the true CDF of X):

> F := unapply(1/2+(1/2)*erf((1/8*(t-.9))*sqrt(2.1)), t);

t-> 1/2+(1/2)*erf(.1811422094*t-.1630279885)

Now we calculate D[n] :

> Y := RandomVariable(EmpiricalDistribution(B)):

> C := map(t-> abs(CDF(Y, t)-F(t)), B);

Vector(4, {(1) = ` 1 .. 100 `*Vector[row], (2) = `Data Type: `*anything,

(3) = `Storage: `*rectangular, (4) = `Order: `*Fortran_order})

> max(C);

0.0899305682

See the picture, created by

> plot([t-> CDF(Y, t), t-> F(t)], color = [red, blue], thickness = 2);

> evalf(sqrt(10^2)*max(C));

0.8993056820

Because 0.8993056820 < 1.358098639, we draw the conclusion that the null hypothesis is not rejected at the 0.05 significance level.
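The one-sample computation can also be sketched in Python as a cross-check. This is a minimal illustration, not a translation of the worksheet: the function name ks_one_sample, the seed, and the use of random.gauss are assumptions made for the example, so the numbers will differ from the Maple run. Unlike the map over B above, the sketch also checks the left limit of the empirical CDF at each jump, which is how the supremum is usually evaluated for a continuous F.

```python
import math
import random

def ks_one_sample(sample, cdf):
    """Return sqrt(n) * sup_x |F_n(x) - F(x)| for a continuous hypothesized cdf."""
    xs = sorted(sample)
    n = len(xs)
    d = 0.0
    for i, x in enumerate(xs, start=1):
        fx = cdf(x)
        # F_n jumps from (i-1)/n to i/n at x, so both sides of the jump are checked
        d = max(d, i / n - fx, fx - (i - 1) / n)
    return math.sqrt(n) * d

def hypothesized_cdf(t):
    # same F(t) as in the post: 1/2 + 1/2*erf((t - 0.9)*sqrt(2.1)/8)
    return 0.5 + 0.5 * math.erf((t - 0.9) * math.sqrt(2.1) / 8.0)

random.seed(0)  # arbitrary seed, chosen only to make the run repeatable
sample = [random.gauss(1.0, 4.0) for _ in range(100)]  # mean 1, standard deviation 4
stat = ks_one_sample(sample, hypothesized_cdf)
print(stat, "not rejected" if stat <= 1.358098639 else "rejected")
```

Sorting once and walking through the order statistics avoids building the empirical CDF explicitly.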

Let us turn to the Smirnov test. In this case D[n1,n2]:=sqrt(n1*n2/(n1+n2))*max{|G[1](x)-G[2](x)|: -infinity < x < infinity}, where G[j](x), j=1, 2, are the empirical CDFs of a sample S[1] of size n1 and a sample S[2] of size n2, respectively. As before, the statistic is compared with t[0.95].

For example,

> Z1 := RandomVariable(Uniform(0, .95));

_R1

> n1 := 10^3: S1 := Sample(Z1, n1):

> R1 := Rank(S1):

> T1 := OrderByRank(S1, R1):

> Z2 := RandomVariable(Uniform(0, 1));

_R2

> n2 := 2*10^3: S2 := Sample(Z2, n2):

> R2 := Rank(S2):

> T2 := OrderByRank(S2, R2):

> W1 := RandomVariable(EmpiricalDistribution(T1)):

> W2 := RandomVariable(EmpiricalDistribution(T2)):

> C1 := map(t-> abs(CDF(W1, t)-CDF(W2, t)), T1):

> max(C1);

0.0570000000

> C2 := map(t-> abs(CDF(W1, t)-CDF(W2, t)), T2):

> max(C2);

0.0565000000

> max(max(C1), max(C2));

0.0570000000

> evalf(sqrt(n1*n2/(n1+n2))*%);

1.471733671

Because 1.471733671 > 1.358098639, we draw the conclusion that the null hypothesis should be rejected.
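The two-sample statistic can be sketched in Python in the same spirit. Again this is only an illustration: the name smirnov_two_sample, the seed, and the use of random.uniform are assumptions for the example, and the result will not match the Maple run digit for digit. Both empirical CDFs are right-continuous step functions, so it is enough to evaluate their difference at the pooled sample points, which mirrors taking the larger of max(C1) and max(C2) above.

```python
import math
import random
from bisect import bisect_right

def smirnov_two_sample(s1, s2):
    """Return sqrt(n1*n2/(n1+n2)) * sup_x |G1(x) - G2(x)| for two samples."""
    a, b = sorted(s1), sorted(s2)
    n1, n2 = len(a), len(b)
    # bisect_right(a, x) / n1 is the empirical CDF of the first sample at x
    d = max(abs(bisect_right(a, x) / n1 - bisect_right(b, x) / n2) for x in a + b)
    return math.sqrt(n1 * n2 / (n1 + n2)) * d

random.seed(0)  # arbitrary seed, chosen only to make the run repeatable
s1 = [random.uniform(0.0, 0.95) for _ in range(1000)]
s2 = [random.uniform(0.0, 1.0) for _ in range(2000)]
stat = smirnov_two_sample(s1, s2)
print(stat, "not rejected" if stat <= 1.358098639 else "rejected")
```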

See Kolmogorov-Smirno.mw

PS. Sorry for the edit in the last lines. It's only those who do nothing that make no mistakes.