A serious problem with Statistics:-Sample()

Ask a Question

Create a Post

Question:A serious problem with Statistics:-Sample()

Posted: mmcdara 7896 Product: Maple

November 27 2019

Hi,
The choice method=default (the implicit choice) gives dramatically false results as demonstrated here for the sampling of a normal r.v.
It's likely to be the case for any continuous distribution.
In a few words the tails of the distribution are not correctly sampled (far from the mean by 4 standard deviations for a normal r.v., not an extremely rare event in practical applications).

I think the attached file is sufficiently documented for you to understand the problem.

restart:

The problem presented here also exists with Maple 2018 and Maple 2019

>	interface(version)

(1)

>	with(Statistics):

>	X := RandomVariable(Normal(0, 1)):

Generate a sample of X of size 10^6

Here is the R code for those who would like to verify the results and the performances

N <- 10^6

R <- 20

Q <- c(0:5)

M <- matrix(nrow=R, ncol=length(Q))

for (r in c(1:20)) {

S <- rnorm(N, mean=0, sd=1)

for (q in Q) {

M[r, q+1] <-length(S[S>q])

}

Expected.Number.of.Outliers <- floor(N - pnorm(Q, mean=0, sd=1) * N)

Observed.Number.of.Outliers <- round(colMeans(M))

Expected.Number.of.Outliers

Observed.Number.of.Outliers

Absolute.Differences <- M - kronecker(t(Expected.Number.of.Outliers), rep(1, R))

boxplot(differences)

Remark the loop below takes about 30 seconds to run (to be compared to the 20 minutes
it would take am I used the build-in procedure Select, and to the 3 seconds R demands).

>	N := 10^6: R := 20: Q := [$0..5]: nQ := numelems(Q): M := Matrix(R, nQ, 0): for r from 1 to R do S := Sample(X, N): for q in Q do M[r, q+1] := add(Heaviside~(S -~ q)): end do: end do:

>	Expected_Number_Of_Outliers := Vector[row](nQ, i -> Probability(X > Q[i], numeric)*N);

$Expected_Number_Of_Outliers := Vector[row](6, {(1) = HFloat(500000.0), (2) = HFloat(158655.25393145697), (3) = HFloat(22750.13194817921), (4) = HFloat(1349.8980316301036), (5) = HFloat(31.67124183311998), (6) = HFloat(0.2866515719235352)})$

(2)

>	Observed_Number_Of_Outliers := Mean(M);

$Observed_Number_Of_Outliers := Vector[row](6, {(1) = 500291.4, (2) = 158782.3, (3) = 22761.35, (4) = 1349.5, (5) = 51.6, (6) = 1.9}, datatype = float[8])$

(3)

While the values of the Observed_Number_of_Outliers seem to be reasonably correct for q =0, 1, 2 and 3
there are highly suspicious for q = 4 and likely for q = 5 (but N is too small to be categorical)

So let's examine more closely rhe results for q = 4

>	`q=4`:= convert(M[..,5], list); min(`q=4`)

(4)

This means we obtained 20 estimations out of 20 of the probability that X exceeds q=4.
There are only about on chance out of 1 million to get such a result (roughly (1/2)^20 ).

Here is the result R returned for the same quantile q=4

M[,5]

[1] 24 37 23 38 30 43 23 30 25 39 25 29 29 36 39 41 35 37 36 34

(9 values out of 20 are less than the expected value of 31.67)

A very simple test based on the Binomial distribution will prove, without any doubt, that the
distribution of the 20 numbers of outliers for q = 4 implies the sampling algorithm is of
is of very poor quality.

Is it possible to obtain correct results with Maple ?

(simulations done below for the case q=4 only)

>	N := 10^6: R := 20: Menv := Vector[column](R, 0): for r from 1 to R do S := Sample(X, N, method=envelope): Menv[r] := add(Heaviside~(S -~ 4)): end do:

>	`q=4`:= convert(Menv, list); Mean(Menv); Variance(Menv);

(5)

Conclusion, the "default" sampling strategy suffers strong flaw.

Remark: it's well known that one of the best method to sample normal random variables
is Box-Mueller's: why it has not been programmed by default ?

Download Bug_Normal_Sampler.mw

163 views
More...

Contact Author

Subscribe

Generate PDF

Answer

Reply

Share

Branch

Create a new Post in reply

Ask a related Question

Flag

Answer this Question Add a Comment

Share via:

E-Mail Address:
Password:
Remember Me:	Automatically sign in on future visits

E-Mail Address:
Password:
Remember Me:	Automatically sign in on future visits

Ask a Question

Create a Post

Question: A serious problem with Statistics:-Sample()

Question:A serious problem with Statistics:-Sample()

163 views
More...

Contact Author

Subscribe

Generate PDF

Answer

Reply

Share

Branch

Create a new Post in reply

Ask a related Question

Flag

Preview

Preview

Save this setting as your default sorting preference?

Ask a Question

Create a Post

Generating PDF…

Save this setting as your default sorting preference?
Note: You can change your preference any time in your account settings
Don't show this again

From:
To:

Custom Message (optional):

Ask a Question

Create a Post

Question: A serious problem with Statistics:-Sample()

Question:A serious problem with Statistics:-Sample()

163 views More... Contact Author Subscribe Generate PDF Answer Reply Share Branch Create a new Post in reply Ask a related Question Flag

Preview

Preview

Save this setting as your default sorting preference?

Ask a Question

Create a Post

Generating PDF…

163 views
More...

Contact Author

Subscribe

Generate PDF

Answer

Reply

Share

Branch

Create a new Post in reply

Ask a related Question

Flag