## ClusterAnalysis applications: Finding a Minimal...

by: Maple 2017

A project that I have been working on is adding some functionality for Cluster Analysis to Maple (a small part of a much bigger project to increase Maple’s toolkit for exploratory data mining and data analysis). The launch of the MapleCloud package manager gave me a way to share my code for the project as it evolves, providing others with some useful new tools and hopefully gathering feedback (and collaborators) along the way.

At this point, there aren’t a lot of commands in the ClusterAnalysis package, but I have already hit upon several interesting applications. For example, while working on a command for plotting clusters of points, one problem I encountered was how to draw the minimal volume enclosing ellipsoid around a group (or cluster) of points. After doing some research, I stumbled upon Khachiyan’s Algorithm, which is related to solving linear programming problems with rational data. The math behind this is definitely interesting, but I’m not going to spend any time on it here. For further reading, you can explore the following:

Khachiyan’s Algorithm had previously been applied in some other languages, but to the best of my knowledge, did not have any Maple implementations. As such, the following code is an implementation of Khachiyan’s Algorithm in 2-D, which could be extended to N-dimensional space rather easily.

This routine accepts an Nx2 dataset and outputs either a plot of the minimum volume enclosing ellipsoid (MVEE) or a list of results as described in the details for the ‘output’ option below.

MVEE( X :: DataSet, optional arguments, additional arguments passed to the plotting command );

The optional arguments are as follows:

• tolerance : realcons;  specifies the convergence criterion
• maxiterations : posint; specifies the maximum number of iterations
• output : {identical(data,plot),list(identical(data,plot))}; specifies the output. If output includes plot, then a plot of the enclosing ellipsoid is returned. If output includes data, then the return includes the matrix A, which defines the ellipsoid, the center of the ellipse, and the eigenvectors and eigenvalues, which can be used to find the semi-axis lengths and the angle of rotation, alpha, for the ellipse.
• filled : truefalse; specifies if the returned plot should be filled or not

Code:

```
#Minimum Volume Enclosing Ellipsoid
MVEE := proc(XY,
    {tolerance::positive:= 1e-4}, #Convergence Criterion
    {maxiterations::posint := 100},
    {output::{identical(data,plot),list(identical(data,plot))} := data},
    {filled::truefalse := false}
    )

    local alpha, evalues, evectors, i, l_error, ldata, ldataext, M, maxvalindex, n, ncols, nrows, p1, semiaxes, stepsize, U, U1, x, X, y;
    local A, center, l_output; #Output

    if hastype(output, 'list') then
        l_output := output;
    else
        l_output := [output];
    end if;

    kernelopts(opaquemodules=false):

    ldata := Statistics:-PreProcessData(XY, 2, 'copy');

    nrows, ncols := upperbound(ldata);
    ldataext := Matrix([ldata, Vector[column](nrows, ':-fill' = 1)], 'datatype = float');

    if ncols <> 2 then
        error "expected 2 columns of data, got %1", ncols;
    end if;

    l_error := 1;

    U := Vector[column](1..nrows, 'fill' = 1/nrows);

    ##Khachiyan Algorithm##
    for n to maxiterations while l_error >= tolerance do

        X := LinearAlgebra:-Transpose(ldataext) . LinearAlgebra:-DiagonalMatrix(U) . ldataext;
        M := LinearAlgebra:-Diagonal(ldataext . LinearAlgebra:-MatrixInverse(X) . LinearAlgebra:-Transpose(ldataext));
        maxvalindex := max[index](map['evalhf', 'inplace'](abs, M));
        stepsize := (M[maxvalindex] - ncols - 1)/((ncols + 1) * (M[maxvalindex] - 1));
        U1 := (1 - stepsize) * U;
        U1[maxvalindex] := U1[maxvalindex] + stepsize;
        l_error := LinearAlgebra:-Norm(LinearAlgebra:-DiagonalMatrix(U1 - U));
        U := U1;

    end do;

    A := (1/ncols) * LinearAlgebra:-MatrixInverse(LinearAlgebra:-Transpose(ldata) . LinearAlgebra:-DiagonalMatrix(U) . ldata - (LinearAlgebra:-Transpose(ldata) . U) . LinearAlgebra:-Transpose((LinearAlgebra:-Transpose(ldata) . U)));
    center := LinearAlgebra:-Transpose(ldata) . U;
    evalues, evectors := LinearAlgebra:-Eigenvectors(A);
    evectors := evectors(.., sort[index](1 /~ (sqrt~(Re~(evalues))), `>`, ':-output' = ':-permutation'));
    semiaxes := sort(1 /~ (sqrt~(Re~(evalues))), `>`);
    alpha := arctan(Re(evectors[2,1]) / Re(evectors[1,1]));

    if l_output = [':-data'] then
        return A, center, evectors, evalues;
    elif has( l_output, ':-plot' ) then
        x := t -> center[1] + semiaxes[1] * cos(t) * cos(alpha) - semiaxes[2] * sin(t) * sin(alpha);
        y := t -> center[2] + semiaxes[1] * cos(t) * sin(alpha) + semiaxes[2] * sin(t) * cos(alpha);
        if filled then
            p1 := plots:-display(subs(CURVES=POLYGONS, plot([x(t), y(t), t = 0..2*Pi], ':-transparency' = 0.95, _rest)));
        else
            p1 := plot([x(t), y(t), t = 0..2*Pi], _rest);
        end if;
        return p1, `if`( has(l_output, ':-data'), op([A, center, evectors, evalues]), NULL );
    end if;

end proc:
```

You can run this as follows:

```
M := Matrix(10, 2, rand(0..3)):

plots:-display([MVEE(M, output = plot, filled, transparency = .3),
                plots:-pointplot(M, symbol = solidcircle, symbolsize = 15)],
               size = [0.5, "golden"]);
```
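The `output = data` form can be used to sanity-check the result. The following is only a sketch: it assumes the `MVEE` procedure and the matrix `M` defined above, and the names `Amat`, `ctr`, `evecs`, `evals` and `q` are introduced here purely for illustration. For a point p, the quantity (p - c)^T . A . (p - c) should be at most about 1 if p lies inside the computed ellipse; because the iteration stops at a tolerance, values slightly above 1 are possible.

```maple
# Assumes MVEE and M from above; all other names are illustrative.
Amat, ctr, evecs, evals := MVEE(M, output = data):

# Quadratic form (p - c)^T . A . (p - c) for the i-th data point
q := proc(i)
    local v;
    v := LinearAlgebra:-Transpose(M[i, ..]) - ctr;   # column vector p - c
    return LinearAlgebra:-Transpose(v) . Amat . v;
end proc:

# Largest value over all 10 points; expected to be close to 1
max(seq(q(i), i = 1 .. 10));
```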

As it stands, this is not an export from the “work in progress” ClusterAnalysis package – it’s actually just a local procedure used by the ClusterPlot command. However, it seemed like an interesting enough application that it deserved its own post (and potentially even some consideration for inclusion in some future more geometry-specific package). Here’s an example of how this routine is used from ClusterAnalysis:

```
with(ClusterAnalysis);

X := Import(FileTools:-JoinPath(["datasets/iris.csv"], base = datadir));

kmeans_results := KMeans(X[[`Sepal Length`, `Sepal Width`]],
    clusters = 3, epsilon = 1.*10^(-7), initializationmethod = Forgy);

ClusterPlot(kmeans_results, style = ellipse);
```

The source code for this is stored on GitHub, here:

https://github.com/dskoog/Maple-ClusterAnalysis/blob/master/src/MVEE.mm

If you don’t have a copy of the ClusterAnalysis package, you can install it from the MapleCloud window, or by running:

PackageTools:-Install(5629844458045440);

## generate numeric data from MAPLE equation?...

I have an analytical equation with respect to time that is a Fourier series expansion of a specific function. I would like MAPLE to generate a table of results against time. I have always used MATLAB to handle numeric data. How can I generate a data table in MAPLE? I have never used the Spreadsheet tool in MAPLE. Is that the way to go? Are there any examples of how to do this?

My analytical function is attached:

untitled4.mw

I read from another posting that plottools:-getdata is the way to go, but I do not see that functionality in MAPLE 12?
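One possible approach, sketched here without access to the attached worksheet, is to evaluate the expression on a grid of time values and collect the results in a Matrix. The expression `f` below is a hypothetical stand-in (a truncated square-wave series), and the file name `fourier_table.txt` is likewise only an example.

```maple
# Hypothetical stand-in for the Fourier series in the attached worksheet
f := 4/Pi * add(sin((2*k - 1)*t)/(2*k - 1), k = 1 .. 10):

# Two-column table: time in column 1, function value in column 2
tbl := Matrix([seq([0.1*i, evalf(eval(f, t = 0.1*i))], i = 0 .. 10)]);

# Optionally write the table to a text file (example file name)
ExportMatrix("fourier_table.txt", tbl);
```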

## How to extract data from implicit function with a ...

I plotted a number of functions y = y(x) for different values of the parameter a at a given value of tau (see the code below)

```
restart;
R0:=gamma+2*ln(2)+Re(Psi(1/2+(1+I)*tanh((1+I)*x)/((tau+0.25e-1*a)*y)))+ln(y);
tau := 9.975;
with(plots);
implicitplot({seq(R0, a = 1 .. 50)}, x = 0 .. 5, y = 0 .. 1, numpoints = 1000);
```

## How to convert an imported data to Vector?...

I imported a very large data set with

```data := Import("http://fs3.fex.net/get/245716150875/11071260/data.txt");
"3.0994584798345
22.889020258043
....

26.082759642081
42.911810680717
6.4578968130322"

```

I need to convert it to Vector/Array. Now its type is "string":

```whattype(data);
string
```

Here is my unsuccessful attempt:

```convert(data,Vector);
Error, invalid input: `convert/Vector` expects its 1st argument, V, to be of type {Array, Matrix, Vector, array, sequential}, but received 3.0994584798345
22.889020258043
....```

data.mw
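A minimal sketch of one way to do this, assuming the string holds one number per line: split the string at line breaks, drop any empty pieces, and parse each piece into a number. The short string below is a stand-in for the imported data, and the names `fields` and `V` are illustrative.

```maple
# Short stand-in for the imported string (one number per line)
data := "3.0994584798345\n22.889020258043\n6.4578968130322":

# Split at line breaks (either \r or \n) and discard empty pieces
fields := StringTools:-Split(data, "\r\n"):
fields := remove(s -> s = "", fields):

# Parse each piece into a float and build the Vector
V := Vector(map(parse, fields), datatype = float[8]);
```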

## Plotting a graph from many points ...

Hello Guys,

I am having a problem plotting a graph from a set of points. How do I plot a graph from a given set of points?

Thanks.

## How do I integrate a large set of datapoints....

Hello!

I have a set of data imported from Excel of size 2001x2. I've used DataPlot to plot the graph of this data, but I can't seem to find a way to integrate it. I've used BSplineCurve to make the discrete values continuous, but I can't seem to integrate this new curve. Can someone please suggest a solution or an alternative way to approximate the area under the curve?

Thanks
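One simple option is the trapezoidal rule applied directly to the two columns, which avoids building a curve first. This is a sketch only: the matrix `data` below is a small stand-in (samples of sin(x) on 0..2) for the 2001x2 data set, with x in column 1 and y in column 2.

```maple
# Stand-in data: x in column 1, y in column 2
data := Matrix([seq([0.1*i, sin(0.1*i)], i = 0 .. 20)], datatype = float[8]):

n := LinearAlgebra:-RowDimension(data):

# Trapezoidal rule: sum of (x[i+1] - x[i]) * (y[i+1] + y[i]) / 2
area := add((data[i+1, 1] - data[i, 1]) * (data[i+1, 2] + data[i, 2]) / 2,
            i = 1 .. n - 1);
```

For this stand-in the exact integral is 1 - cos(2), so the result can be compared against `evalf(1 - cos(2))`.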

Hello everyone!

Could somebody help me with reading data from a huge text file? My file contains a matrix with 10 columns and 10^6 rows. The datatype is float[8].

I use the "ImportMatrix" command to read the data from the file, but it is rather slow (about 33 seconds).

Is there an efficient way to read the file directly using Maple? Or should I use a C DLL to read the file faster?

## Diluting series of values...

Hello everyone !

I have several values loaded into Maple as a matrix from a .txt file (output of my measurement).

I measured power as a function of time.

Here is an example of my txt file in which the first column represents the time and the second column represents the force:

29.04 997.54
29.06 998.83
29.08 999.79
29.10 1000.76
29.12 1001.72
29.14 1003.01
29.16 1003.81
29.18 1004.94

From 3624 values I need to get a specific number of values, let's say 3000, while preserving the shape of the curve. We could call it a dilution, I guess.

Here is a simple illustration of the problem:

So I am asking for help finding a function that dilutes the measured values while preserving the curve.

I will be grateful for any advice.
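One way to thin out the data while following the same curve is to interpolate the measurements onto a coarser, evenly spaced time grid. The sketch below uses `CurveFitting:-ArrayInterpolation` with linear interpolation on the eight sample rows shown above, reducing them to five points; the names `T`, `F`, `newT`, `newF` and the target count `m` are illustrative.

```maple
# The eight sample rows from above: time and measured value
T := Vector([29.04, 29.06, 29.08, 29.10, 29.12, 29.14, 29.16, 29.18], datatype = float[8]):
F := Vector([997.54, 998.83, 999.79, 1000.76, 1001.72, 1003.01, 1003.81, 1004.94], datatype = float[8]):

n := 8:   # number of original points
m := 5:   # desired number of points (e.g. 3000 for the full data set)

# Evenly spaced time grid covering the same interval
newT := Vector(m, k -> T[1] + (T[n] - T[1]) * (k - 1)/(m - 1), datatype = float[8]):

# Linear interpolation of the measurements at the new times
newF := CurveFitting:-ArrayInterpolation(T, F, newT, method = linear);
```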

## Curve fitting for function and experimental data...

Hi

Friends, I want to fit a curve to some data using the function below. How can I find the values of a, b, c and d for the following?

f=1-(8*a*b+6*c*d/(b*k*x))/(2*a*b+c*b*(1/(b*k*x)))

X := Vector([200, 210, 220, 230, 240, 250, 260, 270, 280, 290])

Y := Vector([.4172, .3030, .4668, .3317, .1276, .1303, .1733, .1451, .3466, .4125])

## Workbook issues...

Dear Maple users

I have done some experiments with the new Workbook feature in Maple 2016. It is a very welcome addition, indeed. Earlier I have created Maple files in which data from an external Excel file was imported and used for certain calculations. In order to make recalculations work properly, one needs to keep the Excel file together with the Maple file. That's where a Workbook comes in handy! I tried placing those two files in a Workbook. It didn't work completely as advertised, I think. I moved the Workbook to another location on the hard drive to make sure it wouldn't interfere with the original files outside the Workbook. Then I recalculated the Maple document inside the Workbook. The good thing: the data from the Excel file was still present. The bad thing: if I changed some data in the Excel file inside the Workbook, the change didn't register in the Maple file when updating it!

Maybe I should explain that I did import data from the Excel file into Maple via the menu: Tools > Assistants > Import Data... The data was retrieved as a matrix within the Maple file and assigned to a variable and used for plots ...

Why doesn't the above procedure work properly? I hope one doesn't need to use the Workbook URI to reference files within the workbook. It is not that user-friendly!

Regards,

Erik

## How Can I plot the data result of a program?...

Hello! How can I plot the following data in a 2D plot? It is the result of a for ... do ... while loop. I have tried pointplot and dataplot, but something must be wrong!

with(plots);
for i from 1.73205080756887853 by 0.5e-2 while i < 2.87500000000000000 do roll := i; f1max := NLPSolve(f1, {f2 = i}, x1 = 0 .. 2, x2 = 0 .. 3, method = sqp, maximize = false); c := op([1], f1max), print(c, roll) end do; plot(roll, c);
Warning, limiting number of major iterations has been reached
2.86923108976435204, 1.73205080756887853
2.40562374977021154, 1.737050808
2.36135298525774395, 1.742050808
2.34703023482192563, 1.747050808
2.33762023747274306, 1.752050808
2.33074752039182975, 1.757050808
2.32549269761476607, 1.762050808
2.32138461334482216, 1.767050808
2.31814359109128576, 1.772050808
2.31558754335936889, 1.777050808
2.31359008686494683, 1.782050808
2.31205927196994354, 1.787050808
2.31092573645557353, 1.792050808
2.31013563350152307, 1.797050808
2.30964617400063910, 1.802050808
2.30942269107110354, 1.807050808
2.30943663600124838, 1.812050808
2.30966416809946962, 1.817050808
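In the loop above, `roll` and `c` are overwritten on every pass, so only the last pair survives to the plot call. A possible fix, sketched here, is to store each pair as the loop runs and plot the collected points afterwards. Since `f1` and `f2` are not shown, the function `g` below is a hypothetical stand-in for the value returned by NLPSolve.

```maple
with(plots):

# Hypothetical stand-in for the optimal value returned by NLPSolve
g := r -> (r - 1.8)^2 + 2.3:

pts := table():
n := 0:
for r from 1.73205080756887853 by 0.005 while r < 2.875 do
    n := n + 1;
    pts[n] := [r, g(r)];      # store the pair [roll, c] for this pass
end do:

# Plot all collected pairs at once
pointplot([seq(pts[k], k = 1 .. n)], labels = ["roll", "c"]);
```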

## How to extract two or three columns from an XLSM f...

I am trying to extract two (or better, three) columns of data from an XLSM file. The tools don't work. I then want to run some tests on the paired data I extract.

Could you help me? The best I managed was getting a matrix result from an XLS (not XLSM) file, and I don't know what to do with this kind of result. I only want to test some hypotheses, such as a linear regression with or without least squares, not learn what to do with this matrix result.

Thx to you,

Milos

## How to reduce noise using a Fourier Transform on a...

Hello,

I've got a load of data that contains a lot of noise in the form of sinusoidal interference patterns. The noise is quite thick and it disturbs what I am trying to look at. I've uploaded a picture to represent what I am trying to show. I know I need a cut-off frequency; I just don't know how to implement it. Thank you in advance for your help!

Gambia Man

## Aggregate Statistics on DataFrames

by: Maple 2016

Aggregate statistics are calculated by splitting the rows of a DataFrame by each factor in a given column into subsets and computing summary statistics for each of these subsets.

The following is a short example of how the Aggregate command is used to compute aggregate statistics for a DataFrame with housing data:

To begin, we construct a DataFrame with housing data: The first column has number of bedrooms, the second has the area in square feet, the third has price.

```
bedrooms := <3, 4, 2, 4, 3, 2, 2, 3, 4, 4, 2, 4, 4, 3, 3>:
area := <1130, 1123, 1049, 1527, 907, 580, 878, 1075,
         1040, 1295, 1100, 995, 908, 853, 856>:
price := <114700, 125200, 81600, 127400, 88500, 59500, 96500, 113300,
          104400, 136600, 80100, 128000, 115700, 94700, 89400>:
HouseSalesData := DataFrame([bedrooms, area, price], columns = [Bedrooms, Area, Price]);
```

Note that the Bedrooms column has three distinct levels: 2, 3, and 4.

`convert(HouseSalesData[Bedrooms], set);`

The following returns the mean of all other columns for each distinct level in the column, Bedrooms:

`Aggregate(HouseSalesData, Bedrooms);`

Adding the columns option controls which columns are returned.

`Aggregate(HouseSalesData, Bedrooms, columns = [Price])`

Additionally, the tally option returns a tally for each of the levels.

`Aggregate(HouseSalesData, Bedrooms, tally)`

The function option allows for the specification of any command that can be applied to a DataSeries. For example, the Statistics:-Median command computes the median for each of the levels of Bedrooms.

`Aggregate(HouseSalesData, Bedrooms, function = Statistics:-Median);`

By default, Aggregate uses the SplitByColumn command to create a separate sub-DataFrame for every discrete level in the column given by bycolumn.

`with(Statistics);`
`ByRooms := SplitByColumn(HouseSalesData, Bedrooms);`

We can create box plots of the price for subgroups of sales defined by number of bedrooms.

```
BoxPlot( map( (m)->m[Price], ByRooms),
         deciles=false,
         datasetlabels=["2 bdrms", "3 bdrms", "4 bdrms"],
         color=["Red", "Purple", "Blue"]);
```

I have recorded a short video that walks through this example here: https://youtu.be/e0pqCMyO3ks

## How to read a JSON file?...

Dear all,

I am looking at the help page for reading json files, but it is too advanced for me to understand...

Can anyone give an easy example where a json file is read into a variable and printing the data structure?
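A small sketch of what this can look like, assuming a Maple version that includes the JSON package. To keep the example self-contained it parses a JSON string; for a file, `JSON:-ParseFile` takes a file name instead (the name `myfile.json` in the comment is only an example). JSON objects come back as tables and JSON arrays as lists.

```maple
# Example JSON text; the keys and values are made up for illustration
s := "{\"name\": \"example\", \"values\": [1, 2, 3]}":

# Parse the text into Maple data structures
T := JSON:-ParseString(s):

# For a file on disk, the analogous call would be:
#   T := JSON:-ParseFile("myfile.json"):

eval(T);          # display the whole structure
T["name"];        # look up one entry by key
T["values"];      # the JSON array as a Maple list
```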

Hugs,
Louise =)
