**Problem:**

Suppose you have a bunch of 2D data points which:

- May include points with the same
*x*-value but different *y*-values; and
- May be unevenly-spaced with respect to the
*x*-values.

How do you clean up the data so that, for instance, you are free to construct a connected data plot, or perform a Discrete Fourier Transform? Please note that Curve Fitting and the Lomb–Scargle Method, respectively, are more effective techniques for these particular applications. Let's start with a simple example for illustration. Consider this Matrix:

A := < 2, 5; 5, 8; 2, 1; 7, 8; 10, 10; 5, 7 >;

**Consolidate:**

First, sort the rows of the Matrix by the first column, and extract the sorted columns separately:

P := sort( A[..,1], output=permutation ); # permutation to sort rows by the values in the first column
U := A[P,1]; # sorted column 1
V := A[P,2]; # sorted column 2

We can regard the sorted bunches of distinct values in *U* as a step in a stair case, and the goal is replace each step with the average of the *y*-values in *V* located on each step.

Second, determine the indices for the first occurrences of values in *U*, by selecting the indices which give a jump in *x*-value:

m := numelems( U );
K := [ 1, op( select( i -> ( U[i] > U[i-1] ), [ seq( j, j=2..m ) ] ) ), m+1 ];
n := numelems( K );

The element *m+1* is appended for later convenience. Here, we can quickly define the first column of the consolidated Matrix:

X1 := U[K[1..-2]];

Finally, to define the second column of the consolidated Matrix, we take the average of the values in each step, using the indices in *K* to tell us the ranges of values to consider:

Y1 := Vector[column]( n-1, i -> add( V[ K[i]..K[i+1]-1 ] ) / ( K[i+1] - K[i] ) );

Thus, the consolidated Matrix is given by:

B := < X1 | Y1 >;

**Spread Evenly:**

To spread-out the *x*-values, we can use a sequence with fixed step size:

X2 := evalf( Vector[column]( [ seq( X1[1]..X1[-1], (X1[-1]-X1[1])/(m-1) ) ] ) );

For the *y*-values, we will interpolate:

Y2 := CurveFitting:-ArrayInterpolation( X1, Y1, X2, method=linear );

This gives us a new Matrix, which has both evenly-spaced *x*-values and consolidated *y*-values:

C := < X2 | Y2 >;

**Plot:**

plots:-display( Array( [
plots:-pointplot( A, view=[0..10,0..10], color=green, symbol=solidcircle, symbolsize=15, title="Original Data", font=[Verdana,15] ),
plots:-pointplot( B, view=[0..10,0..10], color=red, symbol=solidcircle, symbolsize=15, title="Consolidated Data", font=[Verdana,15] ),
plots:-pointplot( C, view=[0..10,0..10], color=blue, symbol=solidcircle, symbolsize=15, title="Spread-Out Data", font=[Verdana,15] )
] ) );

**Sample Data with Noise:**

For another example, let’s take data points from a logistic curve, and add some noise:

# Noise generators
f := 0.5 * rand( -1..1 ):
g := ( 100 - rand( -15..15 ) ) / 100:
# Actual x-values
X := [ seq( i/2, i=1..20 ) ];
# Actual y-values
Y := evalf( map( x -> 4 / ( 1 + 3 * exp(-x) ), X ) );
# Matrix of points with noise
A := Matrix( zip( (x,y) -> [x,y], map( x -> x + f(), X ), map( y -> g() * y, Y ) ) );

Using the method outlined above, and the general procedures defined below, define:

B := ConsolidatedMatrix( A );
C := EquallySpaced( B, 21, method=linear );

Visually:

plots:-display( Array( [
plots:-pointplot( A, view=[0..10,0..5], symbol=solidcircle, symbolsize=15, color=green, title="Original Data", font=[Verdana,15] ),
plots:-pointplot( B, view=[0..10,0..5], symbol=solidcircle, symbolsize=15, color=red, title="Consolidated Data", font=[Verdana,15] ),
plots:-pointplot( C, view=[0..10,0..5], symbol=solidcircle, symbolsize=15, color=blue, title="Spread-Out Data", font=[Verdana,15] )
] ) );

**Generalization:**

Below are more generalized custom procedures, which are used in the above example. These also account for special cases.

# Takes a matrix with two columns, and returns a new matrix where the new x-values are unique and sorted,
# and each new y-value is the average of the old y-values corresponding to the x-value.
ConsolidatedMatrix := proc( A :: 'Matrix'(..,2), $ )
local i, j, K, m, n, P, r, U, V, X, Y:
# The number of rows in the original matrix.
r := LinearAlgebra:-RowDimension( A ):
# Return the original matrix should it only have one row.
if r = 1 then
return A:
end if:
# Permutation to sort first column of A.
P := sort( A[..,1], ':-output'=permutation ):
# Sorted first column of A.
U := A[P,1]:
# Corresponding new second column of A.
V := A[P,2]:
# Return the sorted matrix should all the x-values be distinct.
if numelems( convert( U, ':-set' ) ) = r then
return < U | V >:
end if:
# Indices of first occurrences for values in U. The element m+1 is appended for convenience.
m := numelems( U ):
K := [ 1, op( select( i -> ( U[i] > U[i-1] ), [ seq( j, j=2..m ) ] ) ), m+1 ]:
n := numelems( K ):
# Consolidated first column.
X := U[K[1..-2]]:
# Determine the consolidated second column, using the average y-value.
Y := Vector[':-column']( n-1, i -> add( V[ K[i]..K[i+1]-1 ] ) / ( K[i+1] - K[i] ) ):
return < X | Y >:
end proc:
# Procedure which takes a matrix with two columns, and returns a new matrix of specified number of rows
# with equally-spaced x-values, and interpolated y-values.
# It accepts options that can be passed to ArrayInterpolation().
EquallySpaced := proc( M :: 'Matrix'(..,2), m :: posint )
local A, i, r, U, V, X, Y:
# Consolidated matrix, the corresponding number of rows, and the columns.
A := ConsolidatedMatrix( M ):
r := LinearAlgebra:-RowDimension( A ):
U, V := evalf( A[..,1] ), evalf( A[..,2] ):
# If the consolidated matrix has only one row, return it.
if r = 1 then
return A:
end if:
# If m = 1, i.e. only one equally-spaced point is requested, then return a matrix of the averages.
if m = 1 then
return 1/r * Matrix( [ [ add( U ), add( V ) ] ] ):
end if:
# Equally-spaced x-values.
X := Vector[':-column']( [ seq( U[1]..U[-1], (U[-1]-U[1])/(m-1), i=1..m ) ] ):
# Interpolated y-values.
Y := CurveFitting:-ArrayInterpolation( U, V, X, _rest ):
return < X | Y >:
end proc:

**Worth Checking Out:**