:

## Aggregate Statistics on DataFrames

Maple 2016

Aggregate statistics are calculated by splitting the rows of a DataFrame by each factor in a given column into subsets and computing summary statistics for each of these subsets.

The following is a short example of how the Aggregate command is used to compute aggregate statistics for a DataFrame with housing data:

To begin, we construct a DataFrame with housing data: The first column has number of bedrooms, the second has the area in square feet, the third has price.

bedrooms := <3, 4, 2, 4, 3, 2, 2, 3, 4, 4, 2, 4, 4, 3, 3>:
area := <1130, 1123, 1049, 1527, 907, 580, 878, 1075,
1040, 1295, 1100, 995, 908, 853, 856>:
price := <114700, 125200, 81600, 127400, 88500, 59500, 96500, 113300,
104400, 136600, 80100, 128000, 115700, 94700, 89400>:
HouseSalesData := DataFrame([bedrooms, area, price], columns = [Bedrooms, Area, Price]);

Note that the Bedrooms column has three distinct levels: 2, 3, and 4.

convert(HouseSalesData[Bedrooms], set);

The following returns the mean of all other columns for each distinct level in the column, Bedrooms:

Aggregate(HouseSalesData, Bedrooms);

Adding the columns option controls which columns are returned.

Aggregate(HouseSalesData, Bedrooms, columns = [Price])

Additionally, the tally option returns a tally for each of the levels.

Aggregate(HouseSalesData, Bedrooms, tally)

The function option allows for the specification of any command that can be applied to a DataSeries. For example, the Statistics:-Median command computes the median for each of the levels of Bedrooms.

Aggregate(HouseSalesData, Bedrooms, function = Statistics:-Median);

By default, Aggregate uses the SplitByColumn command to creates a separate sub-DataFrame for every discrete level in the column given by bycolumn.

with(Statistics);
ByRooms := SplitByColumn(HouseSalesData, Bedrooms);

We can create box plots of the price for subgroups of sales defined by number of bedrooms.

BoxPlot( map( (m)->m[Price], ByRooms),
deciles=false,
datasetlabels=["2 bdrms", "3 bdrms", "4 bdrms"],
color=["Red", "Purple", "Blue"]);

I have recorded a short video that walks through this example here: