:

## Aggregate Statistics on DataFrames

Maple 2016

Aggregate statistics are calculated by splitting the rows of a DataFrame by each factor in a given column into subsets and computing summary statistics for each of these subsets.

The following is a short example of how the Aggregate command is used to compute aggregate statistics for a DataFrame with housing data:

To begin, we construct a DataFrame with housing data: The first column has number of bedrooms, the second has the area in square feet, the third has price.

```bedrooms := <3, 4, 2, 4, 3, 2, 2, 3, 4, 4, 2, 4, 4, 3, 3>:
area := <1130, 1123, 1049, 1527, 907, 580, 878, 1075,
1040, 1295, 1100, 995, 908, 853, 856>:
price := <114700, 125200, 81600, 127400, 88500, 59500, 96500, 113300,
104400, 136600, 80100, 128000, 115700, 94700, 89400>:
HouseSalesData := DataFrame([bedrooms, area, price], columns = [Bedrooms, Area, Price]);``` Note that the Bedrooms column has three distinct levels: 2, 3, and 4.

`convert(HouseSalesData[Bedrooms], set);` The following returns the mean of all other columns for each distinct level in the column, Bedrooms:

`Aggregate(HouseSalesData, Bedrooms);` Adding the columns option controls which columns are returned.

`Aggregate(HouseSalesData, Bedrooms, columns = [Price])` Additionally, the tally option returns a tally for each of the levels.

`Aggregate(HouseSalesData, Bedrooms, tally)` The function option allows for the specification of any command that can be applied to a DataSeries. For example, the Statistics:-Median command computes the median for each of the levels of Bedrooms.

`Aggregate(HouseSalesData, Bedrooms, function = Statistics:-Median);` By default, Aggregate uses the SplitByColumn command to creates a separate sub-DataFrame for every discrete level in the column given by bycolumn.

`with(Statistics);`
`ByRooms := SplitByColumn(HouseSalesData, Bedrooms);` We can create box plots of the price for subgroups of sales defined by number of bedrooms.

```BoxPlot( map( (m)->m[Price], ByRooms),
deciles=false,
datasetlabels=["2 bdrms", "3 bdrms", "4 bdrms"],
color=["Red", "Purple", "Blue"]);``` I have recorded a short video that walks through this example here: 