Aggregate statistics are calculated by splitting the rows of a DataFrame by each factor in a given column into subsets and computing summary statistics for each of these subsets.
The following is a short example of how the Aggregate command is used to compute aggregate statistics for a DataFrame with housing data:
To begin, we construct a DataFrame with housing data: The first column has number of bedrooms, the second has the area in square feet, the third has price.
bedrooms := <3, 4, 2, 4, 3, 2, 2, 3, 4, 4, 2, 4, 4, 3, 3>:
area := <1130, 1123, 1049, 1527, 907, 580, 878, 1075,
1040, 1295, 1100, 995, 908, 853, 856>:
price := <114700, 125200, 81600, 127400, 88500, 59500, 96500, 113300,
104400, 136600, 80100, 128000, 115700, 94700, 89400>:
HouseSalesData := DataFrame([bedrooms, area, price], columns = [Bedrooms, Area, Price]);
Note that the Bedrooms column has three distinct levels: 2, 3, and 4.
The following returns the mean of all other columns for each distinct level in the column, Bedrooms:
Adding the columns option controls which columns are returned.
Aggregate(HouseSalesData, Bedrooms, columns = [Price])
Additionally, the tally option returns a tally for each of the levels.
Aggregate(HouseSalesData, Bedrooms, tally)
The function option allows for the specification of any command that can be applied to a DataSeries. For example, the Statistics:-Median command computes the median for each of the levels of Bedrooms.
Aggregate(HouseSalesData, Bedrooms, function = Statistics:-Median);
By default, Aggregate uses the SplitByColumn command to creates a separate sub-DataFrame for every discrete level in the column given by bycolumn.
ByRooms := SplitByColumn(HouseSalesData, Bedrooms);
We can create box plots of the price for subgroups of sales defined by number of bedrooms.
BoxPlot( map( (m)->m[Price], ByRooms),
datasetlabels=["2 bdrms", "3 bdrms", "4 bdrms"],
color=["Red", "Purple", "Blue"]);
I have recorded a short video that walks through this example here:
The worksheet for this example can be downloaded here: Aggregate.mw