<

This is the first of three blog posts about working with data sets in Maple.

In 2013, I wrote a library for Maple that used the HTTP package to access the Quandl data API and import data sets into Maple. I was motivated by the fact that, when I was downloading data, I often used multiple data sources, manually updated data when updates were available, and cleaned or manipulated the data into a standardized form (which left me spending too much time on the data acquisition step).

Simply put, I needed a source for data that would provide me with a searchable, stable data API, which would also return data in a form that did not require too much post-processing.

My initial library had really just scratched the surface of what was possible.

Maple 2015 introduced the new DataSets package, which fully integrated a data set search into core library routines and made its functionality more discoverable through availability in Maple’s search bar.

Accessing online data suddenly became much easier. From within Maple, I could now search through over 12 million time series data sets provided by Quandl, and then automatically import the data into a format that I could readily work with.

If you’re not already aware of this online service, Quandl is an online data aggregator that delivers a wide variety of high quality financial and economic data. This includes the latest data on stocks and commodities, exchange rates, and macroeconomic indicators such as population, inflation, unemployment, and so on. Quandl collects both open and proprietary data sets from many sources, such as the US Federal Reserve System, OECD, Eurostat, The World Bank, and Open Data for Africa. Best of all, Quandl's powerful API is free to use.

One of the first examples for the DataSets package that I constructed was in part based on the inspirational work of Hans Rosling. I was drawn in by his ability to use statistical visualizations to break down complex multidimensional data sets and provide insight into underlying patterns; a key example investigating the correlation between rising incomes and life expectancy.

As well as online data, the DataSets package had a database for country data. Hence it seemed fitting to add an example that explored macroeconomic indicators for several countries. Accordingly, I set out to create an example that visualized variables such as Gross Domestic Product, Life Expectancy, and Population for a collection of countries.

I’ll now describe how I constructed this application.

The three key variables are Gross Domestic Product at Power Purchasing Parity, Life Expectancy, and Population. Having browsed through Quandl’s website for available data sets, the World Bank and Open Data for Africa projects seemingly had the most available relevant data; therefore I chose these as my data sources.

Pulling data for a single country from one of these sources was pretty straight forward. For example, the DataSets Reference for the Open Data for Africa data set on GDP at PPP for Canada is:

DataSets:-Reference("quandl", "ODA/CAN_PPPPC"));

In this command, the second argument is the Quandl data set code. If you are on Quandl’s website, this is listed near the top of the data set page as well as in the last few characters of the web address itself: https://www.quandl.com/data/ODA/CAN_PPPPC . Deconstructing the code, “ODA” stands for Open Data for Africa and the rest of the string is constructed from the three letter country code for Canada, “CAN”, and the code for the GDP and PPP. Looking at a small sample of other data set codes, I theorized that both of the data sources used a standardized data set name that included the ISO-3166 3-letter country code for available data sets. Based on this theory, I created a simple script to query for available data and discovered that there was data available for many countries using this standardized code. However, not every country had available data, so I needed to filter my list somewhat in order to pick only those countries for which information was available.

The script that I had constructed required three letter country codes. In order to test all available countries, I created a table to house the country names and three-letter country codes using data from the built-in database for countries:

ccdata := DataSets:-Builtin:-Reference("country")[.., "3 Letter Country Code"];
cctable := table([seq(op(GetElementNames(ccdata[i])) = ccdata[i, "3 Letter Country Code"], 
i = 1 .. CountRows(ccdata))]):

My script filtered this table, returning a subset of the original table, something like:

Countries := table( [“Canada” = “CAN”, “Sweden” = “SWE”, … ] );

You can see the filtered country list in the code edit region of the application below.

With this shorter list of countries, I was now ready to download some data. I created three vectors to hold the data sets by mapping in the DataSets Reference onto the “standardized” data set names that I pulled from Quandl. Here’s the first vector for the data on GDP at PPP.

V1 := Vector( [ (x) -> Reference("quandl", cat("ODA/", x, "_PPPPC"))
                   ~([entries(Countries, nolist, indexorder)])]):
#Open Data for Africa GDP at PPP

Having created three data vectors consisting of 180 x 3 = 540 data sets, I was finally ready to visualize the large set of data that I had amassed.

In Maple’s Statistics package, BubblePlots can use the horizontal axis, vertical axis and the relative bubble size to illustrate multidimensional information. Moreover, if incoming data is stored as a TimeSeries object, BubblePlots can generate animations over a common period of time.

Putting all of this together generated the following animation for 180 available countries.

This example will be included with the next version of Maple, but for now, you can download a copy here:DataSetsBubblePlot.mw

*Note: if you try this application at home, it will download 540 data sets. This operation plus the additional BubblePlot construction can take some time, so if you just want to see the finished product, you can simply interact with the animation in the Maple worksheet using the animation toolbar.

A more advanced example that uses multiple threads for data download can be seen at the bottom of the following page: https://www.maplesoft.com/products/maple/new_features/maple19/datasets_maple2015.pdf You can also interact with this example in Maple by searching for: ?updates,Maple2015,DataSets

In my next post, I’ll discuss how I used programmatic content generation to construct an interactive application for data retrieval.

Please Wait...