Items tagged with statistics

This is the second of three blog posts about working with data sets in Maple.

In my previous post, I discussed how to use Maple to access a large number of data sets from Quandl, an online data aggregator. In this post, I’ll focus on exploring built-in data sets in Maple.

Data is being generated at an ever-increasing rate. New data is generated every minute, adding to an expanding network of online information. Navigating through this information can be daunting. Simply preparing a tabular data set that collects information from several sources is often a difficult and time-consuming effort. For example, even though the example in my previous post only required a couple of lines of Maple code to merge 540 different data sets from various sources, manually searching for and selecting sources for that data took significantly more time.

In an attempt to make the process of finding data easier, Maple’s built-in country data set collects information on country-specific variables including financial and economic data, as well as information on country codes, population, area, and more.

The built-in database for Country data can be accessed programmatically by creating a new DataSets Reference:

CountryData := DataSets:-Reference( "builtin", "country" );

This returns a Reference object, which can be further interrogated. There are several commands that are applicable to a DataSets Reference, including the following exports for the Reference object:

exports( CountryData, static );

The list of available countries in this data set is given using the following:

GetElementNames( CountryData );

The available data for each of these countries can be found using:

GetHeaders( CountryData );

There are many different data sets available for country data, 126 different variables to be exact. Similar to Maple’s DataFrame, the columns of information in the built-in data set can be accessed using their labelled names.

For example, the three-letter country codes for each country can be returned using:

CountryData[.., "3 Letter Country Code"];

The three-letter country code for Denmark is:

CountryData["Denmark", "3 Letter Country Code"];

Built-in data can also be queried in a similar manner to DataFrames. For example, to return the countries with a population density of less than 3:

pop_density := CountryData[ .., "Population Density" ]:
pop_density[ `Population Density` < 3 ];

At this time, Maple’s built-in country data collection contains 126 data sets for 185 countries. When I built the example from my first post, I knew exactly the data sets that I wanted to use and I built a script to collect these into a larger data container. Attempting a similar task using Maple’s built-in data left me with the difficult decision of choosing which data sets to use in my next example.

So rather than choose between these available options, I built a user interface that lets you quickly browse through all of Maple’s collection of built-in data.

Using a couple of tricks that I found in the pages for Programmatic Content Generation, I built the interface pictured above. (I’ll give more details on the method that I used to construct the interface in my next post.)

This interface allows you to select from a list of countries, and visualize up to three variables of the country data with a BubblePlot. Using the preassigned defaults, you can select several countries and then visualize how their overall number of internet users has changed along with their gross domestic product. The BubblePlot visualization also adds a third dimension of information by adjusting the bubble size according to the relative population compared with the other selected countries.

Now you may notice that the list of available data sets is longer than the list of available options in each of the selection boxes. In order to be able to generate BubblePlot animations, I made an arbitrary choice to filter out any of the built-in data sets that were not of type TimeSeries. This is something that could easily be changed in the code. The choice of a BubblePlot could also be updated to be any other type of Statistical visualization with some additional modifications.

You can download a copy of this application here: VisualizingCountryDataSets.mw

You can also interact with it via the MapleCloud: http://maplecloud.maplesoft.com/application.jsp?appId=5743882790764544

I’ll be following up this post with an in-depth post on how I authored the country selector interface using programmatic content generation.

Below is a custom distribution created based on a function that takes a parameter.

It is possible to create the custom distribution, e.g. as D1, and then use it afterwards to find, e.g., the Mean, but it is not possible to call Mean directly with the distribution being created inside the call.

Why is that?

When defining a plain, standard normally distributed stochastic variable X, I can find the probability that X <= 0.6 using the Probability function, but how can I get the value corresponding to a given probability, as is done with the fsolve approach in the example below?

However, the fsolve used to define Prev above appears to be a bad way to do it, since the resulting Prev function can't, for example, be plotted.

Is there some built-in way of doing the reverse of Probability for a stochastic variable?
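(For illustration, a minimal sketch of the inverse lookup in question, assuming a standard normal X; as far as I know, the Statistics Quantile command acts as the inverse of Probability for a continuous random variable, and unlike the fsolve workaround it can be plotted directly:)

with(Statistics):
X := RandomVariable(Normal(0, 1)):
p := Probability(X <= 0.6, numeric);                 # about 0.7257
Quantile(X, p, numeric);                             # recovers 0.6
plot(q -> Quantile(X, q, numeric), 0.01 .. 0.99);    # the inverse can be plotted directly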

Hi All

Assume that we have a stochastic model with the following density function

and our goal is to estimate the unknown parameters, namely alpha, beta, lambda, mu and sigma, by any available method, especially the maximum likelihood estimation method.
How can we do this with Maple?

Can the "MaximumLikelihoodEstimate" command help?

Or should I define the likelihood function first and then differentiate it with respect to the unknown parameters?
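(Since the density itself isn't reproduced above, here is only a rough sketch of the manual log-likelihood route with a stand-in two-parameter normal density. Statistics:-MaximumLikelihoodEstimate may also work directly for distributions Maple knows, but for a fully custom five-parameter PDF, building the log-likelihood and maximizing it numerically is one option:)

with(Statistics):
# stand-in two-parameter density; the real five-parameter PDF is not shown above
f := (x, mu, sigma) -> exp(-(x - mu)^2/(2*sigma^2))/(sigma*sqrt(2*Pi)):
data := convert(Sample(RandomVariable(Normal(5, 2)), 200), list):   # made-up sample
logL := add(ln(f(xi, mu, sigma)), xi = data):                       # log-likelihood in mu, sigma
Optimization:-Maximize(logL, {sigma >= 0.01}, initialpoint = {mu = 1, sigma = 1});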

 

thanks in advance

 

Mahmood   Dadkhah

Ph.D Candidate

Applied Mathematics Department

This is the first of three blog posts about working with data sets in Maple.

In 2013, I wrote a library for Maple that used the HTTP package to access the Quandl data API and import data sets into Maple. I was motivated by the fact that, when I was downloading data, I often used multiple data sources, manually updated data when updates were available, and cleaned or manipulated the data into a standardized form (which left me spending too much time on the data acquisition step).

Simply put, I needed a source for data that would provide me with a searchable, stable data API, which would also return data in a form that did not require too much post-processing.

My initial library had really just scratched the surface of what was possible.

Maple 2015 introduced the new DataSets package, which fully integrated a data set search into core library routines and made its functionality more discoverable through availability in Maple’s search bar.

Accessing online data suddenly became much easier. From within Maple, I could now search through over 12 million time series data sets provided by Quandl, and then automatically import the data into a format that I could readily work with.

If you’re not already aware of this online service, Quandl is an online data aggregator that delivers a wide variety of high quality financial and economic data. This includes the latest data on stocks and commodities, exchange rates, and macroeconomic indicators such as population, inflation, unemployment, and so on. Quandl collects both open and proprietary data sets from many sources, such as the US Federal Reserve System, OECD, Eurostat, The World Bank, and Open Data for Africa. Best of all, Quandl's powerful API is free to use.

One of the first examples for the DataSets package that I constructed was based in part on the inspirational work of Hans Rosling. I was drawn in by his ability to use statistical visualizations to break down complex multidimensional data sets and provide insight into underlying patterns, a key example being his investigation of the correlation between rising incomes and life expectancy.

As well as online data, the DataSets package had a database for country data. Hence it seemed fitting to add an example that explored macroeconomic indicators for several countries. Accordingly, I set out to create an example that visualized variables such as Gross Domestic Product, Life Expectancy, and Population for a collection of countries.

I’ll now describe how I constructed this application.

The three key variables are Gross Domestic Product at Purchasing Power Parity, Life Expectancy, and Population. Having browsed through Quandl’s website for available data sets, the World Bank and Open Data for Africa projects seemed to have the most relevant data available; therefore I chose these as my data sources.

Pulling data for a single country from one of these sources was pretty straightforward. For example, the DataSets Reference for the Open Data for Africa data set on GDP at PPP for Canada is:

DataSets:-Reference("quandl", "ODA/CAN_PPPPC");

In this command, the second argument is the Quandl data set code. If you are on Quandl’s website, this is listed near the top of the data set page as well as in the last few characters of the web address itself: https://www.quandl.com/data/ODA/CAN_PPPPC . Deconstructing the code, “ODA” stands for Open Data for Africa and the rest of the string is constructed from the three-letter country code for Canada, “CAN”, and the code for GDP at PPP. Looking at a small sample of other data set codes, I theorized that both of the data sources used a standardized data set name that included the ISO-3166 3-letter country code for available data sets. Based on this theory, I created a simple script to query for available data and discovered that there was data available for many countries using this standardized code. However, not every country had available data, so I needed to filter my list somewhat in order to pick only those countries for which information was available.

The script that I had constructed required three letter country codes. In order to test all available countries, I created a table to house the country names and three-letter country codes using data from the built-in database for countries:

ccdata := DataSets:-Builtin:-Reference("country")[.., "3 Letter Country Code"];
cctable := table([seq(op(GetElementNames(ccdata[i])) = ccdata[i, "3 Letter Country Code"], 
i = 1 .. CountRows(ccdata))]):

My script filtered this table, returning a subset of the original table, something like:

Countries := table( ["Canada" = "CAN", "Sweden" = "SWE", … ] );

You can see the filtered country list in the code edit region of the application below.

With this shorter list of countries, I was now ready to download some data. I created three vectors to hold the data sets by mapping in the DataSets Reference onto the “standardized” data set names that I pulled from Quandl. Here’s the first vector for the data on GDP at PPP.

V1 := Vector( [ ( (x) -> Reference("quandl", cat("ODA/", x, "_PPPPC")) )~
                ([entries(Countries, nolist, indexorder)]) ] ):
# Open Data for Africa GDP at PPP

Having created three data vectors consisting of 180 x 3 = 540 data sets, I was finally ready to visualize the large set of data that I had amassed.

In Maple’s Statistics package, BubblePlots can use the horizontal axis, vertical axis and the relative bubble size to illustrate multidimensional information. Moreover, if incoming data is stored as a TimeSeries object, BubblePlots can generate animations over a common period of time.
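(As a quick illustration of the static form, a sketch with made-up numbers rather than the downloaded Quandl data:)

with(Statistics):
gdp  := [44000, 52000, 39000]:      # horizontal axis, e.g. GDP at PPP
life := [81.2, 82.1, 80.5]:         # vertical axis, e.g. life expectancy
pop  := [36, 10, 5]:                # bubble sizes, e.g. population in millions
BubblePlot(gdp, life, pop);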

Putting all of this together generated the following animation for 180 available countries.

This example will be included with the next version of Maple, but for now, you can download a copy here: DataSetsBubblePlot.mw

*Note: if you try this application at home, it will download 540 data sets. This operation plus the additional BubblePlot construction can take some time, so if you just want to see the finished product, you can simply interact with the animation in the Maple worksheet using the animation toolbar.

A more advanced example that uses multiple threads for data download can be seen at the bottom of the following page: https://www.maplesoft.com/products/maple/new_features/maple19/datasets_maple2015.pdf You can also interact with this example in Maple by searching for: ?updates,Maple2015,DataSets

In my next post, I’ll discuss how I used programmatic content generation to construct an interactive application for data retrieval.

>message:=`67A`;

67A

>P:=convert(message, bytes);

[54, 55, 65]

>with(Bits):

>bitP1:=Split(P1);

[0, 1, 1, 0, 1, 1]

>bitP2:=Split(P2);

[1, 1, 1, 0, 1, 1]

>bitP3:=Split(P3);

[1, 0, 0, 0, 0, 0, 1]

>with(Statistics):

>b1:=Count(bitP1);

6

>b2:=Count(bitP2);

6

>b3:=Count(bitP3);

7

>totalBits=b1+b2+b3;

19

 

Hi, how do I need to modify my commands so that when I write any message of any length, I can get totalBits directly?
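(One possible way, sketched as a small helper procedure; the name totalBitsOf is my own, and it assumes the message is passed as a string:)

totalBitsOf := proc(msg::string)
    local byteList, b;
    byteList := convert(msg, bytes);              # string -> list of byte values
    add(nops(Bits:-Split(b)), b in byteList);     # bits contributed by each byte
end proc:

totalBitsOf("67A");                               # 19, matching b1 + b2 + b3 above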

Thank you~=]]

Is there a difference between these two? 

with(Statistics):

Sample(Normal(0,1),100)

Sample(RandomVariable(Normal(0, 1)), 100)

 

Does Maple have something similar to c-means clustering in Matlab?

http://www.mathworks.com/help/fuzzy/fcm.html

 How would I go about doing something like this in Maple?
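(There is no fuzzy c-means command that I know of in the Statistics package, but the algorithm is short enough to sketch directly in Maple. The following is my own rough implementation, just to show one way to go about it; the name FuzzyCMeans is made up, and there is no guard against a point coinciding exactly with a centre:)

FuzzyCMeans := proc(pts::list, k::posint, m::numeric := 2.0, maxIter::posint := 50)
    local n, U, C, d, w, i, j, l, it;
    n := nops(pts);
    # random initial memberships, each row normalized so it sums to 1
    U := Matrix(n, k, (i, j) -> evalf(rand(1 .. 100)()));
    for i to n do
        w := add(U[i, j], j = 1 .. k);
        for j to k do U[i, j] := U[i, j]/w end do;
    end do;
    for it to maxIter do
        # cluster centres: membership-weighted means of the points
        C := [seq(add(U[i, j]^m*Vector(pts[i]), i = 1 .. n)
                  /add(U[i, j]^m, i = 1 .. n), j = 1 .. k)];
        # membership update from distances to all centres
        for i to n do
            d := [seq(LinearAlgebra:-Norm(Vector(pts[i]) - C[j], 2), j = 1 .. k)];
            for j to k do
                U[i, j] := 1/add((d[j]/d[l])^(2/(m - 1)), l = 1 .. k);
            end do;
        end do;
    end do;
    return C, U;
end proc:

pts := [[1.0, 1.2], [0.9, 1.1], [0.8, 1.0], [5.0, 5.1], [5.2, 4.9], [5.1, 5.3]]:
centres, memberships := FuzzyCMeans(pts, 2);

Here centres is a list of cluster-centre Vectors and memberships is the n x k membership matrix.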

Is there a way to plot critical values of the Pearson Correlation Coefficient r?  See attached worksheet.  Thanks!

Les    ect4_critical_value_of_r.mw
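(One standard approach, sketched below rather than taken from the attached worksheet: the two-sided critical value of r at level alpha for n observations can be obtained from the Student t distribution with n - 2 degrees of freedom via r_crit = t/sqrt(t^2 + n - 2), and then plotted against n:)

with(Statistics):
rcrit := proc(n, alpha)
    local t;
    t := Quantile(StudentT(n - 2), 1 - alpha/2, numeric);
    t/sqrt(t^2 + n - 2);
end proc:
rcrit(20, 0.05);                                   # about 0.444 for n = 20, alpha = 0.05
plot(n -> rcrit(n, 0.05), 5 .. 100,
     labels = ["sample size n", "critical value of r"]);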

Howdy all,

I am trying to fit exponential and logistic models to a set of population data I've been given, using the NonlinearFit function in the Statistics package. When I try to find the fit for the exponential function I get an error saying "SVD of estimated Jacobian could not be computed". Furthermore, when I display the regression over the set of data points all it shows is a horizontal line. I'm not sure how to go about fixing this. My data set is only 17 points and my input function is about as simple as it gets.

When I run the program to solve for the logistic model I do not get the error, but the displayed plot still shows just a horizontal line despite the function being non-linear.

So far I have...

regE := NonlinearFit(a*exp(b*x),year,population,x)

regLog := NonlinearFit(a/(1+b*exp(-c*x)),year,population,x)

expon := plot((regE), x = 1850..2020):

logi := plot((regLog), x = 1850..2020):

display({data,logi,expon});

I have not tried using Optimization yet, but I will soon, although I'm not sure it will improve my results since my understanding is that they both use the same process to estimate the parameters.

Anyways, Thanks for the help in advance!!

 

EDIT: Here is the data I am using.

year := [1850,1860,1870,1880,1890,1900,1910,1920,1930,1940,1950,1960,1970,1980,1990,2000,2010]:

population :=[4668,9070,17375,27985,37249,63786,115693,186667,359328,528961,806701,1243158,1741912,2049527,2818199,3400578,4092459]:
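(For what it's worth, a sketch of the kind of adjustment that often helps with this error: rescaling the year so exp(b*x) stays in a reasonable range, and supplying starting values via the initialvalues option. It reuses the year and population lists above; the starting values are just rough guesses:)

with(Statistics):
yr := map(x -> x - 1850, year):                           # rescale so x starts at 0
regE := NonlinearFit(a*exp(b*x), yr, population, x,
                     initialvalues = [a = 5000, b = 0.04]);
ptsPlot := plot(zip((u, v) -> [u, v], yr, population), style = point):
plots:-display(ptsPlot, plot(regE, x = 0 .. 170));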

 

 

Hi, may I know how I should write the commands after I convert my 'Hello' to 34 and 27 using the commands below?

Hope someone can help me, thanks a lot..=]] Have a nice day~

 

message:='Hello';

>Hello

plaintext:=convert(message,bytes);

>[72, 101, 108, 108, 111]

P:=numtheory[cfrac](plaintext);

>9418838187/130799212

M1:=numer(P);

>9418838187

M2:=denom(P);

>130799212

with(Bits):

bitM1:=Split(M1);

>[1, 1, 0, 1, 0, 1, 0, 1, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 1, 0, 1, 1, 0, 1, 0, 0, 0, 1, 1, 0, 0, 0, 1]

bitM2:=Split(M2);

>[0, 0, 1, 1, 0, 1, 1, 0, 0, 1, 1, 0, 1, 0, 1, 1, 1, 1, 0, 1, 0, 0, 1, 1, 1, 1, 1]

with(Statistics):

Count(bitM1);

>34

Count(bitM2);

>27

[72, 101, 108, 108, 111]

I have used a ''for'' loop in order to get a series of points. I would like to save those points in a vector in order to use it with the ''PolynomialFit'' command. The problem is that the saved points come out in an arbitrary order. How can I get the values of the vector A in the right sequence? In the code below you can plot the values of A over t (which are not sorted). I cannot use the sort command on A as I did for t, because the points are not increasing.

This is my code:

restart;

Atot := 0:

for ii from 0 by 0.01 to 2 do

PtotFkt := ii->  ii^2 :

Ptot := PtotFkt(ii):

Atot := Atot+0.01*Ptot:

A[ii] := Atot: #Save points in a Table

t[ii] := ii: #Save point in a table

end do;

AV := convert(A, list): #conversion from table to list
nops(AV);  #number of points

timme := convert(t, list): #conversion from table to list
nops(timme); #number of points

 

with(Statistics); #PolynomialFit

X := Vector(AV, datatype = float);

Y := Vector(sort(timme), datatype = float);

plot(Y, X, style = point, symbol = asterisk, color = blue);

regress := PolynomialFit(10, X, Y, time);

curve1 := plot(regress, time = 0 .. 2);
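(A sketch of one way around this: build the two lists in loop order instead of using tables keyed by floating-point values, then fit with time as the independent variable. I've also used t rather than time as the fitting variable, since time is a built-in Maple procedure:)

restart;
Atot  := 0:
Avals := []:
tvals := []:
for ii from 0 by 0.01 to 2 do
    Ptot  := ii^2:
    Atot  := Atot + 0.01*Ptot:
    Avals := [op(Avals), Atot]:      # points stay in loop order
    tvals := [op(tvals), ii]:
end do:
X := Vector(tvals, datatype = float):
Y := Vector(Avals, datatype = float):
regress := Statistics:-PolynomialFit(10, X, Y, t);
plots:-display(plot(X, Y, style = point, symbol = asterisk, color = blue),
               plot(regress, t = 0 .. 2));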

Hi friends! I have a problem with a random variable. I don't understand why the theoretical Mean differs from the sample Mean.

restart; with(Statistics);

r := RandomVariable(NegativeBinomial(3, .1));
Mean((3-1)/(3+r-1));

0.1000000000

S := Sample(r, 10000);

d := map(unapply((3-1)/(3+t-1), t), S);

Mean(d);

0.04703520901756091

But !!

For example if p=0.2 then all is well

 

Hi,

 

I think a similar question has been asked by several people, but I did not find a suitable thread. My question is: suppose I have a probability distribution function like

  p(x, y) = exp(-alpha*(x + y)) * x^2 * y^2 / |x - y|,  alpha > 0,

where x and y range from -∞ to +∞. This function is normalizable but unbounded, which makes the rejection algorithm a bit difficult(?).

How do I generate sampling points from this type of probability distribution function?
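(For illustration only, a rough random-walk Metropolis-Hastings sketch for a density of this kind; I've fixed alpha = 1 and, for this sketch, restricted the support to x, y > 0, with no attempt at tuning or burn-in:)

with(Statistics):
alpha := 1.0:
p := (x, y) -> exp(-alpha*(x + y))*x^2*y^2/abs(x - y):
N := 10^4:
step := Sample(RandomVariable(Normal(0, 0.5)), 2*N):     # random-walk proposal steps
u    := Sample(RandomVariable(Uniform(0, 1)), N):        # acceptance draws
pts  := Array(1 .. N):
x, y := 1.0, 2.0:
for i to N do
    xn := x + step[2*i - 1];  yn := y + step[2*i];
    if xn > 0 and yn > 0 and xn <> yn and u[i] < p(xn, yn)/p(x, y) then
        x, y := xn, yn;                                   # accept the proposal
    end if;
    pts[i] := [x, y];
end do:
pts[N];                                                   # one (correlated) sample point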

 

Thank you very much!

 

Hi,

I'm a complete beginner in statistics. I'm doing a calculation to determine a set of parameters. For example:

I have to calculate 5 parameters x1, x2, x3, x4, x5 from 7 equations f1, f2, f3, f4, f5, f6, f7. Because it is difficult to calculate the 5 parameters directly, I used a chi-square minimization of the difference between the experimental and theoretical data. Then I can get the results. After that, I used these 5 parameters to back-calculate the data using the 7 equations above. My question is how to calculate the error bars (or standard errors) of the back-calculated data to add to the plot.

Thank you  for helping me,

Best regards.

 
