Scot Gould

Prof. Scot Gould

392 Reputation

12 Badges

8 years, 228 days
Claremont, California, United States
Dr. Scot Gould is a professor of physics in the W.M. Keck Science Department of Claremont McKenna, Pitzer, Scripps colleges - members of The Claremont Colleges in California. He was involved in the early development of the atomic force microscope. His research has included numerous studies and experiments making us of scanning probe microscopes, particularly those which involved natural fibers such as spider silk. More recently, he was involved in developing and sustaining AISS, a full-year multi-unit non-traditional interdisciplinary undergraduate science education course which integrated topics from biology, chemistry, physics, mathematics, and computer science. His current interest is integrating computational topics into the physics curriculum. He teaches the use of the computer algebraic and numerical system Maple to assist students in modeling and visualizing physical, and biological, systems. His Dirac-notation based quantum mechanics course is taught solely through Maple. An avid baseball fan, during his spare time, Dr. Gould is traveling, particularly to locations where he can bicycle on smooth, traffic-free roads, visit beaches and/or mountains, and enjoy good food and drink.

MaplePrimes Activity


These are Posts that have been published by Scot Gould

DataFrames: An example from the 2020 U.S. Presidential election

(Or why DataFrames are more powerful and readable than spreadsheets.)

 

In this example of working with DataFrames, the goal is to use a spreadsheet from a website, which contains polling data, to estimate the probability each of the two candidates from the major parties will win the US Presidential election in November.  I first tried doing the calculations with a spreadsheet, but I discovered DataFrames was far more powerful. Warning: This worksheet uses live data. Hence the outcome at the end of the worksheet is likely to change daily. A more extensive example with even more common DataFrame operations should be available soon.

 

How the US Presidential election works - highly simplified version: In the US there are only two parties for which their candidate could win the election:  the Democratic party and Republican party. The Republican party is often referred to as the "Grand Old Party", or GOP. Each state executes its own election. The candidate who receives the most votes wins the states "electoral votes" (EV). The number of the electoral votes for each state is essentially proportional to the population of the state. A candidate who receives a total of 270 or more EVs out of 538, is declared the president of the US for the next term, which starts January 20 of 2021.

 

Creating DataFrame from web based data:

First I download the data from the website. It is a CSV spreadsheet.

 

restart; interface(displayprecision = 3); interface(rtablesize = [6, 8]); web_data := Import("https://www.electoral-vote.com/evp2020/Pres/pres_polls.csv")

_m2211970420352

Each row contains information about a poll conducted in one of the states.  The first poll starts on row 2, hence the number of polls are:

Npolls := upperbound(web_data, 1)-1

572

Now I want to create a new DataFrame containing only the most useful information. In web_data, many are the columns are not important. However I do want to keep the column label names from those columns I wish to retain.

 

web_data_cols := [1, 3, 4, 5, 6]; column_labels := convert(web_data[1, web_data_cols], list)

["Day", "State", "EV", "Dem", "GOP"]

 

Because  the first poll in web_data is labeled 2, I would like to relabel all the polls starting from 1

row_labels := [seq(1 .. Npolls)]

 

Creating a DataFrame from a Matrix or another DataFrame:  (with row labels and column labels)

 

Now I can build the DataFrame that I will be working with:

 

poll_data := DataFrame(web_data[2 .. (), web_data_cols], 'columns' = column_labels, 'rows' = row_labels)

_m2211956910784

What each column means

* "Day" - day of the year in 2020 when the poll within the state was halfway completed. The larger the value, the more recent the poll.

* "State" - the state in the US where the poll was conducted. The candidate that receives the most votes "wins the state".

* "EV" - the number of electoral votes given to the candidate who receives the most votes within the state.

* "Dem" - the percentage of people who said they are going to vote for the candidate from the Democratic party.

* "GOP" - the percentage of people who said they are going to vote for the candidate from the Republican party.

Sorting:

By using the sort function, using the `>` operator, I can see which polls are the more recent. (If you run the worksheet yourself, the outcome will change as more polls are added to the website spreadsheet.)

poll_data := sort(poll_data, "Day", `>`)

_m2211960016288

 

Selecting Unique entries - by column values:

For the my simple analysis, I will use only the most recent poll, one from each state. Hence, using AreUnique, I can pull the first row that matches a state name. This new DataFrame called states.

 

states := poll_data[AreUnique(poll_data["State"])]

_m2211969565344

(Note, one of the "states" is the District of Columbia, D.C., which is why there are 51 rows.)

 

Removing a column: (and relabeling rows)

This next example isn't necessary, but shows some of the cool features of DataFrames.

 

Since there is only 1 entry per state, I'm going to remove the "State" column and relabel all the rows with the state names

state_names := convert(states["State"], list); states := DataFrame(Remove(states, "State"), 'rows' = state_names)

2

_m2211957755840

 

Indexing by row labels:


This allow me to to display information by individual states. What is the data for California, Maine and Alaska?

states[["California", "Maine", "Alaska"], () .. ()]

_m2211977321984

 

Mathematics with multiple-columns:

 

My preference is to work with fractions, rather than percentages. Hence I want all the values in the "Dem" and "GOP" to be divided by 100 (or multiplied by 1/100).  Treating each column like a vector, the multiplication is performed individually on each cell. This is what the tilda, "~", symbol performs.

states[["Dem", "GOP"]] := `~`[`*`](states[["Dem", "GOP"]], 1/100.); states

_m2211957755840

 

Mathematics: using a function to calculate a column

 

For the next action, I want to use the power of the Statistics package to create a "probability of winning the state" function.

 

For simplicity, I will assume the outcome of the voting in a state is purely random, but is conditional to popularity of each candidate as measured by the polls. I'll assume the likelihood of an outcome follows a normal (Gaussian) distribution with the peak being at point where the difference of the polling of the two candidates is zero. (Note, other than 2016, where there was an unusually larger percentage of undecided voters on election day, this simple model is reasonable accurate. For example, in 2012, of the states which appeared to be the "closest", the winner over-performed his polling in half of them, and under-performed in the other half with a mean difference of nearly zero.)  From previous elections, the standard deviation of differences between polling values and the actual outcome is at most 0.05, however, it does increase with the fraction of undecided voters.

 

To mathematically model this situation, I have chosen to use the "Cumulative Density Function" CDF in the Statistics package. It will calculate the probability that a candidate polling with fraction f1 wins the election if the other candidate is polling with fraction f2.  The variable u is the fraction of undecided voters. It is included in the calculation to increase the spread of the possible outcomes.

 

win_prob := Statistics:-CDF(Statistics:-RandomVariable(Normal(0., 0.5e-1+(1/4)*u)), f1-f2)

1/2+(1/2)*erf((1/2)*(f1-f2)*2^(1/2)/(0.5e-1+(1/4)*u))

 

Converting this expression into a function using the worst named function in Maple, unapply:

win_prob_f := unapply(evalf(win_prob), [f1, f2, u])

proc (f1, f2, u) options operator, arrow; .5000000000+.5000000000*erf(.7071067810*(f1-1.*f2)/(0.5e-1+.2500000000*u)) end proc

 

Now I can calculate a DataFrames column of the "win probability", in this case, for the candidate from the Democratic platy. By apply the function, individually, using the columns "Dem" and "GOP", I produce:

dem_win_prob := `~`[win_prob_f](states["Dem"], states["GOP"], `~`[`-`](1, `~`[`+`](states["Dem"], states["GOP"])))

_m2212010910496

Appending a column:

 

I can add this column to the end of the states with the label "DemWinProb":

 

states := Append(states, dem_win_prob, label = "DemWinProb")

_m2212009017568

 

Mathematics of adding the entries of a column:

 

How many electoral votes are available? add them up.

Total_EV := add(states["EV"])

538

 

While the number of EV a candidate wins is discrete, I can use the "win probability" from each state to estimate the total number of EV each of the candidates might win. This means adding up number of EV in each state times, individually, the probability of winning that state:

Dem_EV := round(add(`~`[`*`](states["EV"], states["DemWinProb"])))

354

Currently, the candidate from the Democratic party is likely to win more then 300 electoral vtes.

 

What about for the candidate from the Republican / "GOP" party?

gop_win_prob := `~`[win_prob_f](states["GOP"], states["Dem"], `~`[`-`](1, `~`[`+`](states["Dem"], states["GOP"]))); GOP_EV := round(add(`~`[`*`](states["EV"], gop_win_prob)))

184

Summing the two EV values, we obtain the total number of electoral votes.

Dem_EV+GOP_EV

538

  NULL

 

Download DataFrames_Example.mw

The purpose of this document is:

a) to correct the physics that was used in the document "Minimal Road Radius for Highway Superelevation" recently submitted to the Maple Applications Center;

b) to confirm the values found in the manual for the American Association of State Highway and Transportation Officials (AASHTO) that engineers use to design and build these banked curves are physically sound. 

c) to highlight the pedagogical value inherent in the Maple language to distinguish between assignment ( := )  and equivalence (  =  );

d) but most importantly, to demonstrate the pedagogical value Maple has in thinking about solving a problem involving a physical process. Given Maple's symbolic mathematics capabilities, one can implement a top-down approach to the physics and the mathematics, working from the general principle to the specific example. This allows one to avoid the types of errors that occur when translating the problem into a bottom up approach, from specific values of the example to the general principle, an approach that is required by most other computational systems.

I hope that others are willing to continue to engage in discussions related to the pedagogical value of Maple beyond mathematics.

I was asked to post this document to both here and the Maple Applications Center

[Document edited for typos.]

Minimum_Road_Radius.mw

I'm an educator (physicist) who has migrated to Maple because of the lower "activation barrier" to get something of interest produced by the student. The students in my courses are exposed to several language (Python, C++, Java) and mathematical systems (Mathematica, Maple, MATLAB.) Many claim that unless forced to used a particular language or system, their first choice is Python and Maple for the reason I cite. 

As a consequence, it is my experience that students truly perfer the math-like appearance of the 2-D Math notation as opposed to the Maple notation. They see it as more natural - again with a lower activation barrier. Hence I see no reason to change. However, I would be interested in reasons why it might be beneficial.

My ultimate question is: do I start them with worksheet mode or documents mode? I'm use to worksheet mode and have found the call and response method easy for them to understand. But document mode has many valuable benefits. Is it worth the increase in learning (and frustration) for the benefits if the students use the software only a few times per semester? Or for some, every week?

I would be interested in hearing about the experiences of other educators.

 

Page 1 of 1