- Voting patterns in Mexico and Florida
- The size of files in your Maple 12 installation
- Stock trading volumes on the NYSE
What do all of these have in common? They, and other data sets drawn from the real world, often follow a non-intuitive pattern called Benford’s Law.
Benford’s Law states that the first non-zero digit of the items in such a data set is "1" about 30% of the time and "2" about 18% of the time, with higher leading digits occurring progressively less often. Specifically, it predicts that a leading digit m occurs with probability $\log_{10}(1 + 1/m)$. Many real-life lists of data are logarithmically distributed and often follow Benford’s Law, including
- populations of cities,
- the GDP of every country in the world,
- sales forecasts and enumerated business plans,
- tables of physical constants,
- winning bids for specific types of eBay auctions,
- and the size of files on your hard disk.
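The percentages quoted above come from the rule that a leading digit m has probability log10(1 + 1/m). As a quick check, here is a minimal Python sketch that tabulates the expected frequencies for digits 1 through 9. The function name `benford_probabilities` is an illustrative choice, not something taken from Maple or the worksheet.

```python
import math


def benford_probabilities():
    """Return {digit: probability} for leading digits 1..9 under Benford's Law."""
    # P(m) = log10(1 + 1/m); the nine values sum to log10(10) = 1.
    return {m: math.log10(1 + 1 / m) for m in range(1, 10)}


if __name__ == "__main__":
    for digit, p in benford_probabilities().items():
        print(f"{digit}: {p:.1%}")
```

The nine probabilities sum to 1 because the product of the terms (1 + 1/m) for m = 1..9 telescopes to 10.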
The following plot, for example, describes the leading digit frequency of the sizes of all files within my Maple 12 installation (which includes several ebooks and toolboxes), together with the expected distribution from Benford’s Law.
This plot gives the frequency of leading digits of the population of 50 US states.
Benford’s Law has a number of applications.
For Benford’s Law to apply, a data set has to cover several orders of magnitude and have no built-in biases or limits. For example, companies often allow expenses of up to, say, $50 without requiring a receipt, so employees tend not to submit expenses above that amount.
Once you’ve found a suitable data set, using Maple to calculate the frequency of leading digits is simple. Assuming that the data arrives in an Array called data, this snippet of code returns a Vector nums with the frequency of the leading digits.
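For readers working outside Maple, the same tally can be sketched in Python. This is an illustration under stated assumptions rather than the worksheet's code: the names `leading_digit` and `leading_digit_frequencies` are hypothetical, zeros are skipped, and the result is returned as fractions rather than raw counts.

```python
from collections import Counter
from decimal import Decimal


def leading_digit(x):
    """Return the first non-zero digit of x, or None if x is zero."""
    # Going through str() and Decimal avoids floating-point trouble
    # with logarithms near exact powers of ten.
    d = Decimal(str(x)).copy_abs()
    if d == 0:
        return None
    # The coefficient digits of a non-zero Decimal contain the leading digit.
    return next(digit for digit in d.as_tuple().digits if digit != 0)


def leading_digit_frequencies(data):
    """Return 9 fractions: position 0 is digit 1, ..., position 8 is digit 9."""
    counts = Counter()
    for x in data:
        digit = leading_digit(x)
        if digit is not None:  # zeros have no leading digit
            counts[digit] += 1
    total = sum(counts.values())
    if total == 0:
        return [0.0] * 9
    return [counts[m] / total for m in range(1, 10)]


if __name__ == "__main__":
    print(leading_digit_frequencies([1, 10, 23, 0.5]))
```

Negative values are handled by taking the absolute value first, so -987 counts toward the digit 9.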
I’ve attached a worksheet that lets you investigate Benford’s Law in the stock market. It lets you download the historical open, high, low, and close prices, and the trading volume, for a specific NYSE stock ticker from Yahoo. The frequency of the leading digit for any of these items can then be plotted against the expected distribution from Benford’s Law. From my brief investigation, trading volumes tend to follow Benford’s Law (at some level) better than any of the other quantities. Anything dramatically different may demand greater attention...
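One way to put a number on "dramatically different" is a chi-square statistic comparing the observed digit counts with the counts Benford’s Law would predict. The sketch below is an illustrative choice of measure, not something the worksheet does, and `benford_chi_square` is a hypothetical name. It expects nine raw counts, for leading digits 1 through 9.

```python
import math


def benford_chi_square(counts):
    """Chi-square statistic of 9 observed digit counts against Benford's Law."""
    if len(counts) != 9:
        raise ValueError("expected 9 counts, one per leading digit 1..9")
    total = sum(counts)
    if total == 0:
        raise ValueError("no observations")
    stat = 0.0
    for m, observed in enumerate(counts, start=1):
        # Expected count for digit m under Benford's Law.
        expected = total * math.log10(1 + 1 / m)
        stat += (observed - expected) ** 2 / expected
    return stat


if __name__ == "__main__":
    # Assumed example: 900 values with perfectly uniform leading digits.
    print(benford_chi_square([100] * 9))
```

With nine digit classes there are 8 degrees of freedom, so a statistic above roughly 15.5 would be unusual at the 5% level if the data really followed Benford’s Law. Treat that as a rough screen rather than proof of anything.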