Benford's Law is not an exciting new John Nettles-based detective show, but an interesting observation about the distribution of the first digit in sets of numbers originating from various processes. It says, roughly, that in a big collection of data you should expect to see a number starting with 1 about 30% of the time, but starting with 9 only about 5% of the time. Precisely, the proportion for a given leading digit (1 to 9) can be worked out as:

<?php

function benford($num) {
    // Expected proportion of values whose first digit is $num (1 to 9)
    return log10(1 + 1 / $num);
}
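As a quick sanity check on the formula, the nine proportions should add up to exactly 1, because the terms log10((d+1)/d) telescope to log10(10). The sketch below prints them; `benford_expected()` is a hypothetical name (not from the post) that wraps the same formula so the snippet stands alone.

```php
<?php

// Hypothetical helper: same formula as benford(), renamed so this
// sketch can run on its own.
function benford_expected(int $digit): float
{
    return log10(1 + 1 / $digit);
}

$sum = 0.0;
foreach (range(1, 9) as $digit) {
    $p = benford_expected($digit);
    $sum += $p;
    echo $digit, ' - ', number_format($p, 3), PHP_EOL;
}

// The nine proportions telescope to log10(10) = 1.
echo 'Total - ', number_format($sum, 3), PHP_EOL;
```

The first line printed is `1 - 0.301` and the last digit line is `9 - 0.046`, which is where the "about 30%" and "about 5%" figures come from.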

Real data does tend to fit this pretty well. For example, just leaping onto data.gov.uk at random and grabbing a dataset (in this case a list of spending at the Science and Technology Facilities Council), I can compare the first digits to the ones Benford's Law predicts. I grabbed the Amount column out of the April 2010 data and put it into a text file, one amount per line:

<?php

// Assumes benford() from the snippet above is already defined.
$fh = fopen("data.txt", 'r');
$score = array();
$total = 0;
$nums = range(1, 9);

// Count up appearances of digits
while (($data = fgets($fh)) !== false) {
    // Every line counts towards the total, including ones skipped below
    $total++;
    $digit = substr(trim($data), 0, 1);
    if (!in_array($digit, $nums)) {
        continue;
    }
    if (!isset($score[$digit])) {
        $score[$digit] = 0;
    }
    $score[$digit]++;
}
fclose($fh);

// Most frequent digit first
arsort($score);

echo "# - Data - Benford", PHP_EOL;
foreach ($score as $digit => $count) {
    echo "$digit - ",
        number_format($count / $total, 3),
        " - ",
        number_format(benford($digit), 3),
        PHP_EOL;
}

We get a pretty clear match:

# - Data - Benford
1 - 0.273 - 0.301
2 - 0.181 - 0.176
3 - 0.114 - 0.125
4 - 0.107 - 0.097
5 - 0.088 - 0.079
6 - 0.070 - 0.067
7 - 0.055 - 0.058
8 - 0.050 - 0.051
9 - 0.047 - 0.046
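One way to put a number on "pretty clear" is the mean absolute deviation between the observed and expected proportions. This is a rough sketch rather than a formal test: the observed values are copied from the output above, and the choice of this particular measure is mine, not something the original script does.

```php
<?php

// Observed first-digit proportions, copied from the output above.
$observed = [
    1 => 0.273, 2 => 0.181, 3 => 0.114,
    4 => 0.107, 5 => 0.088, 6 => 0.070,
    7 => 0.055, 8 => 0.050, 9 => 0.047,
];

$sumAbsDiff = 0.0;
foreach ($observed as $digit => $proportion) {
    // Benford's expected proportion for this leading digit
    $expected = log10(1 + 1 / $digit);
    $sumAbsDiff += abs($proportion - $expected);
}

// Mean absolute deviation across the nine digits
$mad = $sumAbsDiff / count($observed);

echo 'Mean absolute deviation: ', number_format($mad, 4), PHP_EOL;
```

For this data the figure comes out at roughly 0.008, so each digit is off by less than a percentage point on average. Lower means closer; where to draw the line for "suspicious" is a judgement call that depends on the size and type of the dataset.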

This is fun, because if someone makes up a data set, it probably won't follow this distribution. This is used in accountancy to detect fraudulent entries. If there is a reporting limit at £3000 within a certain company where fraud is going on, there will probably be more dodgy transactions just under it, at £2999 for example, which inflates the share of amounts starting with 2 and throws off the stats. More advanced checking goes further into the digits rather than just considering the initial one. As always, there's plenty more on the law on Wikipedia.
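Going "further into the digits" can be sketched with the same formula: for a two-digit prefix from 10 to 99, the expected share is log10(1 + 1/prefix). Both function names below are hypothetical helpers for illustration, and the digit extraction assumes plain decimal amounts (no scientific notation).

```php
<?php

// Hypothetical helper: expected share of values whose leading digits
// are $prefix (10 to 99 for a first-two-digits check).
function benford_prefix(int $prefix): float
{
    return log10(1 + 1 / $prefix);
}

// Hypothetical helper: first two significant digits of an amount, or
// null if it has fewer than two. Assumes plain decimal notation.
function leading_two_digits(string $amount): ?int
{
    // Drop everything that is not a digit, then any leading zeros
    $digits = ltrim(preg_replace('/\D/', '', $amount), '0');

    return strlen($digits) >= 2 ? (int) substr($digits, 0, 2) : null;
}

// Amounts just under a £3000 limit land in the "29" bucket.
foreach ([29, 30] as $prefix) {
    echo $prefix, ' - ',
        number_format(benford_prefix($prefix) * 100, 2), '%',
        PHP_EOL;
}
```

Each two-digit bucket is only expected to hold a percent or two of the data, so a pile of £2999 entries shows up far more sharply in the "29" bucket than it does in the first-digit count for 2.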

doug bennion, April 2nd, 2011 at 00:35

Thanks, good example. You can find another few examples here www.benfords-law.com.

Stefan Karpinski, April 2nd, 2011 at 17:25

By far the best technical discussion of Benford's Law and other related phenomena is on Terence Tao's blog:

http://terrytao.wordpress.com/2009/07/03/benfords-law-zipfs-law-and-the-pareto-distribution/

On the other hand, you also need to be careful when diagnosing a power law:

http://cscs.umich.edu/~crshalizi/weblog/491.html

Benford's Law - Programacion de Juegos, May 1st, 2011 at 20:24

… (gadgetize.co.za for example). The result? The law holds perfectly!

Here is the script, taken from here without permission: $fh = fopen("data.txt", 'r');

$score = array();

$total = 0;

$nums = range(1, 9);

// Co…

Mike Blakley, May 23rd, 2011 at 03:40

There is free software available which simplifies and speeds up the calculations used in Benford's Law (Web CAAT). Software is open source (LGPL) and written entirely in PHP. Can be run on intranet or standalone. More info at http://ezrstats.com.