23/05/2021

17. Benford's law

Benford's law (binary numbering): for any number greater than zero, "1" appears as the first digit in 100% of cases.

The problem was this: consider the first digit of the decimal expansion of 2^n for n ≥ 0: 1, 2, 4, 8, 1, 3, 6, 1, 2, 5, 1, 2, 4, ...

Does the digit 7 appear in this sequence?

Does 7 or 8 appear more frequently? 

How much more frequently?

At the time I did not know the correct way to solve the problem, but I approached it like this: on a logarithmic scale products become sums, so multiplying by 2 is equivalent to adding log(2), and we obtain in sequence: 1, 2, 4, 8, 16, 32, 64, 128, ...

The first digits of the sequence are: 1, 2, 4, 8, 1, 3, 6, 1, 2, 5, 1, 2, 4, 8, 1, 3, 6, 1, 2, 5, 1, 2, 4, 8, 1, 3, 6, 1, 2, 5, 1, 2, 4, 8, 1, 3, 6, 1, 2, 5, 1, 2, 4, 8, 1, 3, 7, 1, 2, 5, 1, 2, 4, 9, 1, 3, 7, 1, 2, 5, 1, 2, 4, 9, 1, 3, 7, 1, 2, 5, 1, 2, 4, 9, 1, 3, 7, 1, 3, 6, 1, 2, 4, 9, 1, 3, 7, 1, 3, 6, 1, 2, 4, 9, 1, 3, ...
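The listing above can be reproduced with exact integer arithmetic. What follows is a minimal Python sketch, not the method I used at the time; the function names are hypothetical helpers chosen for illustration:

```python
# Hypothetical helper names, for illustration only.


def leading_digit(x: int) -> int:
    """First decimal digit of a positive integer."""
    return int(str(x)[0])


def leading_digits_of_powers_of_two(count: int) -> list[int]:
    """First decimal digit of 2**n for n = 0, 1, ..., count - 1."""
    digits = []
    power = 1  # 2**0
    for _ in range(count):
        digits.append(leading_digit(power))
        power *= 2  # Python ints are exact, so no rounding creeps in
    return digits


print(leading_digits_of_powers_of_two(13))  # [1, 2, 4, 8, 1, 3, 6, 1, 2, 5, 1, 2, 4]
```

Python integers are exact, so there is no rounding to worry about; the catch is that str() gets slow for huge numbers, and recent CPython versions refuse int-to-str conversions beyond 4300 digits by default (see sys.set_int_max_str_digits), which 2^n crosses around n = 14285.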

(Sloane's On-Line Encyclopedia of Integer Sequences, A008952)

On a base-10 logarithmic scale, 1 sits at position log10 1 = 0, 10 at position log10 10 = 1 and 100 at position log10 100 = 2; the intermediate positions are:

log10 2 = 0.3010, log10 3 = 0.4771, log10 4 = 0.6021, …, log10 9 = 0.9542

Each interval I(m), containing the numbers whose first digit is m, runs from log(m) to log(m + 1), so its width is:

log(m + 1) - log(m) = log((m + 1) / m) = log(1 + 1/m)
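The interval argument translates directly into code: the position of 2^n on one decade of the log scale is the fractional part of n·log10 2, and its first digit is the m whose interval I(m) contains that position. A minimal Python sketch, assuming double precision is adequate for moderate n; the helper names and the small EPS nudge are my own illustrative choices:

```python
import bisect
import math

# Hypothetical helper names, for illustration only.
# Left endpoints log10(1), ..., log10(9) of the intervals I(1), ..., I(9).
BOUNDARIES = [math.log10(m) for m in range(1, 10)]
EPS = 1e-12  # 2, 4 and 8 sit exactly on boundaries; keep rounding from pushing them below


def interval_width(m: int) -> float:
    """Width of I(m): log10(m + 1) - log10(m) = log10(1 + 1/m)."""
    return math.log10(1 + 1 / m)


def leading_digit_via_log(n: int) -> int:
    """First digit of 2**n read off the log scale (floating point)."""
    # Multiplying by 2 adds log10(2); the integer part of n*log10(2) only counts
    # decimal digits, so the fractional part alone fixes the first digit.
    position = (n * math.log10(2) + EPS) % 1.0
    # Number of left endpoints <= position = index m of the interval I(m).
    return bisect.bisect_right(BOUNDARIES, position)


for m in range(1, 10):
    print(m, f"{interval_width(m):.2%}")
```

Floating-point error grows with n, so for very large exponents the exact-integer approach remains the safer reference.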

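Returning to the problem: the digit 7 does appear, and in the long run it is the first digit of a fraction log(1 + 1/7) = 0.05799, about 5.8%, of the terms; but we have to wait until 2^46 = 70368744177664 to see it for the first time. The reason it appears at all is that log10 2 is irrational, so the fractional parts of n·log10 2 are equidistributed in [0, 1) (Weyl's equidistribution theorem), and the share of terms falling into I(m) tends to the width of I(m).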
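Both facts can be checked by brute force. The sketch below (Python, hypothetical helper names) keeps the current power of ten next to 2^n, so each first digit is one exact integer division rather than a string conversion, and compares the observed shares with log10(1 + 1/m):

```python
import math
from itertools import islice
from typing import Iterator, Optional

# Hypothetical helper names, for illustration only.


def leading_digits() -> Iterator[int]:
    """Yield the first digit of 2**0, 2**1, 2**2, ... using exact integers."""
    power, decade = 1, 1  # invariant: decade <= power < 10 * decade
    while True:
        yield power // decade
        power *= 2
        if power >= 10 * decade:  # doubling crosses at most one power of ten
            decade *= 10


def first_exponent_with_digit(d: int, search_limit: int = 10_000) -> Optional[int]:
    """Smallest n < search_limit such that 2**n starts with d, else None."""
    for n, digit in enumerate(islice(leading_digits(), search_limit)):
        if digit == d:
            return n
    return None


def observed_shares(limit: int) -> list[float]:
    """Fraction of 2**0, ..., 2**(limit - 1) whose first digit is 0, 1, ..., 9."""
    counts = [0] * 10
    for digit in islice(leading_digits(), limit):
        counts[digit] += 1
    return [c / limit for c in counts]


n7 = first_exponent_with_digit(7)
print(n7, 2 ** n7)  # 46 70368744177664
shares = observed_shares(10_000)
for m in range(1, 10):
    print(m, f"{shares[m]:.4f}", f"{math.log10(1 + 1 / m):.4f}")
```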

As for the second question, 7 appears more often than 8:

for n > 209, the frequency f(7) of the digit 7 is greater than that of 8, and

f(7) / f(8) tends to log10(1 + 1/7) / log10(1 + 1/8) ≈ 1.133706496
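A minimal sketch of the second answer, again with hypothetical Python helpers: it evaluates the limiting ratio and the raw counts of 7 and 8 among the first 100, 1000 and 10000 powers, so the convergence can be watched directly:

```python
import math
from itertools import islice
from typing import Iterator

# Hypothetical helper names, for illustration only.
# Limit of f(7) / f(8): ratio of the widths of I(7) and I(8).
LIMIT_RATIO = math.log10(1 + 1 / 7) / math.log10(1 + 1 / 8)


def leading_digits() -> Iterator[int]:
    """Yield the first digit of 2**0, 2**1, 2**2, ... using exact integers."""
    power, decade = 1, 1  # invariant: decade <= power < 10 * decade
    while True:
        yield power // decade
        power *= 2
        if power >= 10 * decade:
            decade *= 10


def counts_of_7_and_8(limit: int) -> tuple[int, int]:
    """How many of 2**0, ..., 2**(limit - 1) start with 7 and with 8."""
    c7 = c8 = 0
    for digit in islice(leading_digits(), limit):
        if digit == 7:
            c7 += 1
        elif digit == 8:
            c8 += 1
    return c7, c8


print(f"{LIMIT_RATIO:.9f}")  # 1.133706496
for limit in (100, 1_000, 10_000):
    c7, c8 = counts_of_7_and_8(limit)
    print(limit, c7, c8, round(c7 / c8, 4))
```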

Note: I(1) = I(2) + I(3) = I(4) + I(5) + I(6) + I(7), since each side has total width log 2: log(2/1) = log(4/2) = log(8/4).

Benford's law arises from the observation that in many collections of numbers (e.g. mathematical tables, real-life data, or combinations of them in various units of measurement) the leading significant digits are not evenly distributed, as one might expect, but are skewed toward the smaller digits.

It states that the significant digits in many data sets follow a logarithmic distribution. In its most common formulation it is also known as the "law of the first digit", and it is named after the American physicist Frank Benford (1883-1948), who formulated it in 1938, although the American astronomer Simon Newcomb (1835-1909) had already pointed it out in 1881.

Benford observed that the logarithm tables then used for calculations had much more worn first pages, and deduced that numbers beginning with 1 occurred more often than numbers beginning with the other digits.

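This distribution has a characteristic property known as "scale invariance" (the first-digit proportions do not change when all the data are multiplied by the same constant, as in a change of units), and it is often used to detect "forged" data.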
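Scale invariance can be illustrated on the powers of 2 themselves: multiplying every term by a constant c shifts every log-scale position by log10 c, and the first-digit shares stay close to log10(1 + 1/m). A Python sketch; the data set, the scale factors (2.54 as an inch-to-centimetre conversion, among others) and the helper name are illustrative assumptions, and working with logarithms avoids float overflow:

```python
import math

# Hypothetical helper name, for illustration only.
LOG10_2 = math.log10(2)
EPS = 1e-12  # keeps exact boundary cases such as 2, 4, 8 from rounding down


def first_digit_shares(scale: float, limit: int) -> list[float]:
    """Share of first digits 0..9 among scale * 2**n for n = 0, ..., limit - 1."""
    if scale <= 0:
        raise ValueError("scale must be positive")
    counts = [0] * 10
    shift = math.log10(scale)  # multiplying by `scale` shifts every log position
    for n in range(limit):
        position = (shift + n * LOG10_2 + EPS) % 1.0
        counts[min(int(10 ** position), 9)] += 1  # min() guards against 10.0
    return [c / limit for c in counts]


benford = [math.log10(1 + 1 / m) for m in range(1, 10)]
for scale in (1.0, 2.54, 1000 / 3):
    shares = first_digit_shares(scale, 10_000)
    worst = max(abs(shares[m] - benford[m - 1]) for m in range(1, 10))
    print(scale, f"largest deviation from Benford: {worst:.4f}")
```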

If you were to falsify numbers, make sure that 1 appears as the first digit in about 30% of cases, 2 in about 17.6%, and so on.
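One common way to turn this into a test is a chi-square comparison of the observed first-digit counts with the Benford proportions. A minimal Python sketch, assuming finite positive data that fit in a float; the helper names are hypothetical, and 15.51 (the 5% critical value of chi-square with 8 degrees of freedom) is a conventional threshold, not proof of forgery:

```python
import math
from collections import Counter
from typing import Iterable

# Hypothetical helper names, for illustration only.


def first_digit(x: float) -> int:
    """First significant digit of a finite positive number."""
    if not x > 0:
        raise ValueError("first_digit expects a positive number")
    # With 17 significant digits in scientific notation the first character is
    # the first significant digit, and no double rounds up to a power of ten.
    return int(f"{x:.16e}"[0])


def chi_square_vs_benford(values: Iterable[float]) -> float:
    """Pearson chi-square statistic of first-digit counts against Benford's law."""
    counts = Counter(first_digit(v) for v in values if v > 0)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no positive values to test")
    statistic = 0.0
    for d in range(1, 10):
        expected = total * math.log10(1 + 1 / d)
        statistic += (counts[d] - expected) ** 2 / expected
    return statistic


powers = [2 ** n for n in range(1000)]  # should look Benford-like
flat = list(range(1, 10)) * 100         # every first digit equally frequent
print(f"{chi_square_vs_benford(powers):.2f}", f"{chi_square_vs_benford(flat):.2f}")
```

For the powers of 2 the statistic stays far below 15.51, while the "flat" list, where every first digit is equally frequent, lands far above it.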

For a given number of digits, the probability that a number starts with a given digit string n of that length can be computed in the same way: it is log10(1 + 1/n). Below is what Wikipedia reports under "Benford's law".



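The entries of such a table follow from the same interval argument: a number starts with the digit string n when its log-scale position falls in an interval of width log10(n + 1) - log10 n = log10(1 + 1/n). A Python sketch (hypothetical function name):

```python
import math

# Hypothetical helper name, for illustration only.


def prob_leading_string(s: int) -> float:
    """Benford probability that a number's leading digits are the string s.

    s = 314 means "starts with 3, 1, 4"; s = 3 means "starts with 3".
    """
    if s < 1:
        raise ValueError("s must be a positive integer")
    return math.log10(1 + 1 / s)


print(f"{prob_leading_string(3):.4f}")    # 0.1249
print(f"{prob_leading_string(314):.5f}")  # 0.00138
```

Summing over all strings of one length telescopes to 1, and summing over the strings that start with a given digit gives back the first-digit probability.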
The distribution of the n-th digit, as n increases, quickly approaches a uniform distribution with 10% for each of the ten digits, as shown below:







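Summing those digit-string probabilities over every possible prefix gives the distribution of a single later digit: the k-th digit is d exactly when the leading k-digit string is 10j + d for some (k - 1)-digit prefix j. A Python sketch (hypothetical function name) that shows the flattening towards 10%:

```python
import math

# Hypothetical helper name, for illustration only.


def kth_digit_distribution(k: int) -> list[float]:
    """Benford probabilities that the k-th significant digit is 0, 1, ..., 9."""
    if k < 1:
        raise ValueError("k must be at least 1")
    if k == 1:
        # The first significant digit is never 0.
        return [0.0] + [math.log10(1 + 1 / d) for d in range(1, 10)]
    prefixes = range(10 ** (k - 2), 10 ** (k - 1))  # all (k-1)-digit prefixes j
    return [sum(math.log10(1 + 1 / (10 * j + d)) for j in prefixes) for d in range(10)]


for k in range(1, 5):
    print(k, [f"{p:.3f}" for p in kth_digit_distribution(k)])
```

Already for k = 4 every digit is within a fraction of a percent of 10%.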
Benford Online Bibliography

Benford's Law, Wolfram MathWorld: https://mathworld.wolfram.com/BenfordsLaw.html

Benford's Law, MathPages (mathpages.com)

Index to OEIS: Section Be, OeisWiki

A008952, OEIS: https://oeis.org/A008952