Benford's law (binary numbering): for any number greater than zero, "1" appears as the first digit in 100% of cases.
The problem was this: consider the first digit in the decimal expansion of 2^n for n ≥ 0: 1, 2, 4, 8, 1, 3, 6, 1, 2, 5, 1, 2, 4, ...
Does the digit 7 appear in this sequence?
Does 7 or 8 appear more frequently?
How much more frequently?
At the time I did not know the correct way to solve the problem, but I approached it like this: on a logarithmic scale products become sums, so multiplying by 2 is equivalent to adding log10 (2). The powers of 2 in sequence are: 1, 2, 4, 8, 16, 32, 64, 128, ...
The first digits of the sequence are: 1, 2, 4, 8, 1, 3, 6, 1, 2, 5, 1, 2, 4, 8, 1, 3, 6, 1, 2, 5, 1, 2, 4, 8, 1, 3, 6, 1, 2, 5, 1, 2, 4, 8, 1, 3, 6, 1, 2, 5, 1, 2, 4, 8, 1, 3, 7, 1, 2, 5, 1, 2, 4, 9, 1, 3, 7, 1, 2, 5, 1, 2, 4, 9, 1, 3, 7, 1, 2, 5, 1, 2, 4, 9, 1, 3, 7, 1, 3, 6, 1, 2, 4, 9, 1, 3, 7, 1, 3, 6, 1, 2, 4, 9, 1, 3, ...
Sloane's On-Line Encyclopedia of Integer Sequences A008952
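The sequence above can be reproduced with a few lines of Python. This is only a minimal sketch: the helper name leading_digit is my own, and it relies on Python's exact big-integer arithmetic rather than on logarithms.

```python
def leading_digit(n: int) -> int:
    """Return the first decimal digit of 2**n (exact integer arithmetic)."""
    return int(str(2 ** n)[0])

# First digits of 2^0 .. 2^19
first_digits = [leading_digit(n) for n in range(20)]
print(first_digits)
# → [1, 2, 4, 8, 1, 3, 6, 1, 2, 5, 1, 2, 4, 8, 1, 3, 6, 1, 2, 5]
```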
On a base-10 logarithmic scale, the number 1 sits at position 0 (log10 1 = 0), 10 at position 1 (log10 10 = 1) and 100 at position 2 (log10 100 = 2). The intermediate positions are at:
log10 2 = 0.3010, log10 3 = 0.4771, log10 4 = 0.6021, …, log10 9 = 0.9542
Each interval I (m) of numbers starting with the digit m lies between log10 (m) and log10 (m + 1), so the width of each interval is:
log10 (m + 1) - log10 (m) = log10 ((m + 1) / m) = log10 (1 + 1/m)
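As a quick check of the interval formula, the widths for the nine leading digits can be tabulated. This is a sketch; the dictionary name widths is just an illustrative choice.

```python
import math

# Width of the interval I(m) on the log10 scale, for each leading digit m.
widths = {m: math.log10(1 + 1 / m) for m in range(1, 10)}

for m, w in widths.items():
    print(f"I({m}) = {w:.4f}  ({100 * w:.1f}%)")

# The nine intervals tile the range from 0 to 1, so the widths should add
# up to 1 (up to floating-point rounding).
total = sum(widths.values())
print(total)
```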
Returning to the problem, the digit 7 appears with frequency log10 (1 + 1/7) = 0.05799, about 5.8%, and to see it appear for the first time we have to wait until 2^46 = 70368744177664.
The answer to the second question is that 7 appears more often, and for n > 209 the frequency f (7) of the digit 7 is greater than that of 8:
f (7) / f (8) tends to log10 (1 + 1/7) / log10 (1 + 1/8) = 1.133706496
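These figures can be compared against a finite sample of powers of 2. The sketch below assumes a sample size N of 10,000 (my own arbitrary choice) and counts leading digits exactly. It illustrates the limiting ratio but does not by itself prove anything about the threshold on n.

```python
import math
from collections import Counter

def leading_digit(n: int) -> int:
    """First decimal digit of 2**n, computed with exact integers."""
    return int(str(2 ** n)[0])

N = 10_000  # hypothetical sample size: exponents 0 .. N-1

counts = Counter(leading_digit(n) for n in range(N))

# Smallest exponent whose power of 2 starts with the digit 7.
first_seven = next(n for n in range(N) if leading_digit(n) == 7)

observed_ratio = counts[7] / counts[8]
limit_ratio = math.log10(1 + 1 / 7) / math.log10(1 + 1 / 8)

print("first 7 at n =", first_seven)
print("observed f(7)/f(8):", observed_ratio)
print("limit    f(7)/f(8):", limit_ratio)
```

With a larger N the observed ratio should settle closer to the limit.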
Note: I(1) = I(2) + I(3) = I(4) + I(5) + I(6) + I(7)
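The identity in the note follows because the logarithms telescope. A short worked version, using the same I (m) = log10 (1 + 1/m) as above:

```latex
\begin{aligned}
I(2) + I(3) &= \log_{10}\tfrac{3}{2} + \log_{10}\tfrac{4}{3}
             = \log_{10}\tfrac{4}{2} = \log_{10} 2 = I(1) \\
I(4) + I(5) + I(6) + I(7) &= \log_{10}\tfrac{5}{4} + \log_{10}\tfrac{6}{5}
             + \log_{10}\tfrac{7}{6} + \log_{10}\tfrac{8}{7}
             = \log_{10}\tfrac{8}{4} = \log_{10} 2 = I(1)
\end{aligned}
```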
Benford's law arises from the observation that in many collections of numbers, e.g. mathematical tables, real-life data, or combinations thereof in various units of measurement, the initial significant digits are not evenly distributed, as you might expect: the smaller digits occur more often than the larger ones.
The law states that the significant figures in many data sets follow a logarithmic distribution. In its most common formulation, it is also known as the "law of the first digit" and is named after the American physicist Frank Benford (1883-1948), who formulated it in 1938, although it had already been noted by the American astronomer Simon Newcomb (1835-1909) in 1881.
Benford observed that the first pages of the logarithmic tables used at the time for calculations were far more worn than the rest, and deduced that numbers beginning with 1 were looked up more often than those beginning with the other digits.
This distribution has a characteristic property known as "scale invariance" and is often used to detect "forged" data.
If you were to falsify numbers, make sure that the digit 1 comes first in about 30% of the cases, 2 in about 17.6%, and so on.
For a given number of digits, it is possible to calculate the probability of encountering a number starting with the digit string n of that length. The values are reported in Wikipedia under "Benford's law".
The distribution of the n-th digit, as n increases, quickly approaches a uniform distribution with 10% for each of the ten digits, as shown at:
https://mathworld.wolfram.com/BenfordsLaw.html
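Both calculations can be sketched in Python. I am assuming the standard generalisation of the first-digit formula: the probability that a number starts with the digit string n is log10 (1 + 1/n). The function names are my own, and I use k for the digit position to avoid clashing with the digit string n.

```python
import math

def prob_leading_string(n: int) -> float:
    """Benford probability that the leading digits spell the integer n (n >= 1)."""
    return math.log10(1 + 1 / n)

def kth_digit_distribution(k: int) -> list[float]:
    """Probabilities of the digits 0..9 in position k (k = 1 is the first digit)."""
    if k == 1:
        # A leading digit can never be 0.
        return [0.0] + [prob_leading_string(d) for d in range(1, 10)]
    # Sum, over every (k-1)-digit prefix p, the probability of the k-digit
    # leading string made of p followed by the digit d.
    prefixes = range(10 ** (k - 2), 10 ** (k - 1))
    return [sum(prob_leading_string(10 * p + d) for p in prefixes)
            for d in range(10)]

print(f"P(starts with 31) = {prob_leading_string(31):.4f}")
for k in (1, 2, 3, 4):
    print(k, [f"{100 * p:.2f}%" for p in kth_digit_distribution(k)])
```

By the fourth position every digit is already within a small fraction of a percentage point of 10%.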