wiki: blog/other-2038-issues



We are here to talk about y2k38, the second installment of the end of time
for computers.  The first was the infamous y2k, which was responsibly fixed
one chunk of code at a time by programmers all around the world, well before
the expected deadline.

Our new "2038 problem" lies in the fact that a lot of computers keep the time
internally as the number of seconds that have passed since 1970-01-01 00:00:00
UTC, storing that count in a signed 32-bit integer, which gives support
for the immensely wide range of:
  * 1901-12-13 20:45:52 UTC to
  * 2038-01-19 03:14:07 UTC.
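
As a quick check, the upper limit can be reproduced with the standard os.date
function.  This is only a sketch: the helper name utc is made up for this
example, and the lower bound is guarded because some platforms' C libraries
reject negative timestamps.

```lua
-- Largest and smallest values of a signed 32-bit integer.
local INT32_MAX = 2^31 - 1
local INT32_MIN = -2^31

-- Format a Unix timestamp as a UTC date (the leading '!' selects UTC).
local function utc(timestamp)
  return os.date('!%Y-%m-%d %H:%M:%S', timestamp)
end

print(utc(INT32_MAX))         -- 2038-01-19 03:14:07
-- May fail or return nil where negative timestamps are unsupported:
print(pcall(utc, INT32_MIN))
```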

More than enough if you were designing a timekeeping method back in
1969.  Yet with the year 2038 getting closer, it's clear that we
must somehow change how we store these dates.

There were several proposals:

  1. Let's use a 64-bit integer instead!
     * This is one of the reasonable solutions, with the new epochalypse
       now becoming 292277026596-12-04 15:30:08 UTC.  However, it feels like
       we are wasting space by allowing such a distant date.

  2. Let's use a 64-bit integer, but store the number of milliseconds
     since the epoch instead!
     * And now we have a much closer 292278994-08-17 07:12:55.807 UTC, which
       is still hundreds of millions of years away.
     * Even when storing microseconds we still get a reasonable limit:
       294247-01-10 04:00:54 UTC.
     * Or when storing the number of nanoseconds: 2262-04-11 23:47:16 UTC,
       which once again is more than enough if we consider the speed at which
       we humans are causing global warming.
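
The years in these limits can be roughly reproduced with a little arithmetic.
This is only a sketch: it assumes an average Gregorian year of 31556952
seconds and yields just the year of the overflow, not the exact date.  The
names SECONDS_PER_YEAR and overflow_year are made up for this example.

```lua
-- Average length of a Gregorian year: 365.2425 days of 86400 seconds.
local SECONDS_PER_YEAR = 31556952

-- Approximate year in which a signed 64-bit counter overflows, given how
-- many ticks it counts per second (1 = seconds, 1e3 = milliseconds, ...).
local function overflow_year(ticks_per_second)
  local seconds = 2^63 / ticks_per_second
  return math.floor(1970 + seconds / SECONDS_PER_YEAR)
end

for _, unit in ipairs{ {'s', 1}, {'ms', 1e3}, {'us', 1e6}, {'ns', 1e9} } do
  print(unit[1], string.format('%.0f', overflow_year(unit[2])))
end
```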

Then it becomes obvious to consider using a standard 64-bit IEEE 754 floating
point number!  That way we would not need to care about whether to save the
number of s, ms, µs or ns: we could just keep the number of seconds in the
integral part and use the remaining bits for the fraction of a second.
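
A minimal sketch of that idea, using a made-up timestamp: the standard
math.modf function splits a float into its integral part (whole seconds) and
its fractional part (the fraction of a second).

```lua
-- A made-up timestamp: whole seconds since the epoch plus a quarter second.
local timestamp = 1700000000.25

-- math.modf returns the integral part and the fractional part of a number.
local seconds, fraction = math.modf(timestamp)

print(string.format('%.0f', seconds))            -- 1700000000
print(fraction)                                  -- 0.25
print(math.floor(fraction * 1e3 + 0.5) .. ' ms') -- 250 ms
```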



How do IEEE 754 floating point numbers (from now on just floats) work?

They encode rational numbers using scientific notation in binary (base 2).
To write a number in that notation we can use the following algorithm:

function convert_to_floating_point(n, fractional_bits)
  local buffer = {}
  local exponent = 0
  -- Handle zero, nan and infinity, negative numbers:
  if n == 0 then return 'zero' end
  if type(n) ~= 'number' or n ~= n then return 'nan' end
  if n < 0 then table.insert(buffer, '-'); n=-n end
  if n == math.huge then return table.concat(buffer) .. 'inf' end

  -- Now let's move the binary point to the left while n is 2 (the base) or more:
  while n >= 2 do
    n = n/2
    exponent = exponent + 1
  end

  -- And now let's do the opposite for numbers smaller than 1:
  while n < 1 do
    n = n*2
    exponent = exponent - 1
  end

  -- Now our number is in the range 1 <= n < 2, and we have the exponent ready,
  -- so we know that it starts with '1.'
  table.insert(buffer, '1.')
  n = (n - 1) * 2

  -- so the next phase is taking fractional_bits out of it:
  for i = 1, fractional_bits do
    if n >= 1 then
      table.insert(buffer, '1')
      n = (n - 1) * 2
    else
      table.insert(buffer, '0')
      n = n * 2
    end
  end
  return table.concat(buffer) .. ' * 2^' .. exponent
end


And so, let's test it: convert_to_floating_point(9.25, 9) gives
1.001010000 * 2^3

Which makes sense, as 9.25 = 9 + 1/4; in binary: 1001 + 1/100 = 1001.01,
that is, 1.00101 * 2^3

Note that since binary numbers in scientific notation always begin with "1."
(except for zero), IEEE 754 floats do not store this "1." as it would be
redundant.  What they do contain is the following:

  * The sign bit (0 for positive, 1 for negative)
  * Exponent bits:
      We don't need to know how these bits are used, but in case you are
      curious:
      We first add a constant called the "exponent bias" to the exponent.
      This bias depends on the size of the field:
      on 32-bit floats the exponent has only 8 bits and the bias is 127, while
      on 64-bit floats the exponent has 11 bits with 1023 as the bias.
      Then we clamp its value to the minimum and maximum representable
      exponents: if it ends up at zero or below, we store zero and shift the
      mantissa (leading 1 included) to the right by however far we went below
      the smallest normal exponent, which represents subnormal numbers; and if
      it goes beyond the maximum, we store all ones in the exponent and all
      zeros in the mantissa to represent ±infinity.
  * Mantissa bits:
      As we have mentioned, we only keep the fractional part, padding with
      zeros at the end.  That's it.
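
To see these three fields in a real float, here is a small sketch that
reinterprets the 8 bytes of a 64-bit float as an integer and slices it up.
It assumes Lua 5.3 or later (for string.pack and the bitwise operators) and a
platform whose doubles are IEEE 754; the name float_fields is made up for
this example.

```lua
-- Split a 64-bit float into its three stored fields.
local function float_fields(x)
  -- Reinterpret the 8 bytes of the double as one 64-bit integer.
  local bits = string.unpack('<I8', string.pack('<d', x))
  local sign     = bits >> 63               -- 1 bit
  local exponent = (bits >> 52) & 0x7FF     -- 11 bits, bias included
  local mantissa = bits & 0xFFFFFFFFFFFFF   -- 52 bits, leading "1." omitted
  return sign, exponent, mantissa
end

local sign, exponent, mantissa = float_fields(9.25)
-- Sign 0, unbiased exponent 3, fraction bits 00101 followed by zeros:
print(sign, exponent - 1023, string.format('%013x', mantissa))
-- Negative infinity: sign 1, exponent all ones (2047), mantissa 0:
print(float_fields(-math.huge))
```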

Since 64-bit floats have 52 bits to store the fractional part, any bits to
the right of those 52 bits will be lost.
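
A small sketch of what losing those bits looks like in practice, using a
made-up timestamp from late 2023:

```lua
-- A made-up timestamp, stored as a 64-bit float.
local t = 1700000000.0

-- At this magnitude neighbouring floats are about 2.4e-07 seconds apart, so
-- adding a tenth of a microsecond rounds back to the same value...
print(t + 1e-7 == t)     -- true
-- ...while a whole microsecond still makes a visible difference.
print(t + 1e-6 == t)     -- false
-- The same happens to whole numbers once they need more than 53 bits:
print(2^53 + 1 == 2^53)  -- true
```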

Moreover, the precision of a floating point number depends on the size of its
integer part, which here is the number of seconds since the epoch.  These are
the date ranges for which we can expect a given precision (the gap, in
seconds, between consecutive representable values):

1970-01-01 00:00:02 .. 1970-01-01 00:00:03 -> 4.44089e-16
1970-01-01 00:00:04 .. 1970-01-01 00:00:07 -> 8.88178e-16
1970-01-01 00:00:08 .. 1970-01-01 00:00:15 -> 1.77636e-15
1970-01-01 00:00:16 .. 1970-01-01 00:00:31 -> 3.55271e-15
1970-01-01 00:00:32 .. 1970-01-01 00:01:03 -> 7.10543e-15
1970-01-01 00:01:04 .. 1970-01-01 00:02:07 -> 1.42109e-14
1970-01-01 00:02:08 .. 1970-01-01 00:04:15 -> 2.84217e-14
1970-01-01 00:04:16 .. 1970-01-01 00:08:31 -> 5.68434e-14
1970-01-01 00:08:32 .. 1970-01-01 00:17:03 -> 1.13687e-13
1970-01-01 00:17:04 .. 1970-01-01 00:34:07 -> 2.27374e-13
1970-01-01 00:34:08 .. 1970-01-01 01:08:15 -> 4.54747e-13
1970-01-01 01:08:16 .. 1970-01-01 02:16:31 -> 9.09495e-13
1970-01-01 02:16:32 .. 1970-01-01 04:33:03 -> 1.81899e-12
1970-01-01 04:33:04 .. 1970-01-01 09:06:07 -> 3.63798e-12
1970-01-01 09:06:08 .. 1970-01-01 18:12:15 -> 7.27596e-12
1970-01-01 18:12:16 .. 1970-01-02 12:24:31 -> 1.45519e-11
1970-01-02 12:24:32 .. 1970-01-04 00:49:03 -> 2.91038e-11
1970-01-04 00:49:04 .. 1970-01-07 01:38:07 -> 5.82077e-11
1970-01-07 01:38:08 .. 1970-01-13 03:16:15 -> 1.16415e-10
1970-01-13 03:16:16 .. 1970-01-25 06:32:31 -> 2.32831e-10
1970-01-25 06:32:32 .. 1970-02-18 13:05:03 -> 4.65661e-10
1970-02-18 13:05:04 .. 1970-04-08 02:10:07 -> 9.31323e-10
1970-04-08 02:10:08 .. 1970-07-14 04:20:15 -> 1.86265e-09
1970-07-14 04:20:16 .. 1971-01-24 08:40:31 -> 3.72529e-09
1971-01-24 08:40:32 .. 1972-02-16 17:21:03 -> 7.45058e-09
1972-02-16 17:21:04 .. 1974-04-03 10:42:07 -> 1.49012e-08
1974-04-03 10:42:08 .. 1978-07-04 21:24:15 -> 2.98023e-08
1978-07-04 21:24:16 .. 1987-01-05 18:48:31 -> 5.96046e-08
1987-01-05 18:48:32 .. 2004-01-10 13:37:03 -> 1.19209e-07
2004-01-10 13:37:04 .. 2038-01-19 03:14:07 -> 2.38419e-07
2038-01-19 03:14:08 .. 2106-02-07 06:28:15 -> 4.76837e-07
2106-02-07 06:28:16 .. 2242-03-16 12:56:31 -> 9.53674e-07
2242-03-16 12:56:32 .. 2514-05-30 01:53:03 -> 1.90735e-06
2514-05-30 01:53:04 .. 3058-10-26 03:46:07 -> 3.8147e-06
3058-10-26 03:46:08 .. 4147-08-20 07:32:15 -> 7.62939e-06
4147-08-20 07:32:16 .. 6325-04-08 15:04:31 -> 1.52588e-05
6325-04-08 15:04:32 .. 10680-07-14 06:09:03 -> 3.05176e-05
10680-07-14 06:09:04 .. 19391-01-25 12:18:07 -> 6.10352e-05
19391-01-25 12:18:08 .. 36812-02-20 00:36:15 -> 0.00012207
36812-02-20 00:36:16 .. 71654-04-10 01:12:31 -> 0.000244141
71654-04-10 01:12:32 .. 141338-07-19 02:25:03 -> 0.000488281
141338-07-19 02:25:04 .. 280707-02-04 04:50:07 -> 0.000976563
280707-02-04 04:50:08 .. 559444-03-08 09:40:15 -> 0.00195313
559444-03-08 09:40:16 .. 1116918-05-14 19:20:31 -> 0.00390625
1116918-05-14 19:20:32 .. 2231866-09-25 14:41:03 -> 0.0078125
2231866-09-25 14:41:04 .. 4461763-06-20 05:22:07 -> 0.015625
4461763-06-20 05:22:08 .. 8921556-12-07 10:44:15 -> 0.03125
8921556-12-07 10:44:16 .. 17841143-11-13 21:28:31 -> 0.0625
17841143-11-13 21:28:32 .. 35680317-09-25 18:57:03 -> 0.125
35680317-09-25 18:57:04 .. 71358665-06-19 13:54:07 -> 0.25
71358665-06-19 13:54:08 .. 142715360-12-06 03:48:15 -> 0.5
142715360-12-06 03:48:16 .. 285428751-11-12 07:36:31 -> 1

In other words, you will be (or were) able to have:
  nanosecond precision until 1970-04-08
  microsecond precision until 2242-03-16
  millisecond precision until 280707-02-04
  and second precision until 285428751-11-12
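
These limits can be recomputed with a few lines of Lua.  This is a sketch
rather than the script that produced the table: the names precision_at and
last_second_with are made up for this example, and it only handles timestamps
of one second or more.

```lua
-- Gap between a float timestamp (>= 1 second) and the next representable
-- value: 2^(e - 52), where 2^e is the largest power of two not above it.
local function precision_at(seconds)
  local e, power = 0, 1
  while power * 2 <= seconds do
    power = power * 2
    e = e + 1
  end
  return 2^(e - 52)
end

-- Last whole second at which the gap is still no larger than `target`.
local function last_second_with(target)
  local limit = 1
  while precision_at(limit * 2) <= target do
    limit = limit * 2
  end
  return limit * 2 - 1
end

print(string.format('%g', precision_at(2^31 - 1)))    -- 2.38419e-07
print(string.format('%.0f', last_second_with(1e-6)))  -- 8589934591
```

On platforms whose os.date accepts timestamps that large, passing the second
result to os.date('!%Y-%m-%d', ...) should give 2242-03-16.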



If you want to store the time in a 64-bit floating point number, remember:

  * You can relatively safely use it to store times with microsecond
    precision until 2242-03-16 (values survive a round trip when rounded to
    the nearest microsecond), but it will start skipping microseconds
    from that date on (probably not an issue right now).

  * It's totally safe to store times with millisecond precision.
    I don't think that any of the hardware that we currently have or run
    will be able to reach the year 280707.
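
As an illustration of the millisecond case, here is a minimal round-trip
sketch.  The function names and the timestamp are made up for this example;
the rounding step is there to absorb the tiny error introduced because most
decimal fractions have no exact binary representation.

```lua
-- Convert a float timestamp (in seconds) to a whole number of milliseconds,
-- rounding to the nearest one.
local function to_milliseconds(seconds)
  return math.floor(seconds * 1000 + 0.5)
end

-- And back to a float number of seconds.
local function from_milliseconds(ms)
  return ms / 1000
end

local t = 1700000000.123                  -- made-up timestamp
local ms = to_milliseconds(t)
print(string.format('%.0f', ms))          -- 1700000000123
print(from_milliseconds(ms) == t)         -- true
```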

Postscript:

  If you want to know who uses this format, take a look at the following links:

  https://w3.impa.br/~diego/software/luasocket/socket.html#gettime
  https://docs.python.org/3/library/time.html#time.time
  https://docs.ruby-lang.org/en/master/Time.html#method-i-to_f
  https://www.postgresql.org/docs/current/functions-datetime.html#FUNCTIONS-DATETIME-EXTRACT
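    * try: select pg_typeof(extract(epoch from now())); -- if you don't
      believe it (PostgreSQL 13 and earlier report double precision; since
      version 14 extract() returns numeric, while date_part() still returns
      double precision)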

Suggested material for reading on the topic:
  https://www.leebutterman.com/2021/02/01/store-your-unix-epoch-times-as-float64.html
  https://en.wikipedia.org/wiki/IEEE_754
  https://en.wikipedia.org/wiki/Time_formatting_and_storage_bugs
  https://en.wikipedia.org/wiki/Year_2038_problem
  https://ocw.mit.edu/courses/6-00-introduction-to-computer-science-and-programming-fall-2008/resources/lecture-5/
    * this is an introduction to floating point numbers in python.  I add this
      link because the other videos in the course are also interesting for
      anyone starting with programming, even though it's a bit dated by now.



                                                                              π