Option to ensure monotonic timestamps

From: Brent Kerby <blkerby(at)gmail(dot)com>
To: pgsql-hackers(at)postgresql(dot)org
Subject: Option to ensure monotonic timestamps
Date: 2018-02-19 20:42:31
Message-ID: CAH8WVsixOeUOb=Xdfy6+kq0w0zkHLSnNZ1100tj6xRhAPaz+Bw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi, I'm new to Postgres hacking, and I'm interested in the possibility of a
new feature to make it possible to ensure that Postgres-generated
timestamps never decrease even if the system clock may step backwards. My
use case is that I'm implementing a form of temporal tables based on
transaction commit timestamps (as returned by pg_xact_commit_timestamp),
and to ensure the integrity of the system I need to know that the ordering
of the commit timestamps will always be consistent with the order in which
the transactions actually committed. I don't need the timestamps to be
unique; i.e., if transactions occur close together in time, then it's fine
for them to have the same timestamp -- just if the timestamps are different
then they must be in the right order. I would guess there may be other
scenarios where users may want to ensure the timestamps are monotonic, and
in general it would probably be desired for the monotonicity to apply
across all timestamps generated by a given Postgres server, not only the
commit timestamps.

I'm aware of the obvious alternative, which is simply to try to configure
the system clock so that it can't go backwards (e.g., using the option
"stepback 0" for ntpd). However, in virtual environments this could
potentially be difficult to achieve in a reliable way. And in any case,
since in my application the integrity of the data history hinges on the
timestamps being monotonic, I think it makes sense that this be enforceable
on the database level.

What I propose is that we could add a boolean configuration option, say
'ensure_monotonic_timestamps', that enables the following behavior: when
GetCurrentTimestamp is called (
https://doxygen.postgresql.org/backend_2utils_2adt_2timestamp_8c.html#a9822cdf3fd41b15851c0c18ddc80143c),
before it returns it checks if `result` is less than what was returned last
time (if any) that GetCurrentTimestamp was called, and if so it returns the
result from the previous call (after logging a warning), otherwise it
proceeds as normal. In its simplest form, this could be accomplished by
adding a global variable lastGetCurrentTimestamp that stores the result of
the previous call. Since GetCurrentTimestamp appears to be the source of
all of the significant system-generated timestamps, including commit
timestamps, this should produce the behavior I'm looking for.

One tricky thing is to figure out how to make this reliable even in the
situation where the database engine has to be restarted. When we're
starting up and have to initialize lastGetCurrentTimestamp, we need to make
sure to make sure we initialize it to be at least as large as the largest
previous result of GetCurrentTimestamp that made its way into the WAL
before shutdown, i.e., the largest previous result of GetCurrentTimestamp
that has the potential to be written out to tables upon recovery. What's
fuzzy to me is whether this would require writing new data to the WAL
specifically for this, or whether there are already timestamps (e.g., as
part of WAL metadata) that could serve this purpose.

Any thoughts?

- Brent Kerby

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Alvaro Herrera 2018-02-19 20:45:31 Re: unique indexes on partitioned tables
Previous Message Alvaro Herrera 2018-02-19 20:40:27 pgsql: Allow UNIQUE indexes on partitioned tables