Re: Global snapshots

From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Stas Kelvich <s(dot)kelvich(at)postgrespro(dot)ru>
Cc: PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Global snapshots
Date: 2018-05-09 14:51:25
Message-ID: CA+TgmoZHQtCAd+QNg0DXKS0RCqALz9Cc7mXkp7+r45j-U2XM7Q@mail.gmail.com
Lists: pgsql-hackers

On Tue, May 8, 2018 at 4:51 PM, Stas Kelvich <s(dot)kelvich(at)postgrespro(dot)ru> wrote:
>> On 7 May 2018, at 20:04, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
>> But what happens if a transaction starts on node A at time T0 but
>> first touches node B at a much later time T1, such that T1 - T0 >
>> global_snapshot_defer_time?
>
> Such a transaction will get a "global snapshot too old" error.

Ouch. That's not so bad at READ COMMITTED, but at higher isolation
levels failure becomes extremely likely. Any multi-statement
transaction that lasts longer than global_snapshot_defer_time is
pretty much doomed.
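
To make the failure condition concrete, here is a toy sketch of the check
being discussed. It is not taken from the patch: the Node class, the
import_snapshot method and the use of plain seconds for timestamps are all
assumptions made for illustration. Only the rule itself (fail when
T1 - T0 > global_snapshot_defer_time) comes from the exchange above.

```python
from dataclasses import dataclass


class GlobalSnapshotTooOld(Exception):
    """Stands in for the "global snapshot too old" error."""


@dataclass
class Node:
    name: str
    # Hypothetical stand-in for global_snapshot_defer_time, in seconds.
    defer_time: float

    def import_snapshot(self, snapshot_time: float, now: float) -> float:
        """Accept a snapshot taken at snapshot_time when first touched at now.

        The node only keeps old row versions around for defer_time, so a
        snapshot older than that can no longer be served.
        """
        if now - snapshot_time > self.defer_time:
            raise GlobalSnapshotTooOld(
                f"snapshot from t={snapshot_time} is too old on node {self.name}"
            )
        return snapshot_time


def first_touch_succeeds(node: Node, t0: float, t1: float) -> bool:
    """Transaction started at t0 elsewhere, first touches `node` at t1."""
    try:
        node.import_snapshot(t0, t1)
        return True
    except GlobalSnapshotTooOld:
        return False


if __name__ == "__main__":
    node_b = Node(name="B", defer_time=30.0)
    print(first_touch_succeeds(node_b, t0=0.0, t1=10.0))  # within the window
    print(first_touch_succeeds(node_b, t0=0.0, t1=31.0))  # past the window
```

In this model the outcome depends only on how long the transaction has been
open when it first reaches node B, not on what it reads there, which is why
long multi-statement transactions fare so badly.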

> In principle such behaviour can be avoided by calculating the oldest
> global CSN among all cluster nodes, so that the oldest xmin on a particular
> node is held back only when there is some old transaction open on
> another node. That's easy to do from the global snapshot point of view,
> but it's not obvious how to integrate it into postgres_fdw. It would
> probably require a bi-directional connection between postgres_fdw nodes
> (distributed deadlock detection would also be easy with such a connection).

I don't think holding back xmin is a very good strategy. Maybe it
won't be so bad if and when we get zheap, since only the undo log will
bloat rather than the table. But as it stands, holding back xmin
means everything bloats and you have to CLUSTER or VACUUM FULL the
table in order to fix it.

If the behavior were really analogous to our existing "snapshot too
old" feature, it wouldn't be so bad. Old snapshots continue to work
without error so long as they only read unmodified data, and only
error out if they hit modified pages. SERIALIZABLE works according to
a similar principle: it worries about data that is written by one
transaction and read by another, but if there's a portion of the data
that is only read and not written, or at least not written by any
transactions that were active around the same time, then it's fine.
While the details aren't really clear to me, I'm inclined to think
that any solution we adopt for global snapshots ought to leverage this
same principle in some way.
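
As a rough illustration of that principle, the sketch below models the
existing "snapshot too old" behavior in a deliberately simplified way: an
old snapshot fails only when it reads a page modified after the snapshot
was taken. The Page and Snapshot classes, the read_page function and the
use of integer LSNs with second-based timestamps are assumptions for the
example, not PostgreSQL's actual data structures.

```python
from dataclasses import dataclass


class SnapshotTooOld(Exception):
    """Stands in for the "snapshot too old" error."""


@dataclass
class Page:
    lsn: int   # position in the log of the last modification to this page
    data: str


@dataclass
class Snapshot:
    lsn: int         # log position at the time the snapshot was taken
    taken_at: float  # when the snapshot was taken, in seconds


def read_page(snapshot: Snapshot, page: Page, now: float, threshold: float) -> str:
    """Return the page contents, or fail if an old snapshot hits a modified page."""
    is_old = now - snapshot.taken_at > threshold
    modified_since_snapshot = page.lsn > snapshot.lsn
    if is_old and modified_since_snapshot:
        raise SnapshotTooOld("page was modified after this old snapshot was taken")
    return page.data


if __name__ == "__main__":
    snap = Snapshot(lsn=100, taken_at=0.0)
    untouched = Page(lsn=50, data="unmodified row")
    rewritten = Page(lsn=200, data="modified row")

    # The snapshot is well past the threshold, yet reading unmodified data works.
    print(read_page(snap, untouched, now=1000.0, threshold=60.0))

    # Only the modified page triggers the error.
    try:
        read_page(snap, rewritten, now=1000.0, threshold=60.0)
    except SnapshotTooOld as exc:
        print("error:", exc)
```

The contrast with an age-only check is that here the error depends on what
the transaction actually reads, so a long-running reader of cold data is
never penalized.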

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
