Re: Global snapshots

From: Andrey Borodin <x4mmm(at)yandex-team(dot)ru>
To: Arseny Sher <a(dot)sher(at)postgrespro(dot)ru>
Cc: Stas Kelvich <s(dot)kelvich(at)postgrespro(dot)ru>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Global snapshots
Date: 2018-09-24 08:58:48
Message-ID: 19F4097D-0525-4C4E-A40F-2C8C2B0787CF@yandex-team.ru
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi!

I want to review this patch set. Though I understand that it probably will be quite long process.

I like the idea that with this patch set universally all postgres instances are bound into single distributed DB, even if they never heard about each other before :) This is just amazing. Or do I get something wrong?

I've got few questions:
1. If we coordinate HA-clusters with replicas, can replicas participate if their part of transaction is read-only?
2. How does InDoubt transaction behave when we add or subtract leap seconds?

Also, I could not understand some notes from Arseny:

> 25 июля 2018 г., в 16:35, Arseny Sher <a(dot)sher(at)postgrespro(dot)ru> написал(а):
>
> * One drawback of these patches is that only REPEATABLE READ is
> supported. For READ COMMITTED, we must export every new snapshot
> generated on coordinator to all nodes, which is fairly easy to
> do. SERIALIZABLE will definitely require chattering between nodes,
> but that's much less demanded isolevel (e.g. we still don't support
> it on replicas).

If all shards are executing transaction in SERIALIZABLE, what anomalies does it permit?

If you have transactions on server A and server B, there are transactions 1 and 2, transaction A1 is serialized before A2, but B1 is after B2, right?

Maybe we can somehow abort 1 or 2?

>
> * Another somewhat serious issue is that there is a risk of recency
> guarantee violation. If client starts transaction at node with
> lagging clocks, its snapshot might not include some recently
> committed transactions; if client works with different nodes, she
> might not even see her own changes. CockroachDB describes at [1] how
> they and Google Spanner overcome this problem. In short, both set
> hard limit on maximum allowed clock skew. Spanner uses atomic
> clocks, so this skew is small and they just wait it at the end of
> each transaction before acknowledging the client. In CockroachDB, if
> tuple is not visible but we are unsure whether it is truly invisible
> or it's just the skew (the difference between snapshot and tuple's
> csn is less than the skew), transaction is restarted with advanced
> snapshot. This process is not infinite because the upper border
> (initial snapshot + max skew) stays the same; this is correct as we
> just want to ensure that our xact sees all the committed ones before
> it started. We can implement the same thing.
I think that this situation is also covered in Clock-SI since transactions will not exit InDoubt state before we can see them. But I'm not sure, chances are that I get something wrong, I'll think more about it. I'd be happy to hear comments from Stas about this.
>
>
> * 003_bank_shared.pl test is removed. In current shape (loading one
> node) it is useless, and if we bombard both nodes, deadlock surely
> appears. In general, global snaphots are not needed for such
> multimaster-like setup -- either there are no conflicts and we are
> fine, or there is a conflict, in which case we get a deadlock.
Can we do something with this deadlock? Will placing an upper limit on time of InDoubt state fix the issue? I understand that aborting automatically is kind of dangerous...

Also, currently hanging 2pc transaction can cause a lot of headache for DBA. Can we have some kind of protection for the case when one node is gone permanently during transaction?

Thanks!

Best regards, Andrey Borodin.

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Alexander Korotkov 2018-09-24 09:19:06 Re: Pluggable Storage - Andres's take
Previous Message Chris Travers 2018-09-24 08:15:18 Re: Proposal for Signal Detection Refactoring