| From: | Arseny Sher <a(dot)sher(at)postgrespro(dot)ru> | 
|---|---|
| To: | Stas Kelvich <s(dot)kelvich(at)postgrespro(dot)ru> | 
| Cc: | PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org> | 
| Subject: | Re: Global snapshots | 
| Date: | 2018-07-25 11:35:02 | 
| Message-ID: | 874lgna6qh.fsf@ars-thinkpad | 
| Views: | Whole Thread | Raw Message | Download mbox | Resend email | 
| Thread: | |
| Lists: | pgsql-hackers | 
Hello,
I have looked through the patches and found them pretty accurate. I'd
fixed a lot of small issues here and there; updated patchset is
attached. But first, some high-level notes:
 * I agree that it would be cool to implement functionality like current
   "snapshot too old": that is, abort transaction with old global
   snapshot only if it really attempted to touch modified data.
 * I also agree with Stas that any attempts to trade oldestxmin in
   gossip between the nodes would drastically complicate this patch and
   make it discussion-prone; it would be nice first to get some feedback
   on general approach, especially from people trying to distribute
   Postgres.
 * One drawback of these patches is that only REPEATABLE READ is
   supported. For READ COMMITTED, we must export every new snapshot
   generated on coordinator to all nodes, which is fairly easy to
   do. SERIALIZABLE will definitely require chattering between nodes,
   but that's much less demanded isolevel (e.g. we still don't support
   it on replicas).
 * Another somewhat serious issue is that there is a risk of recency
   guarantee violation. If client starts transaction at node with
   lagging clocks, its snapshot might not include some recently
   committed transactions; if client works with different nodes, she
   might not even see her own changes. CockroachDB describes at [1] how
   they and Google Spanner overcome this problem. In short, both set
   hard limit on maximum allowed clock skew.  Spanner uses atomic
   clocks, so this skew is small and they just wait it at the end of
   each transaction before acknowledging the client. In CockroachDB, if
   tuple is not visible but we are unsure whether it is truly invisible
   or it's just the skew (the difference between snapshot and tuple's
   csn is less than the skew), transaction is restarted with advanced
   snapshot. This process is not infinite because the upper border
   (initial snapshot + max skew) stays the same; this is correct as we
   just want to ensure that our xact sees all the committed ones before
   it started. We can implement the same thing.
Now, the description of my mostly cosmetical changes:
 * Don't ERROR in BroadcastStmt to allow us to handle failure manually;
 * Check global_snapshot_defer_time in ImportGlobalSnapshot instead of
   falling on assert;
 * (Arguably) improved comments around locking at circular buffer
   maintenance; also, don't lock procarray during global_snapshot_xmin
   bump.
 * s/snaphot/snapshot, other typos.
 * Don't track_global_snapshots by default -- while handy for testing, it
   doesn't look generally good.
 * Set track_global_snapshots = true in tests everywhere.
 * GUC renamed from postgres_fdw.use_tsdtm to
   postgres_fdw.use_global_snapshots for consistency.
 * 003_bank_shared.pl test is removed. In current shape (loading one
   node) it is useless, and if we bombard both nodes, deadlock surely
   appears. In general, global snaphots are not needed for such
   multimaster-like setup -- either there are no conflicts and we are
   fine, or there is a conflict, in which case we get a deadlock.
 * Fix initdb failure with non-zero global_snapshot_defer_time.
 * Enforce REPEATABLE READ since currently we export snap only once in
   xact.
 * Remove assertion that circular buffer entries are monotonic, as
   GetOldestXmin *can* go backwards.
[1] https://www.cockroachlabs.com/blog/living-without-atomic-clocks/
--
Arseny Sher
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company
| Attachment | Content-Type | Size | 
|---|---|---|
| 0001-GlobalCSNLog-SLRU-v2.patch | text/x-diff | 24.0 KB | 
| 0002-Global-snapshots-v2.patch | text/x-diff | 64.6 KB | 
| 0003-postgres_fdw-support-for-global-snapshots-v2.patch | text/x-diff | 32.0 KB | 
| From | Date | Subject | |
|---|---|---|---|
| Next Message | Andrey Klychkov | 2018-07-25 11:46:14 | Fwd: Re[2]: Alter index rename concurrently to | 
| Previous Message | Sergei Kornilov | 2018-07-25 10:42:10 | Re: Online enabling of checksums |