Re: eXtensible Transaction Manager API

From: Bruce Momjian <bruce(at)momjian(dot)us>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Michael Paquier <michael(dot)paquier(at)gmail(dot)com>, Simon Riggs <simon(at)2ndquadrant(dot)com>, Konstantin Knizhnik <k(dot)knizhnik(at)postgrespro(dot)ru>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: eXtensible Transaction Manager API
Date: 2015-12-01 14:19:19
Message-ID: 20151201141919.GA1297@momjian.us
Lists: pgsql-hackers

On Tue, Nov 17, 2015 at 12:48:38PM -0500, Robert Haas wrote:
> > At this time, the number of round trips needed particularly for READ
> > COMMITTED transactions that need a new snapshot for each query was
> > really a performance killer. We used DBT-1 (TPC-W) which is less
> > OLTP-like than DBT-2 (TPC-C), still with DBT-1 the scalability limit
> > was quickly reached with 10-20 nodes.
>
> Yeah. I think this merits a good bit of thought. Superficially, at
> least, it seems that every time you need a snapshot - which in the
> case of READ COMMITTED is for every SQL statement - you need a network
> roundtrip to the snapshot server. If multiple backends request a
> snapshot in very quick succession, you might be able to do a sort of
> "group commit" thing where you send a single request to the server and
> they all use the resulting snapshot, but it seems hard to get very far
> with such an optimization. For example, if backend 1 sends a snapshot
> request and backend 2 then realizes that it also needs a snapshot, it
> can't just wait for the reply from backend 1 and use that one. The
> user might have committed a transaction someplace else and then kicked
> off a transaction on backend 2 afterward, expecting it to see the work
> committed earlier. But the snapshot returned to backend 1 might have
> been taken before that. So, all in all, this seems rather crippling.
>
> Things are better if the system has a single coordinator node that is
> also the arbiter of commits and snapshots. Then, it can always take a
> snapshot locally with no network roundtrip, and when it reaches out to
> a shard, it can pass along the snapshot information with the SQL query
> (or query plan) it has to send anyway. But then the single
> coordinator seems like it becomes a bottleneck. As soon as you have
> multiple coordinators, one of them has got to be the arbiter of global
> ordering, and now all of the other coordinators have to talk to it
> constantly.
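
To make the ordering problem concrete, here is a rough sketch of how
that "group commit"-style batching could be done safely (hypothetical
names, nothing to do with the actual XTM hooks): a backend never
reuses a request that was already on the wire when it asked for a
snapshot; instead it waits for the next round, and the next round's
single trip serves every backend that queued up in the meantime.

/*
 * Hypothetical sketch, not the XTM API: batch snapshot requests to a
 * central snapshot server, "group commit" style.  The rule is that a
 * backend may only use a snapshot whose request was sent to the server
 * *after* that backend asked for one; otherwise the snapshot might
 * predate a commit the user already saw elsewhere.
 */
#include <pthread.h>
#include <stdint.h>
#include <stdio.h>

typedef struct { uint64_t xmin; uint64_t xmax; } Snapshot;  /* stand-in */

static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t  done = PTHREAD_COND_INITIALIZER;
static uint64_t completed_round = 0;    /* rounds whose reply has arrived */
static uint64_t current_round = 0;      /* rounds whose request was sent */
static Snapshot latest;                 /* result of completed_round */
static int      request_in_flight = 0;

/* Stand-in for the network round trip to the snapshot server. */
static Snapshot
ask_snapshot_server(void)
{
    static uint64_t fake_xid = 100;
    Snapshot    s = {fake_xid, fake_xid + 10};

    fake_xid += 10;
    return s;
}

/*
 * Called whenever a backend needs a snapshot.  Backends arriving while
 * a request is already on the wire do NOT piggyback on it; they wait
 * for the next round, which is guaranteed to be taken after their need
 * arose.  Backends arriving before the request is sent share one trip.
 */
static Snapshot
get_shared_snapshot(void)
{
    Snapshot    result;
    uint64_t    my_round;

    pthread_mutex_lock(&lock);
    my_round = current_round + 1;       /* first round not yet sent */

    while (completed_round < my_round)
    {
        if (!request_in_flight)
        {
            /* Become leader of this round and do the single trip. */
            request_in_flight = 1;
            current_round = my_round;
            pthread_mutex_unlock(&lock);

            Snapshot    s = ask_snapshot_server();

            pthread_mutex_lock(&lock);
            latest = s;
            completed_round = my_round;
            request_in_flight = 0;
            pthread_cond_broadcast(&done);
        }
        else
            pthread_cond_wait(&done, &lock);
    }

    result = latest;
    pthread_mutex_unlock(&lock);
    return result;
}

int
main(void)
{
    Snapshot    s = get_shared_snapshot();

    printf("xmin=%llu xmax=%llu\n",
           (unsigned long long) s.xmin, (unsigned long long) s.xmax);
    return 0;
}

The batching win only shows up under concurrency, when backends pile
up behind an in-flight request, which is also exactly the case where
the round trips hurt the most.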

I think the performance benefits of having a single coordinator are
going to require us to implement different snapshot/transaction code
paths for the single-coordinator and multi-coordinator cases. :-( That is,
people who can get by with only a single coordinator are going to want
to do that to avoid the overhead of multiple coordinators, while those
using multiple coordinators are going to have to live with that
overhead.
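
Just to illustrate what I mean by separate code paths (purely a
sketch, these are not the actual XTM hooks), a hook table along these
lines would let the stock local path keep its current cheap snapshot
behavior, while only multi-coordinator installations plug in the
expensive distributed one:

/*
 * Hypothetical illustration only, not the proposed XTM struct.  The
 * point is that both paths sit behind one hook table, and the local
 * path pays no network cost.
 */
#include <stdbool.h>
#include <stdio.h>

typedef struct { unsigned xmin; unsigned xmax; } Snapshot;  /* stand-in */

typedef struct TransactionManager
{
    Snapshot    (*get_snapshot) (void);
    bool        (*commit) (unsigned xid);
} TransactionManager;

/* Local path: no round trips, same behavior as today. */
static Snapshot
local_get_snapshot(void)
{
    Snapshot    s = {100, 110};
    return s;
}

static bool
local_commit(unsigned xid)
{
    (void) xid;
    return true;
}

/* Distributed path: every call is at least one trip to the arbiter. */
static Snapshot
dtm_get_snapshot(void)
{
    /* ... network round trip to the global snapshot arbiter ... */
    Snapshot    s = {200, 240};
    return s;
}

static bool
dtm_commit(unsigned xid)
{
    /* ... global commit protocol, e.g. two-phase commit ... */
    (void) xid;
    return true;
}

static const TransactionManager LocalTM = {local_get_snapshot, local_commit};
static const TransactionManager DistributedTM = {dtm_get_snapshot, dtm_commit};

/* Chosen once at startup; callers never branch on topology themselves. */
static const TransactionManager *TM = &LocalTM;

int
main(int argc, char **argv)
{
    Snapshot    s;

    (void) argv;
    if (argc > 1)               /* pretend a config setting chose the DTM */
        TM = &DistributedTM;

    s = TM->get_snapshot();
    printf("xmin=%u xmax=%u\n", s.xmin, s.xmax);
    return TM->commit(101) ? 0 : 1;
}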

I think an open question is which workloads can use a single
coordinator. Read-only? Long queries? Are those also the cases where
the snapshot/transaction overhead is negligible, meaning we don't need
the single-coordinator optimizations?

--
Bruce Momjian <bruce(at)momjian(dot)us> http://momjian.us
EnterpriseDB http://enterprisedb.com

+ As you are, so once was I. As I am, so you will be. +
+ Roman grave inscription +
