Re: The plan for FDW-based sharding

From: Kevin Grittner <kgrittn(at)gmail(dot)com>
To: Craig Ringer <craig(at)2ndquadrant(dot)com>
Cc: Simon Riggs <simon(at)2ndquadrant(dot)com>, Konstantin Knizhnik <k(dot)knizhnik(at)postgrespro(dot)ru>, Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: The plan for FDW-based sharding
Date: 2016-03-05 15:41:56
Message-ID: CACjxUsMNunOMtpV0x0xsac6txR0e4AE2D2YMEudASy=DTGy_Nw@mail.gmail.com
Lists: pgsql-hackers

On Fri, Mar 4, 2016 at 10:10 PM, Craig Ringer <craig(at)2ndquadrant(dot)com> wrote:
> On 28 February 2016 at 06:38, Kevin Grittner <kgrittn(at)gmail(dot)com> wrote:

>> What I sketched out with the "apparent order of execution"
>> ordering of the transactions (basically, commit order except
>> when one SERIALIZABLE transaction needs to be dragged in front
>> of another due to a read-write dependency) is possibly the
>> simplest approach, but batching may well give better
>> performance.
>
> I'd be really interested in some ideas on how that information might be
> usefully accessed. If we could write info on when to apply commits to the
> xlog in serializable mode that'd be very handy, especially when looking to
> the future with logical decoding of in-progress transactions, parallel
> apply, etc.

Are you suggesting the possibility of holding off on writing the
commit record for a SERIALIZABLE transaction to WAL until it is
known that no other SERIALIZABLE transaction comes ahead of it in
the apparent order of execution? If so, that's an interesting idea
that I hadn't given much thought to yet -- I had been assuming
current WAL writes, with adjustments to the timing of application
of the records.

> For parallel apply I anticipated that we'd probably have workers applying
> xacts in parallel and committing them in upstream commit order. They'd
> sometimes deadlock with each other; when this happened all workers whose
> xacts committed after the first aborted xact would have to abort and start
> again. Not ideal, but safe.
>
> Being able to avoid that by using SSI information was in the back of my
> mind, but with no idea how to even begin to tackle it. What you've mentioned
> here is helpful and I'd be interested if you could share a bit more of your
> experience in the area.
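
A minimal sketch of the fallback quoted above, assuming workers commit
strictly in upstream commit order and that one failed xact forces it
and every later xact still in flight to be re-applied. The function
name and the integer xact ids are hypothetical, not PostgreSQL APIs.

```python
def xacts_to_restart(upstream_commit_order, aborted):
    """Return the xacts that must abort and be applied again.

    upstream_commit_order: in-flight xact ids, in upstream commit order.
    aborted: ids of xacts whose apply worker aborted (e.g. on deadlock).
    """
    aborted = set(aborted)
    for i, xid in enumerate(upstream_commit_order):
        if xid in aborted:
            # The first aborted xact and everything committed after it
            # upstream has to be redone to preserve commit order.
            return list(upstream_commit_order[i:])
    return []


print(xacts_to_restart([10, 11, 12, 13], {12}))
# -> [12, 13]
```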

My thinking so far has been that reordering the application of
transaction commits on a replica would best be done as the minimal
rearrangement possible from commit order which allows the work of
transactions to become visible in an order consistent with some
one-at-a-time run of those transactions. Partly that is because
the commit order is something that is fairly obvious to see and is
what most people intuitively look at, even when it is wrong.
Deviating from this intuitive order seems likely to introduce
confusion, even when the results are 100% correct.

The only place you *need* to vary from commit order for correctness
is when there are overlapping SERIALIZABLE transactions, one
modifies data and commits, and another reads the old version of the
data but commits later. Due to the action of SSI on the source
machine, you know that there could not be any SERIALIZABLE
transaction which saw the inconsistent state between the two
commits, but on replicas we don't yet manage that. The key is that
there is a read-write dependency (a/k/a rw-conflict) between the
two transactions which tells you that the second to commit has to
come before the first in any graph of apparent order of execution.
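
The reordering described above can be sketched as a topological sort
that prefers commit order. This is only an illustration: it assumes
the rw-conflicts are already known as (reader, writer) pairs, and it
uses a priority queue keyed on commit position as one simple way to
stay close to commit order. All names are hypothetical.

```python
import heapq


def apparent_order(commit_order, rw_conflicts):
    """Order transactions for apply, staying close to commit order.

    commit_order: transaction ids in the order they committed.
    rw_conflicts: (reader, writer) pairs; the reader saw the old version
    of data the writer modified, so the reader must come first.
    """
    position = {txn: i for i, txn in enumerate(commit_order)}
    successors = {txn: [] for txn in commit_order}
    pending = {txn: 0 for txn in commit_order}
    for reader, writer in rw_conflicts:
        successors[reader].append(writer)
        pending[writer] += 1

    # Transactions with nothing required ahead of them, by commit position.
    ready = [position[t] for t in commit_order if pending[t] == 0]
    heapq.heapify(ready)

    order = []
    while ready:
        txn = commit_order[heapq.heappop(ready)]
        order.append(txn)
        for nxt in successors[txn]:
            pending[nxt] -= 1
            if pending[nxt] == 0:
                heapq.heappush(ready, position[nxt])

    if len(order) != len(commit_order):
        # Assumption: SSI on the source prevents cycles among committed
        # SERIALIZABLE transactions, so this should not happen.
        raise ValueError("rw-conflicts form a cycle")
    return order


# T1 modified data and committed first; T2 read the old version and
# committed later, so T2 has to be applied ahead of T1.
print(apparent_order(["T1", "T2", "T3"], [("T2", "T1")]))
# -> ['T2', 'T1', 'T3']
```

With no rw-conflicts the function returns the commit order unchanged.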

The tricky part is that when one SERIALIZABLE transaction has
modified data and committed, and an overlapping SERIALIZABLE
transaction which is not READ ONLY has not yet reached completion
(COMMIT or ROLLBACK), the correct ordering remains in doubt; there
is no way to know which might need to come first in the apparent
order of execution, or whether it even matters. I am
skeptical about whether in logical replication (including MMR), it
is going to be possible to manage this by finding "safe snapshots".
The only alternative I can see, though, is to suspend replication
while correct transaction ordering remains in doubt. A big READ
ONLY transaction would not cause a replication stall, but a big
READ WRITE transaction could cause an indefinite stall. Simon
seemed to be saying that this is unacceptable, but I tend to think
it is a viable approach for some workloads, especially if the READ
ONLY transaction property is used when possible.
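
The "suspend while in doubt" rule above could look roughly like the
following check. It is a sketch under simplifying assumptions: every
transaction is SERIALIZABLE, times are plain integers, and the Txn
class and function name are invented for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Txn:
    name: str
    start: int
    end: Optional[int] = None  # None while still in progress
    read_only: bool = False


def ordering_in_doubt(committed, all_txns):
    """True if applying `committed` on the replica must wait.

    The ordering is in doubt while some overlapping transaction that is
    not READ ONLY has not yet reached COMMIT or ROLLBACK.
    """
    return any(
        other is not committed
        and other.end is None               # still in progress
        and not other.read_only             # READ ONLY never stalls apply
        and other.start < committed.end     # overlapped the committed txn
        for other in all_txns
    )


t1 = Txn("T1", start=1, end=5)
big_read_only = Txn("report", start=2, read_only=True)
big_read_write = Txn("batch", start=2)

print(ordering_in_doubt(t1, [t1, big_read_only]))   # -> False
print(ordering_in_doubt(t1, [t1, big_read_write]))  # -> True
```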

There might be some wiggle room in terms of letting
non-SERIALIZABLE transactions commit while the ordering of
SERIALIZABLE transactions remains in doubt, but that would involve
allowing bigger deviations from commit order in transaction
application, which may confuse people. The argument on the other
side is that people who use a transaction isolation level less
strict than SERIALIZABLE are vulnerable to seeing anomalies anyway,
so they must be OK with that.

Hopefully this is in some way helpful....

--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
