Re: The plan for FDW-based sharding

From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Konstantin Knizhnik <k(dot)knizhnik(at)postgrespro(dot)ru>
Cc: "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: The plan for FDW-based sharding
Date: 2016-02-27 03:54:07
Message-ID: CA+TgmoavrFEU9pXmF+AsuvkW4oS-xBJ25r=kdacyPnxLk5vZDA@mail.gmail.com
Lists: pgsql-hackers

On Fri, Feb 26, 2016 at 10:56 PM, Konstantin Knizhnik
<k(dot)knizhnik(at)postgrespro(dot)ru> wrote:
> We do not have a formal proof that the proposed XTM is "general enough"
> to handle all possible transaction manager implementations.
> But there are two general ways of dealing with isolation: snapshot based
> and CSN based.

I don't believe that for a minute. For example, consider this article:

https://en.wikipedia.org/wiki/Global_serializability

I think the neutrality of that article is *very* debatable, but it
certainly contradicts the idea that snapshots and CSNs are the only
methods of achieving global serializability.

Or consider this lecture:

http://hssl.cs.jhu.edu/~randal/416/lectures.old/ln5.2.pdf

That's a great introduction to the problem we're trying to solve here,
but again, snapshots are not mentioned, and CSNs certainly aren't
mentioned.

This write-up goes further, explaining three different methods for
ensuring global serializability, none of which mention snapshots or
CSNs:

http://heaven.eee.metu.edu.tr/~vision/LectureNotes/EE442/Ee442ch7.html

Actually, I think the second approach is basically a snapshot/CSN-type
approach, but it doesn't use that terminology and the connection to
what you are proposing is very unclear.

I think you're approaching this problem from a viewpoint that is
entirely too focused on the code that exists in PostgreSQL today.
Lots of people have done lots of academic research on how to solve
this problem, and you can't possibly say that CSNs and snapshots are
the only solutions unless you haven't read any of those papers. The
articles above aren't unusual in failing to mention either of the
approaches you are advocating - they are typical of the
literature in this area. How can it be that the only solutions to
this problem are ones that are totally different from the approaches
that university professors who do research on concurrency have
spent their time exploring?

I think we need to back up here and examine our underlying design
assumptions. The goal here shouldn't necessarily be to replace
PostgreSQL's current transaction management with a distributed version
of the same thing. We might want to do that, but I think the goal is
or should be to provide ACID semantics in a multi-node environment,
and specifically the I in ACID: transaction isolation. Making the
existing transaction manager into something that can be spread across
multiple nodes is one way of accomplishing that. Maybe the best one.
Certainly one that's been experimented with in Postgres-XC. But it is
often the case that an algorithm that works tolerably well on a single
machine starts performing extremely badly in a distributed
environment, because the latency of communicating between multiple
systems is vastly higher than the latency of communicating between
CPUs or cores on the same system. So I don't think we should be
assuming that's the way forward.

For example, consider a table with a million rows spread across any
number of servers. Consider also a series of update transactions each
of which reads exactly one row and then writes that row. If we adopt
any solution that involves a central coordinator to arbitrate commit
ordering, this is going to require at least one and probably two
million network round trips, one per transaction to get a snapshot and
a second to commit. But all of this is completely unnecessary.
Because each transaction touches only a single node, a perfect global
transaction manager doesn't really need to do anything at all in this
case. The existing PostgreSQL mechanisms - snapshot isolation, and SSI
if you have it turned on - will provide just as much transaction
isolation on this workload as they would on a workload that only
touched a single node. If we design a GTM that does two million
network round trips in this scenario, we have just wasted two million
network round trips.
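
Just to put rough numbers on that, here's a back-of-the-envelope sketch
in Python; the round-trip latency below is an assumed figure for
illustration, not a measurement, and the only point is that the added
cost scales with the number of transactions even though no transaction
ever touches more than one node:

    TRANSACTIONS = 1_000_000      # one single-row update per transaction
    ROUND_TRIPS_PER_TXN = 2       # one to get a snapshot, one to commit
    RTT_SECONDS = 100e-6          # assumed network round-trip time

    gtm_overhead = TRANSACTIONS * ROUND_TRIPS_PER_TXN * RTT_SECONDS
    print(f"centralized GTM adds {gtm_overhead:.0f}s of network waiting")
    print("purely local snapshots add none of that")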

Now consider another workload where each transaction reads a row on
one server, reads a row on another server, and then updates the second
row. Here, the GTM has a job to do. If T1 reads R1, reads R2, writes
R2; and T2 concurrently reads R2, reads R1, and then writes R1, it
could happen that both transactions see the pre-update values of the
row they read first and yet both transactions go on to commit. That's
not equivalent to any serial history, so transaction isolation is
broken. A GTM which aims to provide true cluster-wide serializability
must do something to keep that from happening. If all of this were
happening on a single node, those transactions would succeed if run at
READ COMMITTED but SSI would roll one of them back at SERIALIZABLE.
So maybe the goal for the GTM isn't to provide true serializability
across the cluster but some lesser degree of transaction isolation.
But then exactly which serialization anomalies are we trying to
prevent, and why is it OK to prevent those and not others?
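
To see the T1/T2 interleaving above fail in miniature, here's a toy
sketch in plain Python (nothing resembling the real PostgreSQL code),
with each node reduced to a dictionary and each transaction seeing only
the per-node reads it took before either transaction committed:

    node1 = {"R1": 10}            # row R1 lives on node 1
    node2 = {"R2": 10}            # row R2 lives on node 2

    # Both transactions take all of their reads before either commits.
    t1_reads = {"R1": node1["R1"], "R2": node2["R2"]}  # T1: read R1, read R2
    t2_reads = {"R2": node2["R2"], "R1": node1["R1"]}  # T2: read R2, read R1

    # T1 writes R2 based on the R1 it saw; T2 writes R1 based on the R2 it saw.
    node2["R2"] = t1_reads["R1"] + 1
    node1["R1"] = t2_reads["R2"] + 1

    # Neither node saw a local conflict, so both commits succeed, yet the
    # outcome R1 = 11, R2 = 11 matches no serial order: T1-then-T2 gives
    # R1 = 12, R2 = 11, and T2-then-T1 gives R1 = 11, R2 = 12.
    print(node1, node2)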

I have seen zero discussion of any of this. What I think we ought to
be doing here is describing precisely what might break, and then
deciding which of those problems we want to fix, and then deciding how
we can do that with the least amount of network traffic. Jumping to
"let's make the transaction API pluggable" is presupposing the answer
to the first two questions without any discussion, and I'm afraid that
it's not going to lead to a very agreeable solution to the third one.

> Yes, it is certainly possible to develop a cluster by cloning PostgreSQL.
> But it causes big problems both for developers, who have to permanently
> synchronize their branch with master,
> and, what is more important, for customers, who cannot use the standard
> version of PostgreSQL.
> It may cause problems with system certification, with running Postgres in
> the cloud, ...
> Actually, the history of Postgres-XL/XC and Greenplum IMHO shows that it
> is the wrong direction.

I think the history of Postgres-XC/XL shows that developing technology
outside of the PostgreSQL community is a risky business. You might
end up developing something that is not widely used or adopted, and
the lack of community review might cause that technology to be less
good than it would have been had it been done through the community
process. It seems to me that installing a bunch of hooks here and then
having you go off and develop outside the community has those same
perils. (Of course, in that case and this one, working outside the
community also lets you go faster and do things the community doesn't
like, which is sometimes an advantage.)

Also, what you are proposing solves problems for you while maybe
creating them for other people. You're saying that we should have
hooks so that you don't have to merge with master. But that's just
transferring the maintenance burden from you to core. Instead of you
having to merge when things change, core has got to maintain the hooks
as things change so that things are easy for you. If there are no
code changes in the relevant area anyway, then merging is trivial and
you shouldn't need to worry about it. I could submit a patch adding
hooks to core to enable all of the things (or even just some of the
things) that EnterpriseDB has changed in Advanced Server, and that
patch would be rejected so fast it would make your head spin, because
of course the core project doesn't want to be burdened with
maintaining a whole bunch of hooks for the convenience of
EnterpriseDB. Which is understandable. I think it's fine for you to
ask whether PostgreSQL will accept a certain set of hooks, but we've
all got to understand that there is a difference between what is
convenient for us or our employers and what is actually best for the
project. I am not under any illusions that those two things are the
same, and while I do a lot of things that I hope will benefit my
employer, when I am writing to this mailing list I do not do things
unless they are in the interest of PostgreSQL. When those two things
intersect, great; when they don't, and the work is community work,
PostgreSQL wins. I see very clearly that what you are proposing here
will benefit your customers, but unless it will also benefit the
PostgreSQL community in general, it's not a good submission.

But I don't really want to spend a lot of time arguing about politics
here. The real issue is whether this is a good approach. If it is,
then it's the right thing to do for PostgreSQL and we should commit
it. If it's not, then we should reject it. Let's focus on the
technical concerns I wrote about in the first part of the email rather
than wrangling about business interests. I'm not blind to the fact
that we work for different companies and I realize that can create
some tension, but if we want to *have* a PostgreSQL community we've
got to try to get past that.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
