Re: I'd like to discuss scaleout at PGCon

From: MauMau <maumau307(at)gmail(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: I'd like to discuss scaleout at PGCon
Date: 2018-05-31 12:12:32
Message-ID: CALO4oLONKWhoCLcGHTEWUmW46nsa3vM1K6gLBVqtdznX+ceErw@mail.gmail.com

2018-05-31 11:26 GMT+09:00, Robert Haas <robertmhaas(at)gmail(dot)com>:
> It was nice to meet you in person.

Me too. And it was very kind of you to help me get the wiki page
displayed properly and to guide the session. When I first heard your
voice at the Developer Meeting, I thought Bruce Momjian was speaking,
because your voice sounded similar to his...

> We didn't have time in the unconference session to discuss these
> topics in detail, for you have raised many issues here each of which
> deserves discussion individually and in detail. I wrote a blog post
> somewhat related to this topic recently which you can find at
> http://rhaas.blogspot.com/2018/05/built-in-sharding-for-postgresql.html

Yes, I read this article before PGCon. Your articles are always
helpful for catching up on the overall situation in the community.

> In
> terms of high-level architecture, I think you are right to wonder
> about the possibility of a cloud-native mode based on separating
> storage and compute. Amazon Aurora offers that feature, but we might
> want to have it in PostgreSQL.

> Another, somewhat different thing that we might want is a fully
> distributed database, with a distributed buffer cache, distributed
> lock manager, distributed invalidation queue, etc. That would be like
> what Oracle RAC does, but it's a tremendous amount of work, and a lot
> of that work has no value by itself. You don't get the payoff until
> it's all working. There are a few pieces that might be independently
> useful, though, like a distributed deadlock detector. The goal of
> this kind of effort is, I think, to support zillions of connections --
> scale further than you can with just one node. This would be a lot
> better if combined with the cloud-native storage, but of course that's
> even more work.

Yes, I can understand the difficulty. So, I simply wanted to ask for
opinions at the unconference on which (hard) direction the community
wants to go in and what kind of database we ultimately want PostgreSQL
to be. Without that fundamental consensus, development work might be
wasted, with patches facing objections after they are submitted. As
you mentioned in your blog post and in a past email, I don't think
anyone yet has a clear image of what scaleout in PostgreSQL should
look like. How should we proceed? Which approach should we take to
minimize rework?

a) Define the functional specification together with the basic
overall architecture (this doesn't mean writing a heavy, detailed
design document or manual; I think a general README or wiki page
would be sufficient). At that point, I expect we can evaluate how the
scaleout feature would be used and whether it's reasonably easy to
use. Then we can proceed to design and code each part with peace of
mind -- 2PC, global consistency, failure detection and failover,
distributed lock management and deadlock handling, etc.

b) Various developers design and code each part, bring together those
patches, and then all try to figure out how to combine them.

I'm in favor of a), at least at the basic architecture level.
Otherwise, an unhappy exchange like this could happen:
"Hey, I made a patch to implement a distributed cache management like
Oracle Cache Fusion."
"No, we don't want features based on shared everything architecture."

I anticipated a decision process at the unconference like this:
"Do we want to build on shared everything architecture?"
"No, because it limits scalability, requires expensive shared storage,
and it won't run on many clouds."
"Then do we want to employ a new architecture like AWS Aurora?"
"That may be interesting. But AWS could do it because they have an
infinitely scalable storage layer which is built and used for a long
time. This architecture may not be our option. But let's keep our
eye on leveraging services of major cloud vendors just like Vertica
does recently. Cloud services are now like traditional
vendor-specific hardware. Maybe PostgreSQL should utilize them just
like we use CPU-specific instructions now and GPU/persistent memory in
the near future."
"Then, it seems that we should go on the shared nothing architecture.
Is it OK?"
"Yes."

> The FDW approach, of which I have been a supporter for some years now,
> is really aiming at a different target, which is to allow efficient
> analytics queries across a multi-node cluster.

Oh, I didn't know you support the FDW approach mainly for analytics.
I had guessed the first target was OLTP read-write scalability.

> We might not want to confine ourselves strictly to the FDW interface
> -- for example, I've thought about the idea of introducing a
> new relkind for a "distributed table". A distributed table may be
> present on the local node, in which case it can be scanned like a
> local table, or it may not be present, in which case it can be scanned
> like a foreign table by connecting to a node on which it is present.
> The set of nodes on which a table is present is metadata that is
> shared throughout the cluster. Multi-master logical replication
> propagates changes between all nodes on which the table is present.
> With a concept like this, there is a lot of opportunity to optimize
> queries by, for example, deciding to perform a join on a node on which
> both input tables are present, to minimize data movement.

I agree. XL, Oracle Sharding, and possibly MySQL Cluster do that,
too. It seems like a must-do thing.
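
For illustration, I imagine the user-visible side could look roughly
like this (purely hypothetical syntax, not a proposal, just to make
the concept concrete):

    -- Hypothetical: a "distributed table" whose copies live on a
    -- known set of nodes; that node list is cluster-wide metadata,
    -- and multi-master logical replication keeps the copies in sync.
    CREATE TABLE item (id int PRIMARY KEY, name text)
        DISTRIBUTED ON (node1, node2, node3);

    -- If both item and orders are present on node2, the planner
    -- could push the whole join there instead of moving rows.
    SELECT o.id, i.name
      FROM orders o JOIN item i ON o.item_id = i.id;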

> But even if
> we create something like this, I see it as fundamentally an extension
> of the FDW approach that would hopefully manage to reuse a lot of
> what's already been built there. I don't think we need to (or should)
> throw away the work that's been done on FDW pushdown and start over --
> we should find a way to build on top of it and add ease-of-use and
> management features.

Agreed. I also think we should not write much code from scratch. On
the other hand, if we have to support sharding natively without FDW,
I wonder whether we can reuse the FDW artifacts. I mean, extracting
the necessary logic from the FDW code into common functions so that
the native sharding code can call them as well.
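
To make that concrete, the pieces we already have combine roughly
like this today with declarative partitioning plus postgres_fdw (a
minimal sketch; the server name, host, and credentials are made up):

    CREATE EXTENSION postgres_fdw;
    CREATE SERVER shard1 FOREIGN DATA WRAPPER postgres_fdw
        OPTIONS (host 'node1.example.com', dbname 'app');
    CREATE USER MAPPING FOR CURRENT_USER SERVER shard1
        OPTIONS (user 'app', password 'secret');

    CREATE TABLE orders (id bigint, region text, total numeric)
        PARTITION BY LIST (region);
    -- This partition lives on the remote node; postgres_fdw can push
    -- down scans, and in many cases joins and aggregates, to it.
    CREATE FOREIGN TABLE orders_eu PARTITION OF orders
        FOR VALUES IN ('eu') SERVER shard1;

If a native sharding layer reused the same pushdown machinery
underneath, the CREATE SERVER / FOREIGN TABLE plumbing above is what
the ease-of-use and management features would hide from the user.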

> In fact, even if we said that we want a fully distributed database,
> we'd probably still need some kind of distributed table concept.
> Unless every node has a full copy of everything in the database, you
> still need to reason about which data is present on which nodes and
> optimize queries accordingly.

Then, how about building the cluster membership management first,
including node management and failure detection/failover? I think
node management is necessary anyway, and other developers could
experiment with other things on top of that cluster infrastructure.
Do you think it would be helpful or wasteful? I'm trying to find what
we can do for an early scaleout release.
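
Today the closest building block we have is the foreign server
catalog. A membership layer on top of it might expose something like
the following (purely hypothetical names; none of these functions or
views exist, they only show the kind of interface I have in mind):

    -- Hypothetical: register/unregister a node in shared cluster
    -- metadata, on top of the existing foreign server machinery.
    SELECT pg_add_cluster_node('node2', 'host=10.0.0.2 port=5432');
    SELECT pg_remove_cluster_node('node2');

    -- Hypothetical view: each node's role and health as seen by the
    -- failure detector, for failover logic and monitoring to use.
    SELECT node_name, role, status, last_heartbeat
      FROM pg_cluster_nodes;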

> From the chart view, in February 2016, SQL Server was at 1150.227, and
> MongoDB was at 305.599. Generally it looks like the "big three" --

Thank you for looking at the chart and telling me the figures.

> I
> think it's pretty clear that we need to both continue to improve some
> of these major new features we've added and at the same time keep
> introducing even more new things if we want to continue to gain market
> share and mind share. I hope that features like scale-out and also
> zheap are going to help us continue to whittle away at the gap, and I
> look forward to seeing what else anyone may have in mind.

Definitely. I couldn't agree more.

Regards
MauMau
