Re: The plan for FDW-based sharding

From: Bruce Momjian <bruce(at)momjian(dot)us>
To: Josh berkus <josh(at)agliodbs(dot)com>
Cc: Konstantin Knizhnik <k(dot)knizhnik(at)postgrespro(dot)ru>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: The plan for FDW-based sharding
Date: 2016-03-11 08:09:02
Message-ID: 20160311080902.GA16342@momjian.us
Lists: pgsql-hackers

I have read the recent comments on this thread with great interest. I
am glad people have expressed their concerns, rather than remain silent.
Now that the responses have decreased, I can reply.

I saw several concerns:

1. My motivation for starting this thread was to decrease interest in
external sharding solutions.

2. No prototype was produced.

3. More work needs to be done to encourage others to be involved.

4. An FDW-based sharding solution will only work for some workloads,
decreasing interest in a more general solution.

5. I started this thread to take credit for the idea or feature.

Let me reply to each item as briefly as I can:

1. I said good things about external sharding solutions in the email,
so it is hard to logically argue that the _intent_ was to reduce
interest in them. I will admit that that might be the short-term
effect.

2. We have not produced a prototype because we don't really need to
make any decision yet on viability. We already need to improve FDW
pushdown, partitioning syntax, and perhaps a global transaction/snapshot
manager with or without sharding, so we might as well just make those
improvements, and then producing a prototype will be much easier and
more representative.

3. I have tried to encourage others to get involved, with limited
success. I do think the FDW is perhaps the only reasonable way to get
_built-in_ sharding. The external sharding solutions are certainly
viable, but external. It is possible we will make all the FDW
improvements, find out it doesn't work, but find out the improvements
allow us to go in another direction.

4. Hard to argue with #4. We got partitioning working with a complex
API that has not improved much over the years. I think this will be
cleaned up with the FDW-sharding work, and it would be a shame to create
another partial solution (FDW sharding) out of that work.

5. See below on why I talk about these things.

There seems to be serious interest in how this idea came about, so let
me say what I remember. It is very possible others came to the same
conclusions independently, and earlier. I think I first heard it from
Korry Douglas in an EDB-internal discussion. I then heard it from Josh
Berkus or we discussed it at a conference. That got me thinking, and
then an EDB customer talked about the need for multi-node write scaling,
and I realized that only sharding could do that. (The data warehouse
use of sharding was already clear to me.) I then understood the wisdom
of Postgres XC, which NTT worked on for perhaps a decade. (I just left
their offices here in Tokyo.) I discussed the FDW-sharding idea
internally inside EDB, and then mentioned it during a visit to NTT in
July, 2014. I wrote a new sharding presentation and blogged about it in
February, 2015
(http://momjian.us/main/blogs/pgblog/2015.html#February_1_2015). I
presented the talk in three locations in 2015.

The reason I talk about these things (#5) is because I am trying to
encourage people to work on them, and I want to communicate to our users
that we realize sharding is important for certain workloads and that we
are attempting a built-in solution. Frankly, I don't think many users
need sharding, but many users want to know it is available, so I think
it is important to talk about it.

As for why there is so much hostility, I think this is typical for any
ill-defined feature development. There was simmering hostility to the
Windows port and pg_upgrade for many years because those projects were
not easy to define and risky, and had few active developers. The
agreement was that work could continue as long as destabilization wasn't
introduced. Ideally everything would have a well-defined plan, but that
is sometimes hard to do. Similar to our approach on parallelism (which is
also super-important and doesn't have many active developers), sometimes you
just need to create infrastructure and see how well it solves problems.

The weird thing is that if you do implement an ill-defined feature,
there really isn't much positive feedback --- people just use the
feature, and the complaints stop.

--
Bruce Momjian <bruce(at)momjian(dot)us> http://momjian.us
EnterpriseDB http://enterprisedb.com

+ As you are, so once was I. As I am, so you will be. +
+ Roman grave inscription +
