Re: A Modest Upgrade Proposal

From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Craig Ringer <craig(at)2ndquadrant(dot)com>
Cc: Simon Riggs <simon(at)2ndquadrant(dot)com>, Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, David Fetter <david(at)fetter(dot)org>, PG Hackers <pgsql-hackers(at)postgresql(dot)org>, Petr Jelinek <petr(dot)jelinek(at)2ndquadrant(dot)com>
Subject: Re: A Modest Upgrade Proposal
Date: 2016-07-13 19:26:23
Message-ID: CA+TgmoZzf__YWt9qnOw7jAKUuydPRKLPE0cic7tJb7F_qzYrxA@mail.gmail.com
Lists: pgsql-hackers

On Fri, Jul 8, 2016 at 5:47 AM, Craig Ringer <craig(at)2ndquadrant(dot)com> wrote:
>> DDL is our standard way of getting things into the system catalogs.
>> We have no system catalog metadata that is intended to be populated by
>> any means other than DDL.
>
> Replication slots? (Arguably not catalogs, I guess)
>
> Replication origins?

Those things aren't catalogs, are they? I mean, as I said in the
other email I just sent in reply to Simon, if you did a pg_dump and a
pg_restore, I don't think it would be useful to preserve replication
slot LSNs afterwards. If I'm wrong, and that is a useful thing to do,
then we should have a pg_dump flag to do it. Either way, I think we
do have some work to do figuring out how you can dump, restore, and
then resume logical replication, probably by establishing a new slot
and then incrementally resynchronizing without having to copy
unchanged rows.

That having been said, I think the choice not to use DDL for slots was
somewhat unfortunate. We now have CREATE_REPLICATION_SLOT that can be
used via the replication protocol but there is no corresponding CREATE
REPLICATION SLOT for the regular protocol; I think that's kinda
strange.
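To make the asymmetry concrete: from a regular connection, slot creation today goes through a function call, while the replication protocol has a dedicated command. A sketch of what exists versus what doesn't (the function shown is the real interface; the DDL form is hypothetical):

```sql
-- The existing function-based interface, usable over a regular connection
-- ('test_decoding' is the sample output plugin shipped with PostgreSQL):
SELECT * FROM pg_create_logical_replication_slot('my_slot', 'test_decoding');

-- Over a *replication-protocol* connection, the equivalent is the
-- CREATE_REPLICATION_SLOT command. A hypothetical DDL counterpart for
-- the regular protocol, which does NOT exist, might read:
--   CREATE REPLICATION SLOT my_slot LOGICAL test_decoding;
```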

>> If you want to add a column to a table, you
>> say ALTER TABLE .. ADD COLUMN. If you want to add a column to an
>> extension, you say ALTER EXTENSION .. ADD TABLE. If you want to add
>> an option to a foreign table, you say ALTER FOREIGN TABLE .. OPTIONS
>> (ADD ..). Therefore, I think it is entirely reasonable and obviously
>> consistent with existing practice that if you want to add a table to a
>> replication set, you should write ALTER REPLICATION SET .. ADD TABLE.
>> I don't understand why logical replication should be the one feature
>> that departs from the way that all of our other features work.
>
> Because unlike all the other features, it can work usefully *across
> versions*.

So what?

> We have no extension points for DDL.
>
> For function interfaces, we do.
>
> That, alone, makes a function based interface overwhelmingly compelling
> unless there are specific things we *cannot reasonably do* without DDL.

I don't understand this. We add new DDL in new releases, and we avoid
changing the meaning of existing DDL. Using function interfaces won't
make it possible to change the meaning of existing syntax, and it
won't make it any more possible to add new syntax. It will just make
replication commands be spelled differently from everything else.

> In many cases it's actively undesirable to dump and restore logical
> replication state. Most, I'd say. There probably are cases where it's
> desirable to retain logical replication state such that restoring a dump
> resumes replication, but I challenge you to come up with any sensible and
> sane way that can actually be implemented. Especially since you must
> obviously consider the possibility of both upstream and downstream being
> restored from dumps.

Yes, these issues need lots of thought, but I think that replication
set definitions, at least, are sensible to dump and reload.

> IMO the problem mostly devolves to making sure dumps taken of different DBs
> are consistent so new replication sessions can be established safely. And
> really, I think it's a separate feature from logical replication itself.

I think what is needed has more to do with coping with the situation
when the snapshots aren't consistent. Having a way to make sure they
are consistent is a great idea, but there WILL be situations when
replication between two 10TB databases gets broken and it will not be
good if the only way to recover is to reclone.

> To what extent are you approaching this from the PoV of wanting to use this
> in FDW sharding? It's unclear what vision for users you have behind the
> things you say must be done, and I'd like to try to move to more concrete
> ground. You want DDL? OK, what should it look like? What does it add over a
> function based interface? What's cluster-wide and what's DB-local? etc.

I've thought about that question, a little bit, but it's not really
what underlies my concerns here. I'm concerned about dump-and-restore
preserving as much state as is usefully possible, because I think
that's critical for the user experience, and I'm concerned with having
the commands we use to manage replication not be spelled totally
differently than our other commands.

However, as far as sharding is concerned, no matter how it gets
implemented, I think logical replication is a key feature.
Postgres-XC/XL has the idea of "replicated" tables which are present
on every data node, and that's very important for efficient
implementation of joins. If you do a join between a little table and
a big sharded table, you want to be able to push that down to the
shards, and you can only do that if the entirety of the little table
is present on every shard or by creating a temporary copy on every
shard. In many cases, the former will be preferable. So, I think it's
important for sharding that logical replication is fully integrated
into core in such a manner as to be available as a building block for
other features.
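As an illustration of why replicated little tables matter for pushdown (hypothetical schema; "orders" sharded by customer_id, "currencies" small and replicated to every shard):

```sql
-- Because a full copy of "currencies" lives on every shard, this join
-- can be pushed down and executed entirely locally on each shard,
-- with no cross-node data movement for the small table:
SELECT o.id, o.amount, c.symbol
  FROM orders o
  JOIN currencies c ON c.code = o.currency_code
 WHERE o.customer_id = 42;
```

Without the replicated copy, the planner would have to ship "currencies" to the shards at query time or pull matching "orders" rows back to the coordinator.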

At the least, I'm guessing that we'll want a way for whatever code is
planning join execution to figure out which tables have up-to-date
copies on servers that are involved in the query. As far as the
FDW-based approach to sharding is concerned, one thing to think about
is whether postgres_fdw and logical replication could share one notion
of where the remote servers are.
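For reference, postgres_fdw already records remote-server locations via standard foreign-server DDL; the speculative part is whether logical replication could consume the same catalog entries rather than maintaining its own list. The existing postgres_fdw side looks like this (hostnames and credentials are placeholders):

```sql
CREATE EXTENSION postgres_fdw;

-- The server definition lands in pg_foreign_server; in principle,
-- logical replication could point at the same entry instead of
-- storing connection info separately.
CREATE SERVER shard1 FOREIGN DATA WRAPPER postgres_fdw
    OPTIONS (host 'shard1.example.com', dbname 'app');

CREATE USER MAPPING FOR CURRENT_USER SERVER shard1
    OPTIONS (user 'app', password 'secret');
```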

> FWIW, Petr is working on some code in the area, but I don't know how far
> along the work is.

OK, thanks.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
