Re: Logical Replication WIP

From: Petr Jelinek <petr(dot)jelinek(at)2ndquadrant(dot)com>
To: Peter Eisentraut <peter(dot)eisentraut(at)2ndquadrant(dot)com>, Andres Freund <andres(at)anarazel(dot)de>
Cc: Erik Rijkers <er(at)xs4all(dot)nl>, Steve Singer <steve(at)ssinger(dot)info>, Craig Ringer <craig(at)2ndquadrant(dot)com>, Simon Riggs <simon(at)2ndquadrant(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Logical Replication WIP
Date: 2017-01-15 22:20:53
Message-ID: 2c297716-ac22-ba5c-24ae-7435c0ba85f2@2ndquadrant.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi,

finally got to this (multiple emails squashed into one).

On 04/01/17 18:46, Peter Eisentraut wrote:
> Some small patches for 0002-Add-SUBSCRIPTION-catalog-and-DDL-v16.patch.gz:
>

Merged thanks.

> In CreateSubscription(), I don't think we should connect to the remote
> if no slot creation is requested. Arguably, the point of that option is
> to not make network connections. (That is what my documentation patch
> above claims, in any case.)
>

Agreed and done.

> I don't know why we need to check the PostgreSQL version number of the
> remote. We should rely on the protocol version number, and we should
> just make it work. When PG 11 comes around, subscribing from PG 10 to a
> publisher on PG 11 should just work without any warnings, IMO.
>

Also agreed and removed.

> 003-Define-logical-replication-protocol-and-output-plugi-v16.patch.gz
> looks good now, documentation is clear now.
>
> Another fixup patch to remove excessive includes.

Thanks merged.

> Comments on 0004-Add-logical-replication-workers-v16.patch.gz:
>
> I didn't find any major problems. At times while I was testing strange
> things it was not clear why "nothing is happening". I'll do some more
> checking in that direction.
>
> Fixup patch attached that enhances some error messages, fixes some
> typos, and other minor changes. See also comments below.
>

Merged.

>
> The way check_max_logical_replication_workers() is implemented creates
> potential ordering dependencies in postgresql.conf. For example,
>
> max_logical_replication_workers = 100
> max_worker_processes = 200
>
> fails, but if you change the order, it works. The existing
> check_max_worker_processes() has the same problem, but I suspect because
> it only checks against MAX_BACKENDS, nobody has ever seriously hit that
> limit.
>
> I suggest just removing the check. If you set
> max_logical_replication_workers higher than max_worker_processes and you
> hit the lower limit, then whatever is controlling max_worker_processes
> should complain with its own error message.
>

Good point, removed.

>
> The default for max_logical_replication_workers is 4, which seems very
> little. Maybe it should be more like 10 or 20. The "Quick setup"
> section recommends changing it to 10. We should at least be
> consistent there: If you set a default value that is not 0, then it
> should enough that we don't need to change it again in the Quick
> setup. (Maybe the default max_worker_processes should also be
> raised?)

Well, it's 4 because max_worker_processes is 8, I think default
max_worker_processes should be higher than
max_logical_replication_workers so that's why I picked 4. If we are okay
wit bumping the max_worker_processes a bit, I am all for increasing
max_logical_replication_workers as well.

The quick setup mentions 10 mainly for consistency with slots and wal
senders (those IMHO should also not be 0 by default at this point...).

>
> +max_logical_replication_workers = 10 # one per subscription + one per
> instance needed on subscriber
>
> I think this is incorrect (copied from max_worker_processes?). The
> launcher does not count as one of the workers here.
>
> On a related note, should the minimum not be 0 instead of 1?
>

Eh, yes.

>
> About the changes to libpqrcv_startstreaming(). The timeline is not
> really an option in the syntax. Just passing in a string that is
> pasted in the final command creates too much coupling, I think. I
> would keep the old timeline (TimeLineID tli) argument, and make the
> options const char * [], and let startstreaming() assemble the final
> string, including commas and parentheses. It's still not a perfect
> abstraction, because you need to do the quoting yourself, but much
> better. (Alternatively, get rid of the startstreaming call and just
> have callers use libpqrcv_PQexec directly.)
>

I did this somewhat differently, with struct that defines options and
has different union members for physical and logical replication. What
do you think of that?

>
> Some of the header files are named inconsistently with their .c files.
> I think src/include/replication/logicalworker.h should be split into
> logicalapply.h and logicallauncher.h.

Okay.

> Not sure about
> worker_internal.h. Maybe rename apply.c to worker.c?
>

Hmm I did that, seems reasonably okay. Original patch in fact had both
worker.c and apply.c and I eventually moved the worker.c functions to
either apply.c or launcher.c.

> (I'm also not fond of throwing publicationcmds.h and
> subscriptioncmds.h together into replicationcmds.h. Maybe that could
> be changed, too)

Okay.

>
> Various FATAL errors in logical/relation.c when the target relation is
> not in the right state. Could those not be ERRORs? The behavior is
> the same at the moment because background workers terminate on
> uncaught exceptions, but that should eventually be improved.
>

Seems like you changed this in your patch. I don't have any objections.

>
> In LogicalRepRelMapEntry, rename rel to localrel, so it's clearer in
> the code using this struct. (Maybe reloid -> localreloid)
>

Okay.

>
> Partitioned tables are not supported in either publications or as
> replication targets. This is expected but should be fixed before the
> final release.
>

Yes, that will need some discussion about corner case behaviour. For
example, have partitioned table 'foo' which is in publication, then you
have table 'bar' which is not in publication, you attach it to the
partitioned table 'foo', should it automatically be added to
publication? Then you detach it, should it then be removed from publication?
What if 'bar' was in publication before it was attached/detached to/from
'foo'? What if 'foo' wasn't in publication but 'bar' was? Should we
allow ONLY syntax for partitioned table when they are being added and
removed?

Sadly current partitioning section of the docs doesn't provide any
guidance in terms of precedents for other actions here as it still
speaks about using inheritance and check constraints directly instead of
the new feature.

My proposal would be to let partitions to be added/removed to/from
publications normally (as they are now) and have them also check if
parent table is published in case they aren't (ie, if partitioned table
is in some publications, all partitions are implicitly as well without
adding them to the pg_publication_rel catalog, but they also keep their
own membership in publications as well individually there). That would
mean we don't allow ONLY syntax for partitioned tables. One scenario
where I am on the fence is what should happen here if we do ALTER
PUBLICATION ... DROP TABLE partitioned_table in case that
partitioned_table also contains partition which was explicitly added to
the publication, should it keep its own membership or should it be
removed? Maybe we could allow the ONLY clause only for DROP but not for ADD?

>
> In apply.c:
>
> The comment in apply_handle_relation() makes a point that the schema
> validation is done later, but does not tell why. The answer is
> probably because it doesn't matter and it's more convenient, but it
> should be explained in the comment.

Yes I noticed, I tried to explain.

>
> See XXX comment in logicalrep_worker_stop().

Yes that was a good point.

>
> The get_flush_position() return value is not intuitive from the
> function name. Maybe make that another pointer argument for clarity.

Okay.

> reread_subscription() complains if the subscription name was changed.
> I don't know why that is a problem.

Because we don't have ALTER SUBSCRIPTION RENAME currently. Maybe should
be Assert?

>
> In launcher.c:
>
> pg_stat_get_subscription should hold LogicalRepWorkerLock around the
> whole loop, so that it doesn't get inconsistent results when workers
> change during the loop.
>

Done.

> In relation.c:
>
> Inconsistent use of uint32 vs LogicalRepRelId. Pick one.

Done.

Attached is new version with your changes merged and above suggestions
applied. It still does not support partitioned tables and does the
filtering using FirstNormalObjectId.

--
Petr Jelinek http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

Attachment Content-Type Size
0001-Add-PUBLICATION-catalogs-and-DDL-v18.patch.gz application/gzip 33.9 KB
0002-Add-SUBSCRIPTION-catalog-and-DDL-v18.patch.gz application/gzip 29.4 KB
0003-Define-logical-replication-protocol-and-output-plugi-v18.patch.gz application/gzip 12.6 KB
0004-Add-logical-replication-workers-v18.patch.gz application/gzip 44.9 KB
0005-Add-separate-synchronous-commit-control-for-logical--v18.patch.gz application/gzip 1.7 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Erik Rijkers 2017-01-15 22:57:49 Re: Logical Replication WIP
Previous Message Artur Zakirov 2017-01-15 22:15:28 Re: [PATCH] Generic type subscription