Re: Logical Replication WIP

From: Peter Eisentraut <peter(dot)eisentraut(at)2ndquadrant(dot)com>
To: Petr Jelinek <petr(dot)jelinek(at)2ndquadrant(dot)com>, Andres Freund <andres(at)anarazel(dot)de>
Cc: Erik Rijkers <er(at)xs4all(dot)nl>, Steve Singer <steve(at)ssinger(dot)info>, Craig Ringer <craig(at)2ndquadrant(dot)com>, Simon Riggs <simon(at)2ndquadrant(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, pgsql-hackers-owner(at)postgresql(dot)org
Subject: Re: Logical Replication WIP
Date: 2017-01-06 20:18:04
Message-ID: 245f59d7-62d2-6ae3-6650-c2b4df51dd33@2ndquadrant.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Comments on 0004-Add-logical-replication-workers-v16.patch.gz:

I didn't find any major problems. At times while I was testing strange
things it was not clear why "nothing is happening". I'll do some more
checking in that direction.

Fixup patch attached that enhances some error messages, fixes some
typos, and other minor changes. See also comments below.

---

The way check_max_logical_replication_workers() is implemented creates
potential ordering dependencies in postgresql.conf. For example,

max_logical_replication_workers = 100
max_worker_processes = 200

fails, but if you change the order, it works. The existing
check_max_worker_processes() has the same problem, but I suspect because
it only checks against MAX_BACKENDS, nobody has ever seriously hit that
limit.

I suggest just removing the check. If you set
max_logical_replication_workers higher than max_worker_processes and you
hit the lower limit, then whatever is controlling max_worker_processes
should complain with its own error message.

---

The default for max_logical_replication_workers is 4, which seems very
little. Maybe it should be more like 10 or 20. The "Quick setup"
section recommends changing it to 10. We should at least be
consistent there: If you set a default value that is not 0, then it
should enough that we don't need to change it again in the Quick
setup. (Maybe the default max_worker_processes should also be
raised?)

+max_logical_replication_workers = 10 # one per subscription + one per
instance needed on subscriber

I think this is incorrect (copied from max_worker_processes?). The
launcher does not count as one of the workers here.

On a related note, should the minimum not be 0 instead of 1?

---

About the changes to libpqrcv_startstreaming(). The timeline is not
really an option in the syntax. Just passing in a string that is
pasted in the final command creates too much coupling, I think. I
would keep the old timeline (TimeLineID tli) argument, and make the
options const char * [], and let startstreaming() assemble the final
string, including commas and parentheses. It's still not a perfect
abstraction, because you need to do the quoting yourself, but much
better. (Alternatively, get rid of the startstreaming call and just
have callers use libpqrcv_PQexec directly.)

---

Some of the header files are named inconsistently with their .c files.
I think src/include/replication/logicalworker.h should be split into
logicalapply.h and logicallauncher.h. Not sure about
worker_internal.h. Maybe rename apply.c to worker.c?

(I'm also not fond of throwing publicationcmds.h and
subscriptioncmds.h together into replicationcmds.h. Maybe that could
be changed, too.)

---

Various FATAL errors in logical/relation.c when the target relation is
not in the right state. Could those not be ERRORs? The behavior is
the same at the moment because background workers terminate on
uncaught exceptions, but that should eventually be improved.

A FATAL error will lead to a

LOG: unexpected EOF on standby connection

on the publisher, because the process just dies without protocol
shutdown. (And then it reconnects and tries again. So we might as
well not die and just retry again.)

---

In LogicalRepRelMapEntry, rename rel to localrel, so it's clearer in
the code using this struct. (Maybe reloid -> localreloid)

---

Partitioned tables are not supported in either publications or as
replication targets. This is expected but should be fixed before the
final release.

---

In apply.c:

The comment in apply_handle_relation() makes a point that the schema
validation is done later, but does not tell why. The answer is
probably because it doesn't matter and it's more convenient, but it
should be explained in the comment.

See XXX comment in logicalrep_worker_stop().

The get_flush_position() return value is not intuitive from the
function name. Maybe make that another pointer argument for clarity.

reread_subscription() complains if the subscription name was changed.
I don't know why that is a problem.

---

In launcher.c:

pg_stat_get_subscription should hold LogicalRepWorkerLock around the
whole loop, so that it doesn't get inconsistent results when workers
change during the loop.

---

In relation.c:

Inconsistent use of uint32 vs LogicalRepRelId. Pick one. :)

--
Peter Eisentraut http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Attachment Content-Type Size
0001-fixup-Add-logical-replication-workers.patch text/x-patch 20.9 KB

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Peter Eisentraut 2017-01-06 20:26:41 Re: Logical Replication WIP
Previous Message Tom Lane 2017-01-06 20:17:10 Re: Increase pltcl test coverage