Sharing more infrastructure between walsenders and regular backends (was Re: Switching timeline over streaming replication)

From: Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>
To: Amit Kapila <amit(dot)kapila(at)huawei(dot)com>
Cc: 'PostgreSQL-development' <pgsql-hackers(at)postgreSQL(dot)org>
Subject: Sharing more infrastructure between walsenders and regular backends (was Re: Switching timeline over streaming replication)
Date: 2012-10-04 15:09:49
Message-ID: 506DA6BD.9020809@vmware.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On 03.10.2012 18:15, Amit Kapila wrote:
> 35.WalSenderMain(void)
> {
> ..
> + if (walsender_shutdown_requested)
> + ereport(FATAL,
> + (errcode(ERRCODE_ADMIN_SHUTDOWN),
> + errmsg("terminating replication
> connection due to administrator command")));
> +
> + /* Tell the client that we are ready to receive commands */
>
> + ReadyForQuery(DestRemote);
> +
> ..
>
> + if (walsender_shutdown_requested)
> + ereport(FATAL,
> + (errcode(ERRCODE_ADMIN_SHUTDOWN),
> + errmsg("terminating replication
> connection due to administrator command")));
> +
>
> is it necessary to check walsender_shutdown_requested 2 times in a
> loop, if yes, then can we write comment why it is important to check
> it again.

The idea was to check for shutdown request before and after the
pq_getbyte() call, because that can block for a long time.

Looking closer, we don't currently (ie. without this patch) make any
effort to react to SIGTERM in a walsender, while it's waiting for a
command from the client. After starting replication, it does check
walsender_shutdown_requested in the loop, and it's also checked during a
base backup (although only when switching to send next file, which seems
too seldom). This issue is orthogonal to handling timeline changes over
streaming replication, although that patch will make it more important
to handle SIGTERM quickly while waiting for a command, because you stay
in that mode for longer and more often.

I think walsender needs to share more infrastructure with regular
backends to handle this better. When we first implemented streaming
replication in 9.0, it made sense to implement just the bare minimum
needed to accept the handshake commands before entering the Copy state,
but now that the replication command set has grown to cover base
backups, and fetching timelines with the patch being discussed, we
should bite the bullet and make the command loop more feature-complete
and robust.

In a regular backend, the command loop reacts to SIGTERM immediately,
setting ImmediateInterruptOK at the right places, and calling
CHECK_FOR_INTERRUPTS() at strategic places. I propose that we let
PostgresMain handle the main command loop for walsender processes too,
like it does for regular backends, and use ProcDiePending and the
regular die() signal handler for SIGTERM in walsender as well.

So I propose the attached patch. I made small changes to postgres.c to
make it call exec_replication_command() instead of exec_simple_query(),
and reject extend query protocol, in a WAL sender process. A lot of code
related to handling the main command loop and signals is removed from
walsender.c.

- Heikki

Attachment Content-Type Size
use-main-command-loop-in-walsender-1.patch text/x-diff 21.6 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Gaetano Mendola 2012-10-04 15:36:19 hackers newsgroup hacked ?
Previous Message Tom Lane 2012-10-04 14:43:15 Re: [PATCH] Make pg_basebackup configure and start standby [Review]