Re: [patch] demote

From: Jehan-Guillaume de Rorthais <jgdr(at)dalibo(dot)com>
To: pgsql-hackers(at)lists(dot)postgresql(dot)org
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Andres Freund <andres(at)anarazel(dot)de>, Fujii Masao <masao(dot)fujii(at)oss(dot)nttdata(dot)com>
Subject: Re: [patch] demote
Date: 2020-07-13 15:04:49
Message-ID: 20200713170449.7a2c58b0@firost
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi,

Another summary + patch + tests.

This patch supports 2PC. The goal is to keep them safe during demote/promote
actions so they can be committed/rollbacked later on a primary. See tests.

The checkpointer is now shutdowned after the demote shutdown checkpoint. It
removes some useless code complexity, eg. avoiding to signal postmaster from
checkpointer to keep going with the demotion.

Cascaded replication is now supported. Wal senders stay actives during
demotion but set their local "am_cascading_walsender = true". It has been a
rough debug session (thank you rr and tests!) on my side, but it might deserve
it. I believe they should stay connected during the demote actions for futur
features, eg. triggering a switchover over the replication protocol using an
admin function.

The first tests has been added in "recovery/t/021_promote-demote.pl". I'll add
some more tests in futur versions.

I believe the patch is ready for some preliminary tests and advice or
directions.

On my todo:

* study how to only disconnect or cancel active RW backends
* ...then add pg_demote() admin function
* cancel running checkpoint for fast demote ?
* user documentation
* Robert's concern about snapshot during hot standby
* some more coding style cleanup/refactoring
* anything else reported to me :)

Thanks,

On Fri, 3 Jul 2020 00:12:10 +0200
Jehan-Guillaume de Rorthais <jgdr(at)dalibo(dot)com> wrote:

> Hi,
>
> Here is a small activity summary since last report.
>
> On Thu, 25 Jun 2020 19:27:54 +0200
> Jehan-Guillaume de Rorthais <jgdr(at)dalibo(dot)com> wrote:
> [...]
> > I hadn't time to investigate Robert's concern about shared memory for
> > snapshot during recovery.
>
> I hadn't time to dig very far, but I suppose this might be related to the
> comment in ProcArrayShmemSize(). If I'm right, then it seems the space is
> already allocated as long as hot_standby is enabled. I realize it doesn't
> means we are on the safe side of the fence though. I still have to have a
> better understanding on this.
>
> > The patch doesn't deal with prepared xact yet. Testing
> > "start->demote->promote" raise an assert if some prepared xact exist. I
> > suppose I will rollback them during demote in next patch version.
>
> Rollback all prepared transaction on demote seems easy. However, I realized
> there's no point to cancel them. After the demote action, they might still be
> committed later on a promoted instance.
>
> I am currently trying to clean shared memory for existing prepared transaction
> so they are handled by the startup process during recovery.
> I've been able to clean TwoPhaseState and the ProcArray. I'm now in the
> process to clean remaining prepared xact locks.
>
> Regards,
>
>

--
Jehan-Guillaume de Rorthais
Dalibo

Attachment Content-Type Size
v3-0001-Support-demoting-instance-from-production-to-standby.patch text/x-patch 54.8 KB
v3-0002-Add-various-tests-related-to-demote-and-promote-acti.patch text/x-patch 5.7 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Tom Lane 2020-07-13 15:08:09 Re: GSSENC'ed connection stalls while reconnection attempts.
Previous Message Tomas Vondra 2020-07-13 14:58:50 Re: WIP: BRIN multi-range indexes