Re: Minimal logical decoding on standbys

From: "Drouvot, Bertrand" <bertranddrouvot(dot)pg(at)gmail(dot)com>
To: Andres Freund <andres(at)anarazel(dot)de>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, Thomas Munro <thomas(dot)munro(at)gmail(dot)com>, Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, Ibrar Ahmed <ibrar(dot)ahmad(at)gmail(dot)com>, Amit Khandekar <amitdkhan(dot)pg(at)gmail(dot)com>, fabriziomello(at)gmail(dot)com, tushar <tushar(dot)ahuja(at)enterprisedb(dot)com>, Rahila Syed <rahila(dot)syed(at)2ndquadrant(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Minimal logical decoding on standbys
Date: 2023-01-23 11:03:35
Message-ID: 9e978c6c-0a6e-9271-1203-800c17d91d10@gmail.com
Lists: pgsql-hackers

Hi,

On 1/19/23 10:43 AM, Drouvot, Bertrand wrote:
> Hi,
>
> On 1/19/23 3:46 AM, Andres Freund wrote:
>> Hi,
>>
>> I mean a logical walsender that starts on a standby and continues across
>> promotion of the standby.
>>
>
> Got it, thanks, will do.
>

While working on it, I noticed that with V41, running:

pg_recvlogical -S active_slot -P test_decoding -d postgres -f - --start

on the standby fails with:

pg_recvlogical: error: unexpected termination of replication stream: ERROR: could not find record while sending logically-decoded data: invalid record length at 0/311C438: wanted 24, got 0
pg_recvlogical: disconnected; waiting 5 seconds to try again

when the standby gets promoted (logical decoding does resume correctly after the error, though).

This is fixed in the attached V42: the error no longer occurs, and logical decoding through the walsender works correctly after the promotion.

The fix is in 0003, in logical_read_xlog_page(). Compared to V41:

- We now check RecoveryInProgress() (instead of relying on am_cascading_walsender) to detect whether the standby got promoted.
- Based on this, currTLI is retrieved with GetXLogReplayRecPtr() while still in recovery, or with GetWALInsertionTimeLine() after promotion.
- This currTLI is passed as an argument to WALRead() (instead of state->seg.ws_tli, which looked odd anyway, since WALRead() then compares that value with itself in "tli != state->seg.ws_tli"). That way, WALRead() detects that the timeline has changed and opens the right WAL file.

Please find V42 attached.

I'll now resume working on the comments about the TAP tests.

Regards,

--
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com

Attachment Content-Type Size
v42-0006-Doc-changes-describing-details-about-logical-dec.patch text/plain 2.1 KB
v42-0005-New-TAP-test-for-logical-decoding-on-standby.patch text/plain 20.4 KB
v42-0004-Fixing-Walsender-corner-case-with-logical-decodi.patch text/plain 7.5 KB
v42-0003-Allow-logical-decoding-on-standby.patch text/plain 11.7 KB
v42-0002-Handle-logical-slot-conflicts-on-standby.patch text/plain 32.4 KB
v42-0001-Add-info-in-WAL-records-in-preparation-for-logic.patch text/plain 72.1 KB
