Re: ThisTimeLineID can be used uninitialized

From: Andres Freund <andres(at)anarazel(dot)de>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: amul sul <sulamul(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: ThisTimeLineID can be used uninitialized
Date: 2021-10-19 20:43:59
Message-ID: 20211019204359.aakuvtk7tjari6to@alap3.anarazel.de
Lists: pgsql-hackers

Hi,

On 2021-10-19 15:13:04 -0400, Robert Haas wrote:
> This is a followup to
> http://postgr.es/m/CA+TgmoZ5A26C6OxKApafyuy_sx0VG6VXdD_Q6aSEzsvrPHDwzw@mail.gmail.com.
> I'm suspicious of the following code in CreateReplicationSlot:
>
> /* setup state for WalSndSegmentOpen */
> sendTimeLineIsHistoric = false;
> sendTimeLine = ThisTimeLineID;
>
> The first thing that's odd about this is that if this is physical
> replication, it's apparently dead code, because AFAICT sendTimeLine
> will not be used for anything in that case.

It's quite confusing. It's *really* not helped by physical replication
half-using an xlogreader to keep state. That reader presumably isn't actually
used during a physical CreateReplicationSlot(), but it is referenced by a
comment :/

> But I don't know if it matters. We call CreateInitDecodingContext()
> with sendTimeLine and ThisTimeLineID still zero; it doesn't call any
> callbacks. Then we call DecodingContextFindStartpoint() with
> sendTimeLine still 0 and the first callback that gets invoked is
> logical_read_xlog_page. At this point sendTimeLine = 0 and
> ThisTimeLineID = 0. That calls XLogReadDetermineTimeline() which
> resets ThisTimeLineID to the correct value of 2, but when we get back
> to logical_read_xlog_page, we still manage to call WALRead with a
> timeline of 0 because state->seg.ws_tli is still 0. And WALRead
> eventually does call WalSndSegmentOpen, which unconditionally propagates
> sendTimeLine into the TLI pointer that is passed to it. So now
> state->seg.ws_tli also ends up being 2. So I guess maybe nothing bad
> happens? But it sure seems strange that the code would apparently work
> just as well as it does today with the following patch:
>
> diff --git a/src/backend/replication/walsender.c
> b/src/backend/replication/walsender.c
> index b811a5c0ef..44fd598519 100644
> --- a/src/backend/replication/walsender.c
> +++ b/src/backend/replication/walsender.c
> @@ -945,7 +945,7 @@ CreateReplicationSlot(CreateReplicationSlotCmd *cmd)
>
> /* setup state for WalSndSegmentOpen */
> sendTimeLineIsHistoric = false;
> - sendTimeLine = ThisTimeLineID;
> + sendTimeLine = rand() % 10;
>
> if (cmd->kind == REPLICATION_KIND_PHYSICAL)
> {

ISTM we should introduce an InvalidTimeLineID, explicitly initialize
sendTimeLine to it, and assert that it's valid or invalid, as appropriate,
in a bunch of places?
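
Roughly what I have in mind, as a standalone sketch rather than a patch. It
assumes 0 is never a real timeline (timelines start at 1). InvalidTimeLineID,
TimeLineIDIsValid(), set_send_timeline() and segment_open_timeline() are names
made up for illustration, and plain assert() stands in for Assert():

```c
#include <assert.h>
#include <stdint.h>

/* Same underlying width PostgreSQL uses for timeline IDs. */
typedef uint32_t TimeLineID;

/*
 * Hypothetical sentinel for "no timeline chosen yet". Assumes 0 is never
 * handed out as a real timeline ID.
 */
#define InvalidTimeLineID		((TimeLineID) 0)
#define TimeLineIDIsValid(tli)	((tli) != InvalidTimeLineID)

/* Stand-in for the file-level walsender state, explicitly "unset". */
static TimeLineID sendTimeLine = InvalidTimeLineID;

/*
 * Hypothetical helper: the one place that records which timeline will be
 * read from. Callers must already know a real timeline.
 */
void
set_send_timeline(TimeLineID tli)
{
	assert(TimeLineIDIsValid(tli));
	sendTimeLine = tli;
}

/*
 * Hypothetical stand-in for the segment-open path: it refuses to propagate
 * a timeline that was never set, instead of silently passing on 0.
 */
TimeLineID
segment_open_timeline(void)
{
	assert(TimeLineIDIsValid(sendTimeLine));
	return sendTimeLine;
}
```

With something like this, a code path that reaches the segment-open step
without having chosen a timeline trips an assertion in assert-enabled builds,
rather than working by accident.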

Greetings,

Andres Freund
