Re: ThisTimeLineID can be used uninitialized

From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Andres Freund <andres(at)anarazel(dot)de>
Cc: amul sul <sulamul(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: ThisTimeLineID can be used uninitialized
Date: 2021-10-21 19:29:26
Message-ID: CA+TgmobSWzacEs+r6C-7DrOPDHoDar4i9gzxB3SCBr5qjnLmVQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Tue, Oct 19, 2021 at 4:44 PM Andres Freund <andres(at)anarazel(dot)de> wrote:
> It's quite confusing. It's *really* not helped by physical replication using
> but not really using an xlogreader to keep state. Which presumably isn't
> actually used during a physical CreateReplicationSlot(), but is referenced by
> a comment :/

I can't figure out what you're referring to here. I can't find
CreateReplicationSlot() using an xlogreader in the logical replication
case, or a comment that refers to it doing so.

> Istm we should introduce an InvalidTimeLineID, and explicitly initialize
> sendTimeLine to that, and assert that it's valid / invalid in a bunch of
> places?

I think the correct fix for this particular problem is just to delete
the initialization, as in the attached patch. I spent a long time
studying this today and eventually convinced myself that there's just
no way these initializations can ever do anything (details in proposed
commit message). While it is important that we do not access the
global variable when it's uninitialized, here there is no reason to
access it in the first place.

Regarding the more general problem, I think we should consider (1)
reducing the number of places that access ThisTimeLineID directly,
preferring to add TimeLineID arguments to functions and pass the
relevant timeline value around explicitly and then (2) changing all of
the remaining accesses to ThisTimeLineID to function calls instead,
e.g. by inventing a function GetCurrentTimeLineID(). Once we do that,
I think this kind of problem just goes away. On the one hand,
GetCurrentTimeLineID() could assert that the value is valid before
returning it, and then we would have centralized checking that we're
not using a bogus value. But, there's no reason to stop there. If all
the callers are using this function rather than accessing the global
variable directly, then the function can just initialize the value
from shared memory as required! Or it can forget about having a local
copy stored in a global variable and just always read the current
value from shared memory! With a little thought, I think this approach
can avoid this sort of unfortunate coding:

if (!RecoveryInProgress())
read_upto = GetFlushRecPtr();
else
read_upto = GetXLogReplayRecPtr(&ThisTimeLineID);
tli = ThisTimeLineID;

What is going on here? Well, if we're not still in recovery, then the
call to RecoveryInProgress() will initialize ThisTimeLineID as a side
effect, and after that it can't change. If we are still in recovery
then GetXLogReplayRecPtr()'s will update the global variable as a side
effect on every trip through the function. Either way, read_upto is
the end of WAL in the way that's relevant to whichever operating mode
is current. But imagine that we could code this in a way that didn't
depend on global variables getting updated as a side effect. For
example:

if (!RecoveryInProgress())
read_upto = GetFlushRecPtr();
else
read_upto = GetXLogReplayRecPtr();
currTLI = GetCurrentTimeLineID();

Or perhaps:

if (!RecoveryInProgress())
read_upto = GetFlushRecPtr(&currTLI);
else
read_upto = GetXLogReplayRecPtr(&currTLI);

My point here is that the current idiom only makes sense if you
realize that RecoveryInProgress() has a side effect of updating
ThisTimeLineID, and on the other hand that the only reason we're using
ThisTimeLineID instead of a local variable here is that that's what
RecoveryInProgress() updates. It's just two mutually-reinforcing bad
decisions.

--
Robert Haas
EDB: http://www.enterprisedb.com

Attachment Content-Type Size
0001-Remove-useless-code-from-CreateReplicationSlot.patch application/octet-stream 1.7 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Bossart, Nathan 2021-10-21 19:51:47 Re: parallelizing the archiver
Previous Message Stephen Frost 2021-10-21 17:28:12 Re: XTS cipher mode for cluster file encryption