Re: ThisTimeLineID is used uninitialized in basebackup.c, too

From: Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com>
To: robertmhaas(at)gmail(dot)com
Cc: pgsql-hackers(at)lists(dot)postgresql(dot)org
Subject: Re: ThisTimeLineID is used uninitialized in basebackup.c, too
Date: 2021-10-29 01:26:29
Message-ID: 20211029.102629.1643586045850773206.horikyota.ntt@gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

At Thu, 28 Oct 2021 11:39:52 -0400, Robert Haas <robertmhaas(at)gmail(dot)com> wrote in
> I previously reported[1] that CreateReplicationSlot() was accessing
> the global variable ThisTimeLineID when it might not be initialized.
> Today I realized that the code in basebackup.c has the same problem.
> perform_base_backup() blithely uses the variable six times without
> doing anything at all to initialize it. In practice, it's always going
> to be initialized to some non-zero value, because pg_basebackup is
> always going to execute IDENTIFY_SYSTEM before it executes
> BASE_BACKUP, and that's going to set ThisTimeLineID as a side effect.
> But, if you hack pg_basebackup to not call IDENTIFY_SYSTEM before
> calling BASE_BACKUP, then you can reach this code with ThisTimeLineID
> == 0 using pg_basebackup -Xfetch. Even if you don't do that, you're
> only guaranteed that ThisTimeLineID is initialized to something, not
> that it's initialized to the correct thing. The timeline on the
> standby can change any time.
>
> Fortunately, this has no serious consequences AFAICT. Heikki wrote
> "I'd rather not worry about timelines here" and wrote the algorithm in
> such a way that the only thing that happens if you feed a wrong TLI
> through the logic is that the wrong TLI might be used to construct a
> WAL file name for use in an error report. You won't get an error when
> things should have worked, or the other way around, but if you get a
> complaint about a segment file, it might include the wrong TLI in the
> filename. I think possibly this logic should be rewritten to be fully
> timeline-aware, but that's a bit of a project and this isn't really
> broken, so what I'd like to do for right now is change it to use
> non-bogus TLIs that we have available in local variables rather than a
> possibly-nonsensical value from a global variable. Patch for that
> attached.

The first hunk is useless but seems good for sanity. The second hunk
looks good. (I think we can call CheckXLogRemoved before collecting
file names but it would be another thing.) The third and fifth hunks
look fine.

The fourth hunk is somewhat dubious. The TLI for a new segment is not
necessarily equal to the skipped segments unless starttli ==
endtli. So we could change the message like "could not find WAL file
for segment %X" or such but also I don't oppose to using the tli of
the new segment as an approximate.

As a whole, it looks fine to me.

> --
> Robert Haas
> EDB: http://www.enterprisedb.com
>
> [1] http://postgr.es/m/CA+TgmoZC2aYe9eRXcUa_c_XBBMVLps9LZL_K5Oo7M9Xhco2KhA@mail.gmail.com

regards.

--
Kyotaro Horiguchi
NTT Open Source Software Center

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Michael Paquier 2021-10-29 01:54:31 Re: TAP test for recovery_end_command
Previous Message Masahiko Sawada 2021-10-29 00:47:27 Re: Skipping logical replication transactions on subscriber side