Re: why pg_walfile_name() cannot be executed during recovery?

From: Bharath Rupireddy <bharath(dot)rupireddyforpostgres(at)gmail(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: SATYANARAYANA NARLAPURAM <satyanarlapuram(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: why pg_walfile_name() cannot be executed during recovery?
Date: 2022-04-08 13:31:34
Message-ID: CALj2ACXiYAn=QK=i+nLT25M=EC14jYEOeNRxMyDbUxJo41T1eg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Thu, Apr 7, 2022 at 9:07 PM Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
>
> On Thu, Apr 7, 2022 at 9:32 AM Bharath Rupireddy
> <bharath(dot)rupireddyforpostgres(at)gmail(dot)com> wrote:
> > I spent some time today to allow pg_walfile_{name, name_offset} run in
> > recovery. Timeline ID is computed while in recovery as follows - WAL
> > receiver's last received and flushed WAL record's TLI if it's
> > streaming, otherwise the last replayed WAL record's TLI. This way,
> > these functions can be used on standby or PITR server or even in crash
> > recovery if the server opens up for read-only connections.
>
> I don't think this is a good definition. Suppose I ask for
> pg_walfile_name() using an older LSN. With this approach, we're going
> to get a filename based on the idea that the TLI that was in effect
> back then is the same one as the TLI that is in effect now, which
> might not be true. For example, suppose that the current TLI is 2 and
> it branched off of timeline 1 at 10/0. If I ask for
> pg_walfile_name('F/0'), it's going to give me the name of a WAL file
> that has never existed. That seems bad.
>
> It's also worth noting that there's a bit of a definitional problem
> here. If in the same situation, I ask for pg_walfile_name('11/0'),
> it's going to give me a filename based on TLI 2, but there's also a
> WAL file for that LSN with TLI 1. How do we know which one the user
> wants? Perhaps one idea would be to say that the relevant TLI is the
> one which was in effect at the time that LSN was replayed. If we do
> that, what about future LSNs? We could assume that for future LSNs,
> the TLI should be the same as the current TLI, but maybe that's also
> misleading, because recovery_target_timeline could be set.

Fundamental question - should the pg_walfile_{name, name_offset} check
whether the file with the computed WAL file name exists on the server
right now or ever existed earlier? Right now, they don't do that, see
[1].

I think we can make the functions more robust:
pg_walfile_{name, name_offset}(lsn, check_if_file_exists = false, tli
= invalid_timelineid) - when check_if_file_exists is true checks for
the computed WAL file existence and when a valid tli is provided uses
it in computing the WAL file name. When tli isn't provided, it
continues to use insert tli for primary, and in recovery it uses tli
as proposed in my patch. Perhaps, it can also do (as Michael
suggested) this - if check_if_file_exists is true and tli isn't
provided and there's timeline history, then it can go look at all the
timelines and whether the file exists with the computed name with
history tli.

> I think it's really important to start by being precise about the
> question that we think pg_walfile_name() ought to be answering. If we
> don't know that, then we really can't say what TLI it should be using.
> It's not hard to make the function return SOME answer using SOME TLI,
> but then it's not clear that the answer is the right one for any
> particular purpose. And in that case the function is more dangerous
> than useful, because people will write code that uses it to do stuff,
> and then that stuff won't actually work correctly under all
> circumstances.

Yes, once we agree on the semantics of these functions, having better
documentation will help.

Thoughts?

[1]
postgres=# select * from pg_walfile_name('50000/dfdf');
pg_walfile_name
--------------------------
000000010005000000000000
(1 row)
postgres=# select * from pg_walfile_name_offset('50000/dfdf');
file_name | file_offset
--------------------------+-------------
000000010005000000000000 | 57311
(1 row)

Regards,
Bharath Rupireddy.

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Greg Stark 2022-04-08 13:33:14 Re: Support for grabbing multiple consecutive values with nextval()
Previous Message Robert Haas 2022-04-08 13:17:40 Re: Lowering the ever-growing heap->pd_lower