Re: pgsql: Add contrib/pg_walinspect.

From: Bharath Rupireddy <bharath(dot)rupireddyforpostgres(at)gmail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Thomas Munro <thomas(dot)munro(at)gmail(dot)com>, Michael Paquier <michael(at)paquier(dot)xyz>, Noah Misch <noah(at)leadboat(dot)com>, Jeff Davis <jdavis(at)postgresql(dot)org>, Postgres hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: pgsql: Add contrib/pg_walinspect.
Date: 2022-04-27 08:17:11
Message-ID: CALj2ACW7_d2RZzq-K8ZLv+uWqrstpW9omo2Lc-jbD2K5XoFMGQ@mail.gmail.com
Lists: pgsql-committers pgsql-hackers

On Wed, Apr 27, 2022 at 8:57 AM Bharath Rupireddy
<bharath(dot)rupireddyforpostgres(at)gmail(dot)com> wrote:
>
> On Wed, Apr 27, 2022 at 8:45 AM Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> >
> > I wrote:
> > > Thomas Munro <thomas(dot)munro(at)gmail(dot)com> writes:
> > >> BTW If you had your local change from debug.patch (upthread), that'd
> > >> defeat the patch. I mean this:
> >
> > >> + if(!*errormsg)
> > >> + *errormsg = "decode_queue_head is null";
> >
> > > Oh! Okay, I'll retry without that.
> >
> > I've now done several runs with your patch and not seen the test failure.
> > However, I think we ought to rethink this API a bit rather than just
> > apply the patch as-is. Even if it were documented, relying on
> > errormsg = NULL to mean something doesn't seem like a great plan.
>
> Sorry for being late in the game, occupied with other stuff.
>
> How about using private_data of XLogReaderState for
> read_local_xlog_page_no_wait, something like this?
>
> typedef struct ReadLocalXLogPageNoWaitPrivate
> {
>     bool end_of_wal;
> } ReadLocalXLogPageNoWaitPrivate;
>
> In read_local_xlog_page_no_wait:
>
>     /* If asked, let's not wait for future WAL. */
>     if (!wait_for_wal)
>     {
>         private_data->end_of_wal = true;
>         break;
>     }
>
> /*
>  * Opaque data for callbacks to use. Not used by XLogReader.
>  */
> void *private_data;
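
To make the private_data idea quoted above a bit more concrete, here is a minimal standalone sketch of the pattern. It is not the PostgreSQL code: MockReaderState, mock_read_page_no_wait, ReadOutcome and classify_read are hypothetical names made up for illustration, and only the ReadLocalXLogPageNoWaitPrivate struct comes from the proposal. The point is just that the callback records "end of WAL" in caller-owned private data, so the caller does not have to infer it from errormsg being NULL.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for XLogReaderState; only the field used here. */
typedef struct MockReaderState
{
    void *private_data;     /* opaque data for callbacks to use */
} MockReaderState;

/* Struct from the proposal: the callback sets the flag at end of WAL. */
typedef struct ReadLocalXLogPageNoWaitPrivate
{
    bool end_of_wal;
} ReadLocalXLogPageNoWaitPrivate;

/*
 * Hypothetical page-read callback that never waits for future WAL.
 * Returns the number of bytes available from 'target' up to 'flushed_upto',
 * or -1 (after setting end_of_wal) when nothing has been written there yet.
 */
static int
mock_read_page_no_wait(MockReaderState *state, size_t target,
                       size_t flushed_upto)
{
    ReadLocalXLogPageNoWaitPrivate *priv;

    assert(state != NULL && state->private_data != NULL);
    priv = (ReadLocalXLogPageNoWaitPrivate *) state->private_data;

    if (target >= flushed_upto)
    {
        /* Not an error: we simply ran out of WAL and were asked not to wait. */
        priv->end_of_wal = true;
        return -1;
    }
    return (int) (flushed_upto - target);
}

/* Hypothetical caller-side classification of a read result. */
typedef enum
{
    READ_OK,
    READ_END_OF_WAL,
    READ_ERROR
} ReadOutcome;

static ReadOutcome
classify_read(const MockReaderState *state, int result)
{
    const ReadLocalXLogPageNoWaitPrivate *priv = state->private_data;

    if (result >= 0)
        return READ_OK;
    /* A failed read is a real error only if end of WAL was not flagged. */
    return priv->end_of_wal ? READ_END_OF_WAL : READ_ERROR;
}
```

A caller would allocate the private struct with end_of_wal = false, hang it off private_data before reading, and check the flag whenever a read fails, treating a failure without the flag as a genuine error.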

I found an easy way to reproduce this consistently (I think on any server):

I basically generated a huge WAL record (I used a fun extension that I
wrote, https://github.com/BRupireddy/pg_synthesize_wal, but one can
use pg_logical_emit_message as well):
session 1:
select * from pg_synthesize_wal_record(1*1024*1024); --> generate a
1 MB WAL record first and make a note of the output LSN (lsn1)

session 2:
select * from pg_get_wal_records_info_till_end_of_wal(lsn1);
\watch 1

session 1:
select * from pg_synthesize_wal_record(1000*1024*1024); --> generate a
~1 GB WAL record, and we see "ERROR: could not read WAL at XXXXX" in
session 2.

Delay the checkpoint (set checkpoint_timeout to 1h) just so that the
WAL files are not recycled while we run the pg_walinspect functions; no
other changes from the default initdb settings are required on the server.

And Thomas's patch fixes the issue.

Regards,
Bharath Rupireddy.
