Re: WIP: WAL prefetch (another approach)

From: Julien Rouhaud <rjuju123(at)gmail(dot)com>
To: Thomas Munro <thomas(dot)munro(at)gmail(dot)com>
Cc: Ashutosh Sharma <ashu(dot)coek88(at)gmail(dot)com>, Daniel Gustafsson <daniel(at)yesql(dot)se>, Stephen Frost <sfrost(at)snowman(dot)net>, Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, David Steele <david(at)pgmasters(dot)net>, Dmitry Dolgov <9erthalion6(at)gmail(dot)com>, Jakub Wartak <Jakub(dot)Wartak(at)tomtom(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>, Tomas Vondra <tomas(dot)vondra(at)enterprisedb(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: WIP: WAL prefetch (another approach)
Date: 2022-03-11 08:27:37
Message-ID: 20220311082737.bmnlmtzwi2xqrhxl@jrouhaud
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Fri, Mar 11, 2022 at 06:31:13PM +1300, Thomas Munro wrote:
> On Wed, Mar 9, 2022 at 7:47 PM Julien Rouhaud <rjuju123(at)gmail(dot)com> wrote:
> >
> > This could use XLogRecGetBlock? Note that this macro is for now never used.
> > xlogreader.c also has some similar forgotten code that could use
> > XLogRecMaxBlockId.
>
> That is true, but I was thinking of it like this: most of the existing
> code that interacts with xlogreader.c is working with the old model,
> where the XLogReader object holds only one "current" record. For that
> reason the XLogRecXXX() macros continue to work as before, implicitly
> referring to the record that XLogReadRecord() most recently returned.
> For xlogreader.c code, I prefer not to use the XLogRecXXX() macros,
> even when referring to the "current" record, since xlogreader.c has
> switched to a new multi-record model. In other words, they're sort of
> 'old API' accessors provided for continuity. Does this make sense?

Ah I see, it does make sense. I'm wondering if there should be some comment
somewhere on the top of the file to mention it, as otherwise someone may be
tempted to change it to avoid some record->record->xxx usage.

> > +DecodedXLogRecord *
> > +XLogNextRecord(XLogReaderState *state, char **errormsg)
> > +{
> > [...]
> > + /*
> > + * state->EndRecPtr is expected to have been set by the last call to
> > + * XLogBeginRead() or XLogNextRecord(), and is the location of the
> > + * error.
> > + */
> > +
> > + return NULL;
> >
> > The comment should refer to XLogFindNextRecord, not XLogNextRecord?
>
> No, it does mean to refer to the XLogNextRecord() (ie the last time
> you called XLogNextRecord and successfully dequeued a record, we put
> its end LSN there, so if there is a deferred error, that's the
> corresponding LSN). Make sense?

It does, thanks!

>
> > Also, is it worth an assert (likely at the top of the function) for that?
>
> How could I assert that EndRecPtr has the right value?

Sorry, I meant to assert that some value was assigned (!XLogRecPtrIsInvalid).
It can only make sure that the first call is done after XLogBeginRead /
XLogFindNextRecord, but that's better than nothing and consistent with the top
comment.

> > + if (unlikely(state->decode_buffer == NULL))
> > + {
> > + if (state->decode_buffer_size == 0)
> > + state->decode_buffer_size = DEFAULT_DECODE_BUFFER_SIZE;
> > + state->decode_buffer = palloc(state->decode_buffer_size);
> > + state->decode_buffer_head = state->decode_buffer;
> > + state->decode_buffer_tail = state->decode_buffer;
> > + state->free_decode_buffer = true;
> > + }
> >
> > Maybe change XLogReaderSetDecodeBuffer to also handle allocation and use it
> > here too? Otherwise XLogReaderSetDecodeBuffer should probably go in 0002 as
> > the only caller is the recovery prefetching.
>
> I don't think it matters much?

The thing is that for now the only caller to XLogReaderSetDecodeBuffer (in
0002) only uses it to set the length, so a buffer is actually never passed to
that function. Since frontend code can rely on a palloc emulation, is there
really a use case to use e.g. some stack buffer there, or something in a
specific memory context? It seems to be the only use cases for having
XLogReaderSetDecodeBuffer() rather than simply a
XLogReaderSetDecodeBufferSize(). But overall I agree it doesn't matter much,
so no objection to keep it as-is.

> > It's also not particulary obvious why XLogFindNextRecord() doesn't check for
> > this value. AFAICS callers don't (and should never) call it with a
> > nonblocking == true state, maybe add an assert for that?
>
> Fair point. I have now explicitly cleared that flag. (I don't much
> like state->nonblocking, which might be better as an argument to
> page_read(), but in fact I don't like the fact that page_read
> callbacks are blocking in the first place, which is why I liked
> Horiguchi-san's patch to get rid of that... but that can be a subject
> for later work.)

Agreed.

> > static void
> > ResetDecoder(XLogReaderState *state)
> > {
> > [...]
> > + /* Reset the decoded record queue, freeing any oversized records. */
> > + while ((r = state->decode_queue_tail))
> >
> > nit: I think it's better to explicitly check for the assignment being != NULL,
> > and existing code is more frequently written this way AFAICS.
>
> I think it's perfectly normal idiomatic C, but if you think it's
> clearer that way, OK, done like that.

The thing I don't like about this form is that you can never be sure that an
assignment was really meant unless you read the rest of the nearby code. Other
than that agreed, if perfectly normal idiomatic C.

> I realised that this version has broken -DWAL_DEBUG. I'll fix that
> shortly, but I wanted to post this update ASAP, so here's a new
> version.

+ * Returns XLREAD_WOULDBLOCK if he requested data can't be read without
+ * waiting. This can be returned only if the installed page_read callback

typo: "the" requested data.

Other than that it all looks good to me!

> The other thing I need to change is that I should turn on
> recovery_prefetch for platforms that support it (ie Linux and maybe
> NetBSD only for now), in the tests. Right now you need to put
> recovery_prefetch=on in a file and then run the tests with
> "TEMP_CONFIG=path_to_that make -C src/test/recovery check" to
> excercise much of 0002.

+1 with Andres' idea to have a "try" setting.

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Yura Sokolov 2022-03-11 08:30:27 Re: BufferAlloc: don't take two simultaneous locks
Previous Message Kyotaro Horiguchi 2022-03-11 08:21:37 Re: BufferAlloc: don't take two simultaneous locks