Re: refactoring relation extension and BufferAlloc(), faster COPY

From: Andres Freund <andres(at)anarazel(dot)de>
To: Heikki Linnakangas <hlinnaka(at)iki(dot)fi>
Cc: vignesh C <vignesh21(at)gmail(dot)com>, pgsql-hackers(at)postgresql(dot)org, Thomas Munro <thomas(dot)munro(at)gmail(dot)com>, Melanie Plageman <melanieplageman(at)gmail(dot)com>, Yura Sokolov <y(dot)sokolov(at)postgrespro(dot)ru>, Robert Haas <robertmhaas(at)gmail(dot)com>
Subject: Re: refactoring relation extension and BufferAlloc(), faster COPY
Date: 2023-02-22 20:31:52
Message-ID: 20230222203152.rh4s75aedj65hyjn@awork3.anarazel.de
Lists: pgsql-hackers

Hi,

On 2023-02-22 11:18:57 +0200, Heikki Linnakangas wrote:
> On 21/02/2023 21:22, Andres Freund wrote:
> > On 2023-02-21 18:18:02 +0200, Heikki Linnakangas wrote:
> > > Is it ever possible to call this without a relcache entry? WAL redo
> > > functions do that with ReadBuffer, but they only extend a relation
> > > implicitly, by replaying a record for a particular block.
> >
> > I think we should use it for crash recovery as well, but the patch doesn't
> > yet. We have some gnarly code there, see the loop using P_NEW in
> > XLogReadBufferExtended(). Extending the file one-by-one is a lot more
> > expensive than doing it in bulk.
>
> Hmm, XLogReadBufferExtended() could use smgrzeroextend() to fill the gap,
> and then call ExtendRelationBuffered for the target page. Or the new
> ExtendRelationBufferedTo() function that you mentioned.

I don't think it's safe to just use smgrzeroextend(). Without the page-level
interlock from the buffer entry, a concurrent reader can read/write the
extended portion of the relation, while we're extending. That can lead to
losing writes.

It also turns out that just doing smgrzeroextend(), without filling s_b, is
often bad for performance, because it may cause reads when trying to fill the
buffers. Although hopefully that's less of an issue during WAL replay, due to
REGBUF_WILL_INIT.

> In the common case that you load a lot of data to a relation extending it,
> and then crash, the WAL replay would still extend the relation one page at a
> time, which is inefficient. Changing that would need bigger changes, to
> WAL-log the relation extension as a separate WAL record, for example. I
> don't think we need to solve that right now, it can be addressed separately
> later.

Yeah, that indeed seems like something for later.

There are several things we could do without adding WAL logging of relation
extensions themselves.

One relatively easy thing would be to add information about the number of
blocks we're extending by to XLOG_HEAP2_MULTI_INSERT records. Compared to the
insertions themselves that'd barely be noticeable.

A slightly more complicated thing would be to peek ahead in the WAL (we have
infrastructure for that now) and extend by enough for the next few relation
extensions.
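
To make the peek-ahead idea a bit more concrete, here is a rough standalone
sketch. It is not PostgreSQL code: PeekedRecord and blocks_to_extend() are
hypothetical names made up for illustration, and a real version would have to
work from decoded WAL records and the actual relation size. It only shows the
arithmetic of turning "the next few records" into a single extension size.

```c
#include <stddef.h>
#include <stdint.h>

typedef uint32_t BlockNumber;	/* 32-bit block number, as in PostgreSQL */

/* Hypothetical, simplified stand-in for a record seen while peeking ahead. */
typedef struct PeekedRecord
{
	uint32_t	relid;		/* relation the record touches (illustrative) */
	BlockNumber blkno;		/* block the record initializes or modifies */
} PeekedRecord;

/*
 * Given the current length of relation `relid` in blocks and the upcoming
 * records, look at no more than `window` of them and return how many blocks
 * the relation would need to grow by to cover all the blocks they reference.
 * Returns 0 if none of the inspected records needs an extension.
 */
static BlockNumber
blocks_to_extend(uint32_t relid, BlockNumber cur_nblocks,
				 const PeekedRecord *upcoming, size_t nrecords,
				 size_t window)
{
	BlockNumber target = cur_nblocks;
	size_t		n = nrecords < window ? nrecords : window;

	for (size_t i = 0; i < n; i++)
	{
		/* Only records for this relation that point past its end matter. */
		if (upcoming[i].relid == relid && upcoming[i].blkno >= target)
			target = upcoming[i].blkno + 1;
	}

	return target - cur_nblocks;
}
```

With a relation that is currently 10 blocks long and upcoming records for
blocks 10, 11 and 12 inside the window, this would ask for one extension by 3
blocks instead of three extensions by 1. The window size bounds how much
work the peeking costs and how far a single far-away block number can inflate
the extension.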

Greetings,

Andres Freund
