Re: refactoring relation extension and BufferAlloc(), faster COPY

From: Andres Freund <andres(at)anarazel(dot)de>
To: Heikki Linnakangas <hlinnaka(at)iki(dot)fi>
Cc: Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>, vignesh C <vignesh21(at)gmail(dot)com>, pgsql-hackers(at)postgresql(dot)org, Thomas Munro <thomas(dot)munro(at)gmail(dot)com>, Melanie Plageman <melanieplageman(at)gmail(dot)com>, Yura Sokolov <y(dot)sokolov(at)postgrespro(dot)ru>, Robert Haas <robertmhaas(at)gmail(dot)com>
Subject: Re: refactoring relation extension and BufferAlloc(), faster COPY
Date: 2023-03-01 22:35:15
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers


On 2023-03-02 00:04:14 +0200, Heikki Linnakangas wrote:
> On 01/03/2023 09:33, Andres Freund wrote:
> > typedef enum ExtendBufferedFlags {
> > EB_SKIP_EXTENSION_LOCK = (1 << 0),
> > EB_IN_RECOVERY = (1 << 1),
> > EB_CREATE_FORK_IF_NEEDED = (1 << 2),
> > EB_LOCK_FIRST = (1 << 3),
> >
> > /* internal flags follow */
> > EB_RELEASE_PINS = (1 << 4),
> > } ExtendBufferedFlags;
> Is EB_IN_RECOVERY always set when RecoveryInProgress()? Is it really needed?

Right now it's just passed in from the caller. It's at the moment just needed
to know what to pass to smgrcreate(isRedo).

However, XLogReadBufferExtended() doesn't currently use this path, so maybe
it's not actually needed.

> What does EB_LOCK_FIRST do?

Lock the first returned buffer, this is basically the replacement for
num_locked_buffers from the earlier version. I think it's likely that either
locking the first, or potentially at some later point locking all buffers, is
all that's needed for ExtendBufferedRelBy().

EB_LOCK_FIRST_BUFFER would perhaps be better?

> > /*
> > * To identify the relation - either relation or smgr + relpersistence has to
> > * be specified. Used via the EB_REL()/EB_SMGR() macros below. This allows us
> > * to use the same function for both crash recovery and normal operation.
> > */
> > typedef struct ExtendBufferedWhat
> > {
> > Relation rel;
> > struct SMgrRelationData *smgr;
> > char relpersistence;
> > } ExtendBufferedWhat;
> >
> > #define EB_REL(p_rel) ((ExtendBufferedWhat){.rel = p_rel})
> > /* requires use of EB_SKIP_EXTENSION_LOCK */
> > #define EB_SMGR(p_smgr, p_relpersistence) ((ExtendBufferedWhat){.smgr = p_smgr, .relpersistence = p_relpersistence})
> Clever. I'm still not 100% convinced we need the EB_SMGR variant, but with
> this we'll have the flexibility in any case.

Hm - how would you use it from XLogReadBufferExtended() without that?

XLogReadBufferExtended() spends a disappointing amount of time in
smgropen(). Quite visible in profiles.

In the plan read case at least one in XLogReadBufferExtended()
itself, then in ReadBufferWithoutRelcache(). The extension path right now is
worse - it does one smgropen() for each extended block.

I think we should avoid using ReadBufferWithoutRelcache() in
XLogReadBufferExtended() in the read path as well, but that's for later.

> > extern Buffer ExtendBufferedRel(ExtendBufferedWhat eb,
> > ForkNumber forkNum,
> > BufferAccessStrategy strategy,
> > uint32 flags);
> > extern BlockNumber ExtendBufferedRelBy(ExtendBufferedWhat eb,
> > ForkNumber fork,
> > BufferAccessStrategy strategy,
> > uint32 flags,
> > uint32 extend_by,
> > Buffer *buffers,
> > uint32 *extended_by);
> > extern Buffer ExtendBufferedRelTo(ExtendBufferedWhat eb,
> > ForkNumber fork,
> > BufferAccessStrategy strategy,
> > uint32 flags,
> > BlockNumber extend_to,
> > ReadBufferMode mode);
> >
> > As you can see I removed ReadBufferMode from most of the functions (as
> > suggested by Heikki earlier). When extending by 1/multiple pages, we only need
> > to know whether to lock or not.
> Ok, that's better. Still complex and a lot of arguments, but I don't have
> any great suggestions on how to improve it.

I don't think there are going to be all that many callers of
ExtendBufferedRelBy() and ExtendBufferedRelTo(), most are going to be
ExtendBufferedRel(), I think. So the complexity seems acceptable.

> > The reason ExtendBufferedRelTo() has a 'mode' argument is that that allows to
> > fall back to reading page normally if there was a concurrent relation
> > extension.
> Hmm, I think you'll need another return value, to let the caller know if the
> relation was extended or not. Or a flag to ereport(ERROR) if the page
> already exists, for ginbuild() and friends.

I don't think ginbuild() et al need to use ExtendBufferedRelTo()? A plain
ExtendBufferedRel() should suffice. The places that do need it are
fsm_extend() and vm_extend() - I did end up avoiding the redundant lookup.

But I was wondering about a flag controlling this as well.

Attached is my current version of this. Still needs more polishing, including
comments explaining the flags. But I thought it'd be useful to have it out

There's two new patches in the series:
- a patch to not initialize pages in the loop in fsm_extend(), vm_extend()
anymore - we have a check about initializing pages at a later point, so
there doesn't really seem to be a need for it?
- a patch to use the new ExtendBufferedRelTo() in fsm_extend(), vm_extend()
and XLogReadBufferExtended()

In this version I also tries to address some of the other feedback raised in
the thread. One thing I haven't decided what to do about yet is David's
feedback about a version of PinLocalBuffer() that doesn't adjust the
usagecount, which wouldn't need to read the buf_state.


Andres Freund

Attachment Content-Type Size
v4-0001-Revise-pg_pwrite_zeros.patch text/x-diff 4.3 KB
v4-0002-Add-some-error-checking-around-pinning.patch text/x-diff 3.1 KB
v4-0003-hio-Release-extension-lock-before-initializing-pa.patch text/x-diff 2.0 KB
v4-0004-Add-smgrzeroextend-FileZero-FileFallocate.patch text/x-diff 12.1 KB
v4-0005-bufmgr-Add-Pin-UnpinLocalBuffer.patch text/x-diff 5.3 KB
v4-0006-bufmgr-Remove-buffer-write-dirty-tracepoints.patch text/x-diff 3.1 KB
v4-0007-bufmgr-Acquire-and-clean-victim-buffer-separately.patch text/x-diff 27.8 KB
v4-0008-bufmgr-Support-multiple-in-progress-IOs-by-using-.patch text/x-diff 12.4 KB
v4-0009-bufmgr-Move-relation-extension-handling-into-Exte.patch text/x-diff 42.5 KB
v4-0010-Convert-a-few-places-to-ExtendBufferedRel.patch text/x-diff 12.9 KB
v4-0011-heapam-Add-num_pages-to-RelationGetBufferForTuple.patch text/x-diff 6.8 KB
v4-0012-hio-Use-ExtendBufferedRelBy.patch text/x-diff 10.5 KB
v4-0013-bufmgr-debug-Add-PrintBuffer-Desc.patch text/x-diff 3.1 KB
v4-0014-WIP-Don-t-initialize-page-in-vm-fsm-_extend-not-n.patch text/x-diff 2.1 KB
v4-0015-Convert-a-few-places-to-ExtendBufferedRelTo.patch text/x-diff 11.6 KB

In response to


Browse pgsql-hackers by date

  From Date Subject
Next Message Nathan Bossart 2023-03-01 22:47:51 Re: stopgap fix for signal handling during restore_command
Previous Message Justin Pryzby 2023-03-01 22:32:14 Re: cataloguing NOT NULL constraints