Re: refactoring relation extension and BufferAlloc(), faster COPY

From: Andres Freund <andres(at)anarazel(dot)de>
To: Jim Nasby <nasbyj(at)amazon(dot)com>
Cc: pgsql-hackers(at)postgresql(dot)org, Thomas Munro <thomas(dot)munro(at)gmail(dot)com>, Melanie Plageman <melanieplageman(at)gmail(dot)com>, Yura Sokolov <y(dot)sokolov(at)postgrespro(dot)ru>, Robert Haas <robertmhaas(at)gmail(dot)com>
Subject: Re: refactoring relation extension and BufferAlloc(), faster COPY
Date: 2023-02-21 21:12:56
Message-ID: 20230221211256.ibxhwtdrvytr2hpt@awork3.anarazel.de
Lists: pgsql-hackers

Hi,

On 2023-02-21 15:00:15 -0600, Jim Nasby wrote:
> On 10/28/22 9:54 PM, Andres Freund wrote:
> > b) I found that it's quite beneficial to bulk-extend the relation with
> > smgrextend() even without concurrency. The reason for that is primarily
> > the aforementioned dirty buffers that our current extension method causes.
> >
> > One bit that stumped me for quite a while is to know how much to extend the
> > relation by. RelationGetBufferForTuple() drives the decision whether / how
> > much to bulk extend purely on the contention on the extension lock, which
> > obviously does not work for non-concurrent workloads.
> >
> > After quite a while I figured out that we actually have good information on
> > how much to extend by, at least for COPY /
> > heap_multi_insert(). heap_multi_insert() can compute how much space is
> > needed to store all tuples, and pass that on to
> > RelationGetBufferForTuple().
> >
> > For that to be accurate we need to recompute that number whenever we use an
> > already partially filled page. That's not great, but doesn't appear to be a
> > measurable overhead.
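The space computation described above can be sketched roughly as follows. This is a hypothetical illustration, not the actual heap_multi_insert() / RelationGetBufferForTuple() code; the function names and the page-overhead constants (stand-ins for SizeOfPageHeaderData and sizeof(ItemIdData)) are assumptions for the sketch, and fillfactor/special space are ignored:

```c
#include <assert.h>
#include <stddef.h>

#define SKETCH_BLCKSZ 8192
#define SKETCH_PAGE_HEADER 24   /* approximates SizeOfPageHeaderData */
#define SKETCH_ITEMID 4         /* approximates sizeof(ItemIdData) */

/*
 * Estimate how many *new* pages are needed to store ntuples tuples, given
 * their sizes and the free space remaining on the current target page.
 * This mirrors the idea of heap_multi_insert() computing the total space
 * requirement and passing it down, and of recomputing the estimate when a
 * partially filled page is reused (free_space < full page).
 */
int
sketch_pages_needed(const size_t *tuple_sizes, int ntuples, size_t free_space)
{
	size_t	per_page = SKETCH_BLCKSZ - SKETCH_PAGE_HEADER;
	size_t	avail = free_space;
	int		pages = 0;

	for (int i = 0; i < ntuples; i++)
	{
		size_t	need = tuple_sizes[i] + SKETCH_ITEMID;

		if (need > avail)
		{
			pages++;			/* start a fresh page */
			avail = per_page;
		}
		avail -= need;
	}
	return pages;
}
```

With an estimate like this in hand, the relation can be bulk-extended by the right number of blocks in one smgrextend() call instead of page by page.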
> Some food for thought: I think it's also completely fine to extend any
> relation over a certain size by multiple blocks, regardless of concurrency.
> E.g. 10 extra blocks on an 80MB relation is 0.1%. I don't have a good feel
> for what algorithm would make sense here; maybe something along the lines of
> extend = max(relpages / 2048, 128); if extend < 8 extend = 1; (presumably
> extending by just a couple extra pages doesn't help much without
> concurrency).
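As written, the pseudocode's max(relpages / 2048, 128) makes the "extend < 8" branch unreachable (the result is never below 128), so a cap, i.e. min(), seems to be what was intended. A sketch under that assumption (hypothetical function, not proposed code):

```c
#include <assert.h>

/*
 * Size-proportional extension heuristic: grow by relpages/2048 blocks,
 * capped at 128; below 8 blocks, don't bother bulk-extending at all.
 */
int
sketch_extend_by(unsigned relpages)
{
	unsigned	extend = relpages / 2048;

	if (extend > 128)
		extend = 128;			/* cap the extension size */
	if (extend < 8)
		extend = 1;				/* tiny relations: extend one page at a time */
	return (int) extend;
}
```

For an 80MB relation (10240 pages of 8kB) this yields 10240 / 2048 = 5, below the threshold, so such a heuristic only kicks in for somewhat larger relations.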

I previously implemented just that. It's not easy to get right: you can easily
end up with several backends each extending the relation by quite a bit at the
same time (or you re-introduce contention), which can leave the relation
larger than necessary by a fair amount if data loading stops at some point.

We might want that as well at some point, but the approach implemented in the
patchset is precise and thus always a win; it should be the baseline.

Greetings,

Andres Freund
