Re: refactoring relation extension and BufferAlloc(), faster COPY

From: Jim Nasby <nasbyj(at)amazon(dot)com>
To: Andres Freund <andres(at)anarazel(dot)de>, <pgsql-hackers(at)postgresql(dot)org>, "Thomas Munro" <thomas(dot)munro(at)gmail(dot)com>, Melanie Plageman <melanieplageman(at)gmail(dot)com>
Cc: Yura Sokolov <y(dot)sokolov(at)postgrespro(dot)ru>, Robert Haas <robertmhaas(at)gmail(dot)com>
Subject: Re: refactoring relation extension and BufferAlloc(), faster COPY
Date: 2023-02-21 21:00:15
Message-ID: a1efd2f9-290e-b860-b490-a8bf6530b288@amazon.com
Lists: pgsql-hackers

On 10/28/22 9:54 PM, Andres Freund wrote:
> b) I found that it is quite beneficial to bulk-extend the relation with
> smgrextend() even without concurrency. The reason for that is primarily
> the aforementioned dirty buffers that our current extension method causes.
>
> One bit that stumped me for quite a while is to know how much to extend the
> relation by. RelationGetBufferForTuple() drives the decision whether / how
> much to bulk extend purely on the contention on the extension lock, which
> obviously does not work for non-concurrent workloads.
>
> After quite a while I figured out that we actually have good information on
> how much to extend by, at least for COPY /
> heap_multi_insert(). heap_multi_insert() can compute how much space is
> needed to store all tuples, and pass that on to
> RelationGetBufferForTuple().
>
> For that to be accurate we need to recompute that number whenever we use an
> already partially filled page. That's not great, but doesn't appear to be a
> measurable overhead.

Some food for thought: I think it's also completely fine to extend any
relation over a certain size by multiple blocks, regardless of
concurrency. E.g. 10 extra blocks on an 80MB relation is 0.1%. I don't
have a good feel for what algorithm would make sense here; maybe
something along the lines of extend = min(relpages / 2048, 128); if
extend < 8 extend = 1; (presumably extending by just a couple of extra
pages doesn't help much without concurrency).
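
To make the heuristic concrete, here is a minimal sketch in C. It reads the 128 as an upper cap on the extension size, since otherwise the "extend < 8" branch could never be taken. The function name and the constants are hypothetical and only mirror the numbers above; this is not existing PostgreSQL code.

```c
#include <stdint.h>

/* Hypothetical constants, taken from the numbers floated in this message. */
#define EXTEND_DIVISOR   2048   /* extend by roughly 1/2048 (~0.05%) of the relation */
#define EXTEND_MAX_PAGES 128    /* assumed upper cap on a single extension */
#define EXTEND_MIN_BULK  8      /* below this, bulk extension is assumed not to pay off */

/*
 * Hypothetical helper (not a PostgreSQL function): given the current size of
 * a relation in pages, return how many pages to extend it by.
 */
static uint32_t
extend_by_pages(uint32_t relpages)
{
    uint32_t extend = relpages / EXTEND_DIVISOR;

    /* Cap the extension so huge relations do not grow by huge steps. */
    if (extend > EXTEND_MAX_PAGES)
        extend = EXTEND_MAX_PAGES;

    /* A handful of extra pages is assumed not to help; fall back to one page. */
    if (extend < EXTEND_MIN_BULK)
        extend = 1;

    return extend;
}
```

Assuming the default 8kB block size, this would start bulk-extending at 16384 pages (128MB) and hit the 128-page cap at 262144 pages (2GB). The 80MB relation from the example (10240 pages) would still be extended one page at a time.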
