Re: refactoring relation extension and BufferAlloc(), faster COPY

From: Andres Freund <andres(at)anarazel(dot)de>
To: Bharath Rupireddy <bharath(dot)rupireddyforpostgres(at)gmail(dot)com>
Cc: pgsql-hackers(at)postgresql(dot)org, Thomas Munro <thomas(dot)munro(at)gmail(dot)com>, Melanie Plageman <melanieplageman(at)gmail(dot)com>, Yura Sokolov <y(dot)sokolov(at)postgrespro(dot)ru>, Robert Haas <robertmhaas(at)gmail(dot)com>
Subject: Re: refactoring relation extension and BufferAlloc(), faster COPY
Date: 2022-10-29 18:39:05
Message-ID: 20221029183905.vs7rznh2q3eqxdut@awork3.anarazel.de
Lists: pgsql-hackers

Hi,

On 2022-10-29 18:33:53 +0530, Bharath Rupireddy wrote:
> On Sat, Oct 29, 2022 at 8:24 AM Andres Freund <andres(at)anarazel(dot)de> wrote:
> >
> > Hi,
> >
> > I'm working to extract independently useful bits from my AIO work, to reduce
> > the size of that patchset. This is one of those pieces.
>
> Thanks a lot for this great work. There are 12 patches in this thread,
> I believe each of these patches is trying to solve separate problems
> and can be reviewed and get committed separately, am I correct?

Mostly, yes.

For 0001 I already started
https://www.postgresql.org/message-id/20221027165914.2hofzp4cvutj6gin%40awork3.anarazel.de
to discuss the specific issue.

We don't strictly need v1-0002-aio-Add-some-error-checking-around-pinning.patch
but I did find it useful.

v1-0012-bufmgr-debug-Add-PrintBuffer-Desc.patch is not used in the patch
series, but I found it quite useful when debugging issues with the patch. It's
a heck of a lot easier to interpret page flags when they can be printed.
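
Just to illustrate the kind of output I mean, a trivial helper along these
lines (a sketch only, using the existing BM_* flags from buf_internals.h, not
the actual PrintBuffer()/PrintBufferDesc() from 0012) already makes the state
word a lot more readable:

#include "postgres.h"
#include "storage/buf_internals.h"

/*
 * Sketch only: decode a buffer header's state word into something human
 * readable. The BM_* flags and accessor macros are the real ones from
 * buf_internals.h; the function itself is not what 0012 adds.
 */
static void
DebugPrintBufferState(uint32 buf_state)
{
	elog(LOG, "refcount=%u usagecount=%u%s%s%s%s",
		 BUF_STATE_GET_REFCOUNT(buf_state),
		 BUF_STATE_GET_USAGECOUNT(buf_state),
		 (buf_state & BM_LOCKED) ? " LOCKED" : "",
		 (buf_state & BM_DIRTY) ? " DIRTY" : "",
		 (buf_state & BM_VALID) ? " VALID" : "",
		 (buf_state & BM_IO_IN_PROGRESS) ? " IO_IN_PROGRESS" : "");
}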

I also think there are some architectural questions that'll influence the
number of patches. E.g. I'm not convinced
v1-0010-heapam-Add-num_pages-to-RelationGetBufferForTuple.patch is quite the
right spot to track which additional pages should be used; it could very well
live alongside ->smgr_targblock instead. Possibly the best path would be to
return the additional pages explicitly to callers of
RelationGetBufferForTuple, but RelationGetBufferForTuple does a bunch of work
around pinning that would potentially need to be repeated in
heap_multi_insert().
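
To make that last option a bit more concrete, something along these lines is
what I mean by returning the pages explicitly (a purely hypothetical sketch,
the struct and parameter names are invented here and not from any of the
attached patches):

#include "postgres.h"
#include "access/hio.h"		/* existing prototype, Buffer, BulkInsertState */

/* hypothetical limit on how many extra pages we'd hand back */
#define MAX_EXTRA_TARGET_PAGES 64

typedef struct ExtraTargetPages
{
	int		nbuffers;							/* number of extra pages */
	Buffer	buffers[MAX_EXTRA_TARGET_PAGES];	/* pinned, not yet locked */
} ExtraTargetPages;

/*
 * Hypothetical variant of the current hio.h prototype, with an out argument
 * so heap_multi_insert() could reuse the already-pinned victim pages itself.
 */
extern Buffer RelationGetBufferForTuple(Relation relation, Size len,
										Buffer otherBuffer, int options,
										BulkInsertState bistate,
										Buffer *vmbuffer, Buffer *vmbuffer_other,
										ExtraTargetPages *extra_pages);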

> > In workloads that extend relations a lot, we end up being extremely contended
> > on the relation extension lock. We've attempted to address that to some degree
> > by using batching, which helps, but only so much.
>
> Yes, I too have observed this in the past for parallel inserts in CTAS
> work - https://www.postgresql.org/message-id/CALj2ACW9BUoFqWkmTSeHjFD-W7_00s3orqSvtvUk%2BKD2H7ZmRg%40mail.gmail.com.
> Tackling bulk relation extension problems will unblock the parallel
> inserts (in CTAS, COPY) work I believe.

Yea. There are a lot of places where the current approach ends up being a
bottleneck.

> Firstly, 0001 avoids extra loop over waiters and looks a reasonable
> change, some comments on the patch:

> 1)
> + int lwWaiting; /* 0 if not waiting, 1 if on waitlist, 2 if
> + * waiting to be woken */
> Use macros instead of hard-coded values for better readability?
>
> #define PROC_LW_LOCK_NOT_WAITING 0
> #define PROC_LW_LOCK_ON_WAITLIST 1
> #define PROC_LW_LOCK_WAITING_TO_BE_WOKEN 2

Yea - this was really more of a prototype patch - I noted that we'd want to
use defines for this in
https://www.postgresql.org/message-id/20221027165914.2hofzp4cvutj6gin%40awork3.anarazel.de

> 3)
> + proclist_delete(&lock->waiters, MyProc->pgprocno, lwWaitLink);
> + found = true;
> I guess 'found' is a bit meaningless here as we are doing away with
> the proclist_foreach_modify loop. We can directly use
> MyProc->lwWaiting == 1 and remove 'found'.

We can rename it, but I think we still need it: it's easier to analyze the
logic if the relevant check happens on a value that was read while we held the
wait list lock. We probably should do the reset inside the locked section as
well.
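
Roughly what I have in mind, as a sketch only (assuming it lives in lwlock.c
next to the existing dequeue code; the literal 0/1 values are the ones from
the prototype patch, to be replaced by defines):

/*
 * Sketch, not the actual patch: check and reset lwWaiting while holding the
 * wait list lock, and base the later decision on the value observed under
 * that lock. Returns whether we were still queued; if not, the caller still
 * has to absorb the wakeup another backend will send (or already sent).
 */
static bool
LWLockDequeueSelfSketch(LWLock *lock)
{
	bool		on_waitlist;

	LWLockWaitListLock(lock);

	/* observed while holding the wait list lock: are we still queued? */
	on_waitlist = (MyProc->lwWaiting == 1);		/* 1 == on waitlist */
	if (on_waitlist)
	{
		proclist_delete(&lock->waiters, MyProc->pgprocno, lwWaitLink);
		/* reset inside the locked section, as suggested above */
		MyProc->lwWaiting = 0;					/* 0 == not waiting */
	}

	LWLockWaitListUnlock(lock);

	return on_waitlist;
}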

> 4)
> if (!MyProc->lwWaiting)
> if (!proc->lwWaiting)
> Can we modify the above conditions in lwlock.c to MyProc->lwWaiting !=
> 1 or PROC_LW_LOCK_ON_WAITLIST or the macro?

I think it's better to check it's 0, rather than just != 1.

> 5) Is there any specific test case that I can see benefit of this
> patch? If yes, can you please share it here?

Yep, see the other thread; there's a pretty easy case there. You can also see
it at extreme client counts with a pgbench -S run against a cluster with a
small shared_buffers. But the difference is not huge before something like
2048-4096 clients, and even then it only shows up occasionally (because you
need to end up with most connections waiting for one of the partitions). So
the test case from the other thread is a lot better.

Greetings,

Andres Freund
