Quick Links

Re: refactoring relation extension and BufferAlloc(), faster COPY

From:	Heikki Linnakangas <hlinnaka(at)iki(dot)fi>
To:	Andres Freund <andres(at)anarazel(dot)de>
Cc:	Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>, vignesh C <vignesh21(at)gmail(dot)com>, pgsql-hackers(at)postgresql(dot)org, Thomas Munro <thomas(dot)munro(at)gmail(dot)com>, Melanie Plageman <melanieplageman(at)gmail(dot)com>, Yura Sokolov <y(dot)sokolov(at)postgrespro(dot)ru>, Robert Haas <robertmhaas(at)gmail(dot)com>
Subject:	Re: refactoring relation extension and BufferAlloc(), faster COPY
Date:	2023-03-01 09:12:35
Message-ID:	56577df0-f36d-20ee-301a-5d022980d619@iki.fi
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Thread:
Lists:	pgsql-hackers

On 27/02/2023 23:45, Andres Freund wrote:
> On 2023-02-27 18:06:22 +0200, Heikki Linnakangas wrote:
>> We also do this in freespace.c and visibilitymap.c:
>>
>> /* Extend as needed. */
>> while (fsm_nblocks_now < fsm_nblocks)
>> {
>> PageSetChecksumInplace((Page) pg.data, fsm_nblocks_now);
>>
>> smgrextend(reln, FSM_FORKNUM, fsm_nblocks_now,
>> pg.data, false);
>> fsm_nblocks_now++;
>> }
>>
>> We could use the new smgrzeroextend function here. But it would be better to
>> go through the buffer cache, because after this, the last block, at
>> 'fsm_nblocks', will be read with ReadBuffer() and modified.
>
> I doubt it's a particularly crucial thing to optimize.

Yeah, it won't make any practical difference to performance. I'm more
thinking if we can make this more consistent with other places where we
extend a relation.

> But, uh, isn't this code racy? Because this doesn't go through shared buffers,
> there's no IO_IN_PROGRESS interlocking against a concurrent reader. We know
> that writing pages isn't atomic vs readers. So another connection could
> connection could see the new relation size, but a read might return a
> partially written state of the page. Which then would cause checksum
> failures. And even worse, I think it could lead to loosing a write, if the
> concurrent connection writes out a page.

fsm_readbuf and vm_readbuf check the relation size first, with
smgrnblocks(), before trying to read the page. So to have a problem, the
smgrnblocks() would have to already return the new size, but the
smgrread() would not return the new contents. I don't think that's
possible, but not sure.

>> We could use BulkExtendSharedRelationBuffered() to extend the relation and
>> keep the last page locked, but the BulkExtendSharedRelationBuffered()
>> signature doesn't allow that. It can return the first N pages locked, but
>> there's no way to return the *last* page locked.
>
> We can't rely on bulk extending a, potentially, large number of pages in one
> go anyway (since we might not be allowed to pin that many pages). So I don't
> think requiring locking the last page is a really viable API.
>
> I think for this case I'd just just use the ExtendRelationTo() API we were
> discussing nearby. Compared to the cost of reducing syscalls / filesystem
> overhead to extend the relation, the cost of the buffer mapping lookup does't
> seem significant. That's different in e.g. the hio.c case, because there we
> need a buffer with free space, and concurrent activity could otherwise fill up
> the buffer before we can lock it again.

Works for me.

- Heikki

In response to

Re: refactoring relation extension and BufferAlloc(), faster COPY at 2023-02-27 21:45:30 from Andres Freund

Responses

Re: refactoring relation extension and BufferAlloc(), faster COPY at 2023-03-01 17:02:00 from Andres Freund

Browse pgsql-hackers by date

	From	Date	Subject
Next Message	Hayato Kuroda (Fujitsu)	2023-03-01 09:21:11	RE: Time delayed LR (WAS Re: logical replication restrictions)
Previous Message	Peter Eisentraut	2023-03-01 09:12:18	documentation updates for SQL:2023