Re: refactoring relation extension and BufferAlloc(), faster COPY

From: Heikki Linnakangas <hlinnaka(at)iki(dot)fi>
To: Andres Freund <andres(at)anarazel(dot)de>
Cc: Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>, vignesh C <vignesh21(at)gmail(dot)com>, pgsql-hackers(at)postgresql(dot)org, Thomas Munro <thomas(dot)munro(at)gmail(dot)com>, Melanie Plageman <melanieplageman(at)gmail(dot)com>, Yura Sokolov <y(dot)sokolov(at)postgrespro(dot)ru>, Robert Haas <robertmhaas(at)gmail(dot)com>
Subject: Re: refactoring relation extension and BufferAlloc(), faster COPY
Date: 2023-03-01 09:12:35
Message-ID: 56577df0-f36d-20ee-301a-5d022980d619@iki.fi
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On 27/02/2023 23:45, Andres Freund wrote:
> On 2023-02-27 18:06:22 +0200, Heikki Linnakangas wrote:
>> We also do this in freespace.c and visibilitymap.c:
>>
>> /* Extend as needed. */
>> while (fsm_nblocks_now < fsm_nblocks)
>> {
>> PageSetChecksumInplace((Page) pg.data, fsm_nblocks_now);
>>
>> smgrextend(reln, FSM_FORKNUM, fsm_nblocks_now,
>> pg.data, false);
>> fsm_nblocks_now++;
>> }
>>
>> We could use the new smgrzeroextend function here. But it would be better to
>> go through the buffer cache, because after this, the last block, at
>> 'fsm_nblocks', will be read with ReadBuffer() and modified.
>
> I doubt it's a particularly crucial thing to optimize.

Yeah, it won't make any practical difference to performance. I'm more
thinking if we can make this more consistent with other places where we
extend a relation.

> But, uh, isn't this code racy? Because this doesn't go through shared buffers,
> there's no IO_IN_PROGRESS interlocking against a concurrent reader. We know
> that writing pages isn't atomic vs readers. So another connection could
> connection could see the new relation size, but a read might return a
> partially written state of the page. Which then would cause checksum
> failures. And even worse, I think it could lead to loosing a write, if the
> concurrent connection writes out a page.

fsm_readbuf and vm_readbuf check the relation size first, with
smgrnblocks(), before trying to read the page. So to have a problem, the
smgrnblocks() would have to already return the new size, but the
smgrread() would not return the new contents. I don't think that's
possible, but not sure.

>> We could use BulkExtendSharedRelationBuffered() to extend the relation and
>> keep the last page locked, but the BulkExtendSharedRelationBuffered()
>> signature doesn't allow that. It can return the first N pages locked, but
>> there's no way to return the *last* page locked.
>
> We can't rely on bulk extending a, potentially, large number of pages in one
> go anyway (since we might not be allowed to pin that many pages). So I don't
> think requiring locking the last page is a really viable API.
>
> I think for this case I'd just just use the ExtendRelationTo() API we were
> discussing nearby. Compared to the cost of reducing syscalls / filesystem
> overhead to extend the relation, the cost of the buffer mapping lookup does't
> seem significant. That's different in e.g. the hio.c case, because there we
> need a buffer with free space, and concurrent activity could otherwise fill up
> the buffer before we can lock it again.

Works for me.

- Heikki

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Hayato Kuroda (Fujitsu) 2023-03-01 09:21:11 RE: Time delayed LR (WAS Re: logical replication restrictions)
Previous Message Peter Eisentraut 2023-03-01 09:12:18 documentation updates for SQL:2023