From: | Andres Freund <andres(at)anarazel(dot)de> |
---|---|
To: | Heikki Linnakangas <hlinnaka(at)iki(dot)fi> |
Cc: | Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>, vignesh C <vignesh21(at)gmail(dot)com>, pgsql-hackers(at)postgresql(dot)org, Thomas Munro <thomas(dot)munro(at)gmail(dot)com>, Melanie Plageman <melanieplageman(at)gmail(dot)com>, Yura Sokolov <y(dot)sokolov(at)postgrespro(dot)ru>, Robert Haas <robertmhaas(at)gmail(dot)com> |
Subject: | Re: refactoring relation extension and BufferAlloc(), faster COPY |
Date: | 2023-02-27 21:45:30 |
Message-ID: | 20230227214530.vfe4gq5oipmsa3zu@awork3.anarazel.de |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
Hi,
On 2023-02-27 18:06:22 +0200, Heikki Linnakangas wrote:
> We also do this in freespace.c and visibilitymap.c:
>
> /* Extend as needed. */
> while (fsm_nblocks_now < fsm_nblocks)
> {
> PageSetChecksumInplace((Page) pg.data, fsm_nblocks_now);
>
> smgrextend(reln, FSM_FORKNUM, fsm_nblocks_now,
> pg.data, false);
> fsm_nblocks_now++;
> }
>
> We could use the new smgrzeroextend function here. But it would be better to
> go through the buffer cache, because after this, the last block, at
> 'fsm_nblocks', will be read with ReadBuffer() and modified.
I doubt it's a particularly crucial thing to optimize.
But, uh, isn't this code racy? Because this doesn't go through shared buffers,
there's no IO_IN_PROGRESS interlocking against a concurrent reader. We know
that writing pages isn't atomic vs readers. So another connection could
connection could see the new relation size, but a read might return a
partially written state of the page. Which then would cause checksum
failures. And even worse, I think it could lead to loosing a write, if the
concurrent connection writes out a page.
> We could use BulkExtendSharedRelationBuffered() to extend the relation and
> keep the last page locked, but the BulkExtendSharedRelationBuffered()
> signature doesn't allow that. It can return the first N pages locked, but
> there's no way to return the *last* page locked.
We can't rely on bulk extending a, potentially, large number of pages in one
go anyway (since we might not be allowed to pin that many pages). So I don't
think requiring locking the last page is a really viable API.
I think for this case I'd just just use the ExtendRelationTo() API we were
discussing nearby. Compared to the cost of reducing syscalls / filesystem
overhead to extend the relation, the cost of the buffer mapping lookup does't
seem significant. That's different in e.g. the hio.c case, because there we
need a buffer with free space, and concurrent activity could otherwise fill up
the buffer before we can lock it again.
I had started hacking on ExtendRelationTo() that when I saw problems with the
existing code that made me hesitate:
https://postgr.es/m/20230223010147.32oir7sb66slqnjk%40awork3.anarazel.de
> Perhaps we should decompose this function into several function calls.
> Something like:
>
> /* get N victim buffers, pinned and !BM_VALID */
> buffers = BeginExtendRelation(int npages);
>
> LockRelationForExtension(rel)
>
> /* Insert buffers into buffer table */
> first_blk = smgrnblocks()
> for (blk = first_blk; blk < last_blk; blk++)
> MapNewBuffer(blk, buffers[i])
>
> /* extend the file on disk */
> smgrzeroextend();
>
> UnlockRelationForExtension(rel)
>
> for (blk = first_blk; blk < last_blk; blk++)
> {
> memset(BufferGetPage(buffers[i]), 0,
> FinishNewBuffer(buffers[i])
> /* optionally lock the buffer */
> LockBuffer(buffers[i]);
> }
>
> That's a lot more verbose, of course, but gives the callers the flexibility.
> And might even be more readable than one function call with lots of
> arguments.
To me this seems like a quite bad idea. The amount of complexity this would
expose all over the tree is substantial. Which would also make it harder to
further improve relation extension at a later date. It certainly shouldn't be
the default interface. And I'm not sure I see a promisung usecase.
> This would expose the concept of a buffer that's mapped but marked as
> IO-in-progress outside bufmgr.c. On one hand, maybe that's exposing details
> that shouldn't be exposed. On the other hand, it might come handy. Instead
> of RBM_ZERO_AND_LOCK mode, for example, it might be handy to have a function
> that returns an IO-in-progress buffer that you can initialize any way you
> want.
I'd much rather encapsulate that in additional functions, or perhaps a
callback that can make decisions about what to do.
Greetings,
Andres Freund
From | Date | Subject | |
---|---|---|---|
Next Message | Juan José Santamaría Flecha | 2023-02-27 22:05:23 | Re: WIN32 pg_import_system_collations |
Previous Message | Peter Smith | 2023-02-27 21:21:38 | Re: PGDOCS - sgml linkend using single-quotes |