Re: [PoC] Improve dead tuple storage for lazy vacuum

From: Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>
To: John Naylor <johncnaylorls(at)gmail(dot)com>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Nathan Bossart <nathandbossart(at)gmail(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, Matthias van de Meent <boekewurm+postgres(at)gmail(dot)com>, Yura Sokolov <y(dot)sokolov(at)postgrespro(dot)ru>
Subject: Re: [PoC] Improve dead tuple storage for lazy vacuum
Date: 2023-12-08 02:24:38
Message-ID: CAD21AoCUSjRWN_AM6deg2QhuDbc_uuLxQpnYm0GF60Kbwm_8rQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Thu, Dec 7, 2023 at 12:27 PM John Naylor <johncnaylorls(at)gmail(dot)com> wrote:
>
> On Mon, Nov 27, 2023 at 1:45 PM Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> wrote:
> >
> > On Sat, Oct 28, 2023 at 5:56 PM John Naylor <johncnaylorls(at)gmail(dot)com> wrote:
>
> > bool
> > RT_SET(RT_RADIX_TREE *tree, uint64 key, RT_VALUE_TYPE *value_p);
> > or for variable-length value support,
> > RT_SET(RT_RADIX_TREE *tree, uint64 key, RT_VALUE_TYPE *value_p, size_t sz);
> >
> > If an entry already exists, update its value to 'value_p' and return
> > true. Otherwise set the value and return false.
>
> > RT_VALUE_TYPE
> > RT_INSERT(RT_RADIX_TREE *tree, uint64 key, size_t sz, bool *found);
> >
> > If the entry already exists, replace the value with a new empty value
> > with sz bytes and set "found" to true. Otherwise, insert an empty
> > value, return its pointer, and set "found" to false.
> >
> > We probably will find a better name but I use RT_INSERT() for
> > discussion. RT_INSERT() returns an empty slot regardless of existing
> > values. It can be used to insert a new value or to replace the value
> > with a larger value.
>
> Looking at TidStoreSetBlockOffsets again (in particular how it works
> with RT_GET), and thinking about issues we've discussed, I think
> RT_SET is sufficient for vacuum. Here's how it could work:
>
> TidStoreSetBlockOffsets could have a stack variable that's "almost
> always" large enough. When not, it can allocate in its own context. It
> sets the necessary bits there. Then, it passes the pointer to RT_SET
> with the number of bytes to copy. That seems very simple.

Right.

>
> At some future time, we can add a new function with the complex
> business about getting the current value to modify it, with the
> re-alloc'ing that it might require.
>
> In other words, from both an API perspective and a performance
> perspective, it makes sense for tid store to have a simple "set"
> interface for vacuum that can be optimized for its characteristics
> (insert only, ordered offsets). And also a more complex one for bitmap
> scan (setting/unsetting bits of existing values, in any order). They
> can share the same iteration interface, key types, and value types.
>
> What do you think, Masahiko?

Good point. RT_SET() would be faster than RT_GET() and updating the
value because RT_SET() would not need to take care of the existing
value (its size, embedded or not, realloc etc).

I think that we can separate the radix tree patch into two parts: the
main implementation with RT_SET(), and more complex APIs such as
RT_GET() etc. That way, it would probably make it easy to complete the
radix tree and tidstore first.

Regards,

--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Junwang Zhao 2023-12-08 02:32:27 Re: Make COPY format extendable: Extract COPY TO format implementations
Previous Message Masahiko Sawada 2023-12-08 01:56:50 Re: [PoC] Improve dead tuple storage for lazy vacuum