Re: [PoC] Improve dead tuple storage for lazy vacuum

From: John Naylor <john(dot)naylor(at)enterprisedb(dot)com>
To: Nathan Bossart <nathandbossart(at)gmail(dot)com>
Cc: Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, Matthias van de Meent <boekewurm+postgres(at)gmail(dot)com>, Yura Sokolov <y(dot)sokolov(at)postgrespro(dot)ru>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [PoC] Improve dead tuple storage for lazy vacuum
Date: 2022-09-22 04:46:24
Message-ID: CAFBsxsFk7btGg8G_Yjy76be8AJ9PKZH677zK5NKx4uL2EPRtcg@mail.gmail.com
Lists: pgsql-hackers

On Thu, Sep 22, 2022 at 1:01 AM Nathan Bossart <nathandbossart(at)gmail(dot)com> wrote:
>
> On Wed, Sep 21, 2022 at 01:17:21PM +0700, John Naylor wrote:
>
> > In short, this code needs to be lower level so that we still have full
> > control while being portable. I will work on this, and also the related
> > code for node dispatch.
>
> Is it possible to use approach #2 here, too? AFAICT space is allocated for
> all of the chunks, so there wouldn't be any danger in searching all them
> and discarding any results >= node->count.

Sure, the caller could pass the maximum node capacity, and then check if
the returned index is within the range of the node count.
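
A minimal sketch of that idea follows. The struct, the capacity constant, and the function names are hypothetical and are not taken from the patch; the scalar loop only stands in for a vectorized search over the full array.

```c
#include <stdint.h>

#define EXAMPLE_MAX_CHUNKS 16	/* hypothetical fixed capacity of the node */

/* Hypothetical node layout: slots at index >= count may hold junk bytes. */
typedef struct example_node
{
	uint8_t		count;						/* number of valid chunks */
	uint8_t		chunks[EXAMPLE_MAX_CHUNKS];	/* always fully allocated */
} example_node;

/*
 * Search all 'capacity' slots and return the lowest matching index, or -1.
 * There is no tail path because the caller always passes the full capacity.
 */
static int
example_search_all(const uint8_t *chunks, int capacity, uint8_t key)
{
	for (int i = 0; i < capacity; i++)
	{
		if (chunks[i] == key)
			return i;
	}
	return -1;
}

/*
 * Caller-side check: a match at index >= count came from a junk slot,
 * so it is discarded.
 */
static int
example_node_search(const example_node *node, uint8_t key)
{
	int			idx = example_search_all(node->chunks, EXAMPLE_MAX_CHUNKS, key);

	if (idx < 0 || idx >= node->count)
		return -1;
	return idx;
}
```

Because valid entries occupy the lowest indexes, a key that is really present is always found before any junk copy of it, so returning the lowest match is enough.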

> Granted, we're depending on the
> number of chunks always being a multiple of elements-per-vector in order to
> avoid the tail path, but that seems like a reasonably safe assumption that
> can be covered with comments.

Actually, we don't need to depend on that at all. When I said "junk" above,
that can be any bytes, as long as we're not reading off the end of
allocated memory. We'll never do that here, since the child pointers/values
follow. In that case, the caller can hard-code the size (it would even
happen to work now to multiply rt_node_kind by 16, to be sneaky). One thing
I want to try soon is storing fewer than 16/32/etc. entries, so that the
whole node fits comfortably inside a power-of-two allocation. That would
allow us to use aset without wasting space for the smaller nodes, which
would be faster and possibly would solve the fragmentation problem Andres
referred to in

https://www.postgresql.org/message-id/20220704220038.at2ane5xkymzzssb%40awork3.anarazel.de
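
To illustrate the sizing arithmetic, here is a rough sketch. The 8-byte header and the 1-byte-chunk plus 8-byte-child entry layout are assumptions made for the example, not the patch's actual node layout, and the allocator's own per-chunk overhead is ignored.

```c
#include <stddef.h>
#include <stdint.h>

#define EXAMPLE_HEADER_SIZE 8	/* hypothetical node header size in bytes */
#define EXAMPLE_ENTRY_SIZE (sizeof(uint8_t) + sizeof(uint64_t))	/* chunk + child */

/* Bytes needed for a node holding 'nentries' entries. */
static size_t
example_node_size(size_t nentries)
{
	return EXAMPLE_HEADER_SIZE + nentries * EXAMPLE_ENTRY_SIZE;
}

/* Smallest power of two >= n, modeling a power-of-two allocator's rounding. */
static size_t
example_next_pow2(size_t n)
{
	size_t		p = 1;

	while (p < n)
		p <<= 1;
	return p;
}

/* Largest entry count whose node still fits in 'alloc_size' bytes. */
static size_t
example_max_entries(size_t alloc_size)
{
	if (alloc_size < EXAMPLE_HEADER_SIZE)
		return 0;
	return (alloc_size - EXAMPLE_HEADER_SIZE) / EXAMPLE_ENTRY_SIZE;
}
```

Under these assumed sizes, a 16-entry node needs 152 bytes and would be rounded up to 256, while a 13-entry node needs 125 bytes and fits in a 128-byte allocation.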

While on the subject, I wonder how important it is to keep the chunks in
the small nodes in sorted order. That adds branches and memmove calls, and
is the whole reason for the recent "pg_lfind_ge" function.
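
The trade-off can be sketched as below. Both helpers are hypothetical and handle only the chunk array; a real node would also have to move the matching child pointers, and the caller is assumed to have checked that there is room for one more entry.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Sorted insert: find the first slot whose chunk is >= key, shift the
 * tail up by one with memmove, then store the key. Returns the new count.
 */
static int
example_insert_sorted(uint8_t *chunks, int count, uint8_t key)
{
	int			pos = 0;

	while (pos < count && chunks[pos] < key)
		pos++;
	memmove(&chunks[pos + 1], &chunks[pos], (size_t) (count - pos));
	chunks[pos] = key;
	return count + 1;
}

/*
 * Unsorted insert: append at the end, with no search and no memmove.
 * Returns the new count.
 */
static int
example_insert_unsorted(uint8_t *chunks, int count, uint8_t key)
{
	chunks[count] = key;
	return count + 1;
}
```

With unsorted storage, lookups in small nodes have to be an equality search over every valid slot, which is what a vectorized search does anyway.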

--
John Naylor
EDB: http://www.enterprisedb.com
