Re: [PoC] Improve dead tuple storage for lazy vacuum

From: John Naylor <john(dot)naylor(at)enterprisedb(dot)com>
To: Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>
Cc: Andres Freund <andres(at)anarazel(dot)de>, Matthias van de Meent <boekewurm+postgres(at)gmail(dot)com>, Yura Sokolov <y(dot)sokolov(at)postgrespro(dot)ru>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [PoC] Improve dead tuple storage for lazy vacuum
Date: 2022-07-08 06:43:32
Message-ID: CAFBsxsHEtfh76iZihHDwCT=Va8ubOqmMendk8jHZaE_L3m8ZsA@mail.gmail.com
Lists: pgsql-hackers

On Fri, Jul 8, 2022 at 9:10 AM Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> wrote:

> I guess that the tree height is affected by where garbages are, right?
> For example, even if all garbage in the table is concentrated in
> 0.5GB, if they exist between 2^17 and 2^18 block, we use the first
> byte of blockhi. If the table is larger than 128GB, the second byte of
> the blockhi could be used depending on where the garbage exists.

Right.
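
To make the arithmetic concrete, here is a minimal sketch that counts how many bytes of a 32-bit block number are significant, which is what drives the height when the tree consumes one key byte per level. The names (sketch_significant_bytes, sketch_block_at) are made up for illustration, and the 8kB block size is assumed to be the default build setting.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption for illustration: the default 8kB block size. */
#define SKETCH_BLCKSZ 8192

/*
 * Count the significant bytes of a block number, i.e. how many key bytes
 * a one-byte-per-level radix tree has to distinguish for this block.
 * Block 0 still counts as one byte.
 */
static int
sketch_significant_bytes(uint32_t blkno)
{
    int         nbytes = 1;

    while (blkno > 0xFF)
    {
        blkno >>= 8;
        nbytes++;
    }
    return nbytes;
}

/* Block number that contains the given byte offset into the table. */
static uint32_t
sketch_block_at(uint64_t byte_offset)
{
    return (uint32_t) (byte_offset / SKETCH_BLCKSZ);
}
```

With these definitions, any block between 2^17 and 2^18 needs three bytes (both bytes of blocklo plus the first byte of blockhi), and the block at the 128GB mark is 2^24, which is the first one that needs a fourth byte.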

> Another variation of how to store TID would be that we use the block
> number as a key and store a bitmap of the offset as a value. We can
> use Bitmapset for example,

I like the idea of using existing code to set/check a bitmap if it's
convenient. But (in case that was implied here) I'd really like to
stay away from variable-length values, which would require
"Single-value leaves" (slow). I also think it's fine to treat the
key/value as just bits, and not care where exactly they came from, as
we've been talking about.

> or an approach like Roaring bitmap.

This would require two new data structures instead of one. That
doesn't seem like a path to success.

> I think that at this stage it's better to define the design first. For
> example, key size and value size, and these sizes are fixed or can be
> set the arbitrary size?

I don't think we need to start over. Andres' prototype had certain
design decisions built in for the intended use case (although maybe
not clearly documented as such). Subsequent patches in this thread
substantially changed many design aspects. If any of those changes
made things better for vacuum, that wasn't explained, but Andres
did explain how some of them were not good for other uses.
Going to fixed 64-bit keys and values should still allow many future
applications, so let's do that if there's no reason not to.
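
As a rough sketch of how a TID could map onto fixed 64-bit keys and values while still treating both as "just bits": the low 6 bits of the offset number pick a bit in the value, and the block number plus the remaining offset bits form the key. This particular split and all the names below are assumptions for illustration only, not something taken from any patch in this thread.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical layout: one 64-bit value covers 64 consecutive offsets. */
#define SKETCH_VALUE_BITS        64
#define SKETCH_OFFSET_SHIFT      6   /* log2(SKETCH_VALUE_BITS) */
#define SKETCH_UPPER_OFFSET_BITS 10  /* 16-bit offset minus the low 6 bits */

/* Key: block number in the high bits, upper offset bits in the low bits. */
static uint64_t
sketch_tid_to_key(uint32_t blkno, uint16_t offnum)
{
    return ((uint64_t) blkno << SKETCH_UPPER_OFFSET_BITS) |
        (uint64_t) (offnum >> SKETCH_OFFSET_SHIFT);
}

/* Bit within the 64-bit value that represents this offset. */
static uint64_t
sketch_tid_to_bit(uint16_t offnum)
{
    return UINT64_C(1) << (offnum & (SKETCH_VALUE_BITS - 1));
}

/* Check whether the value stored under a key has this offset's bit set. */
static bool
sketch_value_has_tid(uint64_t value, uint16_t offnum)
{
    return (value & sketch_tid_to_bit(offnum)) != 0;
}
```

The tree itself never needs to know that the key came from a block number or that the value is a bitmap; it only sees two fixed-width integers.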

> For value size, if we support
> different value sizes specified by the user, we can either embed
> multiple values in the leaf node (called Multi-value leaves in ART
> paper)

I don't think "Multi-value leaves" allow for variable-length values,
FWIW. And now I see I also used this term incorrectly in my earlier
review comment -- v3/4 don't actually use "multi-value leaves", but
Andres' prototype does (going by the multiple leaf types). From the
paper: "Multi-value
leaves: The values are stored in one of four different leaf node
types, which mirror the structure of inner nodes, but contain values
instead of pointers."

(It seems v3/v4 could be called a variation of "Combined pointer/value
slots: If values fit into pointers, no separate node types are
necessary. Instead, each pointer storage location in an inner node can
either store a pointer or a value." But without the advantage of
variable-length keys.)
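
To illustrate the "combined pointer/value slots" idea with fixed-length keys, here is a deliberately simplified sketch: every node has the same layout, and a slot holds a child pointer at the upper levels and the 64-bit value at the last level. Since the key length is fixed, the level alone says which one it is. This is only an illustration under those assumptions, using a single 256-way node type and made-up names; it is not the node layout of v3/v4 or of Andres' prototype.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

#define SKETCH_FANOUT    256
#define SKETCH_KEY_BYTES 8      /* fixed 64-bit keys, one byte per level */

typedef struct sketch_node sketch_node;

/* A slot is either a child pointer or a value, depending on the level. */
typedef union sketch_slot
{
    sketch_node *child;
    uint64_t    value;
} sketch_slot;

struct sketch_node
{
    sketch_slot slots[SKETCH_FANOUT];
    uint8_t     used[SKETCH_FANOUT];
};

/* Insert or overwrite a key; returns 0 only if allocation fails. */
static int
sketch_insert(sketch_node *root, uint64_t key, uint64_t value)
{
    sketch_node *node = root;

    for (int level = SKETCH_KEY_BYTES - 1; level > 0; level--)
    {
        uint8_t     chunk = (uint8_t) (key >> (level * 8));

        if (!node->used[chunk])
        {
            sketch_node *child = calloc(1, sizeof(sketch_node));

            if (child == NULL)
                return 0;
            node->slots[chunk].child = child;
            node->used[chunk] = 1;
        }
        node = node->slots[chunk].child;
    }

    /* Last level: the slot stores the value itself, not a pointer. */
    node->slots[(uint8_t) key].value = value;
    node->used[(uint8_t) key] = 1;
    return 1;
}

/* Look up a key; returns 1 and sets *value_out if it is present. */
static int
sketch_lookup(const sketch_node *root, uint64_t key, uint64_t *value_out)
{
    const sketch_node *node = root;

    for (int level = SKETCH_KEY_BYTES - 1; level >= 0; level--)
    {
        uint8_t     chunk = (uint8_t) (key >> (level * 8));

        if (!node->used[chunk])
            return 0;
        if (level == 0)
        {
            *value_out = node->slots[chunk].value;
            return 1;
        }
        node = node->slots[chunk].child;
    }
    return 0;
}
```

Note that every key walks all eight levels here, which is the cost of not having variable-length keys (or any path compression) in this toy version.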

--
John Naylor
EDB: http://www.enterprisedb.com
