Re: [PoC] Improve dead tuple storage for lazy vacuum

From: Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>
To: John Naylor <john(dot)naylor(at)enterprisedb(dot)com>
Cc: Nathan Bossart <nathandbossart(at)gmail(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, Matthias van de Meent <boekewurm+postgres(at)gmail(dot)com>, Yura Sokolov <y(dot)sokolov(at)postgrespro(dot)ru>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [PoC] Improve dead tuple storage for lazy vacuum
Date: 2023-04-17 13:48:27
Message-ID: CAD21AoCRjnVxsKuyL8QUt8x=qz=MmA2Q-GyOwoCO5KM=HevvWQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Fri, Apr 7, 2023 at 6:55 PM John Naylor <john(dot)naylor(at)enterprisedb(dot)com> wrote:
>
> On Thu, Feb 16, 2023 at 11:44 PM Andres Freund <andres(at)anarazel(dot)de> wrote:
> >
> > We really ought to replace the tid bitmap used for bitmap heap scans. The
> > hashtable we use is a pretty awful data structure for it. And that's not
> > filled in-order, for example.
>
> I spent some time studying tidbitmap.c, and not only does it make sense to use a radix tree there, but since it has more complex behavior and stricter runtime requirements, it should really be the thing driving the design and tradeoffs, not vacuum:
>
> - With lazy expansion and single-value leaves, the root of a radix tree can point to a single leaf. That might get rid of the need to track TBMStatus, since setting a single-leaf tree should be cheap.
>

Instead of introducing single-value leaves to the radix tree as
another structure, can we store pointers to PagetableEntry as values?

> - Fixed-size PagetableEntry's are pretty large, but the tid compression scheme used in this thread (in addition to being complex) is not a great fit for tidbitmap because it makes it more difficult to track per-block metadata (see also next point). With the "combined pointer-value slots" technique, if a page's max tid offset is 63 or less, the offsets can be stored directly in the pointer for the exact case. The lowest bit can tag to indicate a pointer to a single-value leaf. That would complicate operations like union/intersection and tracking "needs recheck", but it would reduce memory use and node-traversal in common cases.
>
> - Managing lossy storage. With pure blocknumber keys, replacing exact storage for a range of 256 pages amounts to replacing a last-level node with a single leaf containing one lossy PagetableEntry. The leader could iterate over the nodes, and rank the last-level nodes by how much storage they (possibly with leaf children) are using, and come up with an optimal lossy-conversion plan.
>
> The above would address the points (not including better iteration and parallel bitmap index scans) raised in
>
> https://www.postgresql.org/message-id/CAPsAnrn5yWsoWs8GhqwbwAJx1SeLxLntV54Biq0Z-J_E86Fnng@mail.gmail.com
>
> Ironically, by targeting a more difficult use case, it's easier since there is less freedom. There are many ways to beat a binary search, but fewer good ways to improve bitmap heap scan. I'd like to put aside vacuum for some time and try killing two birds with one stone, building upon our work thus far.
>
> Note: I've moved the CF entry to the next CF, and set to waiting on author for now. Since no action is currently required from Masahiko, I've added myself as author as well. If tackling bitmap heap scan shows promise, we could RWF and resurrect at a later time.

Thanks. I'm going to continue researching the memory limitation and
try lazy path expansion until PG17 development begins.

Regards,

--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Stephen Frost 2023-04-17 13:51:54 Re: longfin missing gssapi_ext.h
Previous Message Tom Lane 2023-04-17 13:37:57 Re: longfin missing gssapi_ext.h