Re: [PoC] Improve dead tuple storage for lazy vacuum

From: John Naylor <john(dot)naylor(at)enterprisedb(dot)com>
To: Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>
Cc: Nathan Bossart <nathandbossart(at)gmail(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, Matthias van de Meent <boekewurm+postgres(at)gmail(dot)com>, Yura Sokolov <y(dot)sokolov(at)postgrespro(dot)ru>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [PoC] Improve dead tuple storage for lazy vacuum
Date: 2023-03-11 15:54:40
Message-ID: CAFBsxsETw48OJE_6euuScBDeDmPP=RM9+4ajagCY7_sFCR+Vuw@mail.gmail.com
Lists: pgsql-hackers

On Fri, Mar 10, 2023 at 9:30 PM Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>
wrote:
>
> On Fri, Mar 10, 2023 at 3:42 PM John Naylor
> <john(dot)naylor(at)enterprisedb(dot)com> wrote:

> > I'd suggest sharing your todo list in the meanwhile, it'd be good to
> > discuss what's worth doing and what is not.
>
> Apart from more rounds of reviews and tests, my todo items that need
> discussion and possibly implementation are:

Quick thoughts on these:

> * The memory measurement in radix trees and the memory limit in
> tidstores. I've implemented it in v30-0007 through 0009 but we need to
> review it. This is the highest priority for me.

Agreed.
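Here is a minimal C sketch of memory accounting combined with a limit check. It assumes a simple byte counter that is updated on every allocation and free. The names `MemAccount`, `account_alloc`, `account_free` and `account_is_full` are hypothetical and are not taken from the v30-0007 through 0009 patches.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical byte counter; not the API from the v30 patches. */
typedef struct MemAccount
{
    size_t used;   /* bytes currently allocated through this account */
    size_t limit;  /* budget, e.g. derived from maintenance_work_mem */
} MemAccount;

/* Allocate and charge the requested size to the account. */
static void *
account_alloc(MemAccount *acct, size_t size)
{
    void *p = malloc(size);

    if (p != NULL)
        acct->used += size;
    return p;
}

/* Free and credit the account; the caller must pass the original size. */
static void
account_free(MemAccount *acct, void *p, size_t size)
{
    assert(acct->used >= size);
    free(p);
    acct->used -= size;
}

/* True once the budget is used up. A lazy-vacuum-style caller would stop
 * adding dead TIDs here, vacuum the indexes and heap, then reset the store. */
static bool
account_is_full(const MemAccount *acct)
{
    return acct->used >= acct->limit;
}
```

This counts bytes requested, not bytes the allocator actually reserved. Which of the two a limit should be checked against is part of what the review needs to settle.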

> * Additional size classes. It's important for an alternative of path
> compression as well as supporting our decoupling approach. Middle
> priority.

I'm going to push back a bit and claim this doesn't bring much gain, while
it does have a complexity cost. The node1 from Andres's prototype is 32
bytes in size, same as our node3, so it's roughly equivalent as a way to
ameliorate the lack of path compression. I say "roughly" because the loop
in node3 is probably noticeably slower. A new size class will by definition
still use that loop.
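The following minimal C sketch shows a three-slot node and its lookup, to make the point about the loop concrete. The layout is an assumption for illustration: a 5-byte header, three key bytes, then three child pointers, for 32 bytes on a typical 64-bit build. The names `Node3` and `node3_find` are also hypothetical and do not come from the patch. Any size class that shares this layout is searched by the same loop over `count` entries, however few slots it has.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical 32-byte node (64-bit build): 5 header bytes + 3 key bytes,
 * then three 8-byte child pointers. */
typedef struct Node3
{
    uint8_t  kind;         /* node kind */
    uint8_t  count;        /* number of used slots, 0..3 */
    uint8_t  shift;        /* which key byte this level consumes */
    uint8_t  chunk;        /* this node's key byte in its parent */
    uint8_t  reserved;     /* fifth header byte, unused in this sketch */
    uint8_t  chunks[3];    /* key bytes of the children */
    void    *children[3];  /* child pointers, parallel to chunks[] */
} Node3;

/* Linear scan over the used slots: the loop discussed above. */
static void *
node3_find(const Node3 *node, uint8_t chunk)
{
    for (int i = 0; i < node->count; i++)
    {
        if (node->chunks[i] == chunk)
            return node->children[i];
    }
    return NULL;
}
```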

About a smaller node125-type class: I'm actually not even sure we need to
have any sub-max node bigger than about 64 (node size 768 bytes). I'd just let
65+ go to the max node -- there won't be many of them, at least in
synthetic workloads we've seen so far.
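The 768-byte figure is consistent with a node125-style layout. Such a node holds a 256-entry array of one-byte slot indexes plus 64 eight-byte child pointers, so 256 + 64 * 8 = 768, ignoring the header. That breakdown is an assumption, and so are the capacities {3, 32, 64, 256} used below. The minimal C sketch computes the node size and picks a size class in which anything above 64 children goes directly to the max node.

```c
#include <assert.h>
#include <stddef.h>

#define SLOT_INDEX_BYTES 256  /* one uint8 slot index per possible key byte */
#define CHILD_PTR_BYTES  8    /* 64-bit child pointer */

/* Approximate size of a node125-style node with room for 'capacity'
 * children, ignoring the node header (assumed layout). */
static size_t
indexed_node_bytes(size_t capacity)
{
    return SLOT_INDEX_BYTES + capacity * CHILD_PTR_BYTES;
}

/* Hypothetical size classes: no sub-max class above 64 children. */
static const int size_class_capacity[] = {3, 32, 64, 256};
#define NUM_SIZE_CLASSES (sizeof(size_class_capacity) / sizeof(size_class_capacity[0]))

/* Smallest class that fits 'nchildren'; 65+ lands in the 256-way max node. */
static int
choose_size_class(int nchildren)
{
    assert(nchildren >= 0 && nchildren <= 256);
    for (size_t i = 0; i < NUM_SIZE_CLASSES; i++)
    {
        if (nchildren <= size_class_capacity[i])
            return size_class_capacity[i];
    }
    return 256;  /* unreachable: the last class covers every key byte */
}
```

Under the same assumptions, a max node of 256 child pointers is 2048 bytes. A node with 65 to 255 children therefore wastes some space, and the argument above is that such nodes are rare enough for that to be acceptable.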

> * Node shrinking support. Low priority.

This is an architectural wart that's been neglected because the tid store
doesn't perform deletion. We'll need it sometime. If we're not going to
make this work, why ship a deletion API at all?

I took a look at this a couple weeks ago, and fixing it wouldn't be that
hard. I even had an idea of how to detect when to shrink size class within
a node kind, while keeping the header at 5 bytes. I'd be willing to put
effort into that, but to have a chance of succeeding, I'm unwilling to make
it more difficult by adding more size classes at this point.
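The post does not describe the shrink-detection idea itself. What follows is therefore only a generic C sketch of one common policy, hysteresis. After a deletion, a node moves down one size class only once its child count falls to half the smaller class's capacity. That way, inserts and deletes near a class boundary do not copy the node back and forth. The capacity table and `should_shrink` are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical size-class capacities, smallest to largest. */
static const int shrink_class_capacity[] = {3, 32, 64, 256};

/* After a deletion, should a node in class 'cls' holding 'count' children
 * move to class cls - 1? Growth happens when count exceeds the current
 * capacity; shrinking waits until count is at most half of the smaller
 * capacity, leaving a gap that prevents grow/shrink thrashing. */
static bool
should_shrink(int cls, int count)
{
    if (cls == 0)
        return false;  /* already the smallest class */
    return count <= shrink_class_capacity[cls - 1] / 2;
}
```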

--
John Naylor
EDB: http://www.enterprisedb.com
