Re: [PoC] Improve dead tuple storage for lazy vacuum

From: John Naylor <johncnaylorls(at)gmail(dot)com>
To: Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Nathan Bossart <nathandbossart(at)gmail(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, Matthias van de Meent <boekewurm+postgres(at)gmail(dot)com>, Yura Sokolov <y(dot)sokolov(at)postgrespro(dot)ru>
Subject: Re: [PoC] Improve dead tuple storage for lazy vacuum
Date: 2024-01-14 13:42:49
Message-ID: CANWCAZaEdHBnB0ByVQm9egm3-nzVXt6FXbJmmFDasZYgaioaWQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Fri, Jan 12, 2024 at 3:49 PM Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> wrote:
>
> On Thu, Jan 11, 2024 at 9:28 AM Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> wrote:
> > So I agree to remove both max_bytes and num_items from the control
> > object.Also, as you mentioned, we can remove the tidstore control
> > object itself. TidStoreGetHandle() returns a radix tree handle, and we
> > can pass it to TidStoreAttach(). I'll try it.

Thanks. It's worth looking closely here.

> I realized that if we remove the whole tidstore control object
> including max_bytes, processes who attached the shared tidstore cannot
> use TidStoreIsFull() actually as it always returns true.

I imagine that we'd replace that with a function (maybe an earlier
version had it?) to report the memory usage to the caller, which
should know where to find max_bytes.

> Also they
> cannot use TidStoreReset() as well since it needs to pass max_bytes to
> RT_CREATE(). It might not be a problem in terms of lazy vacuum, but it
> could be problematic for general use.

HEAD has no problem finding the necessary values, and I don't think
it'd be difficult to maintain that ability. I'm not actually sure what
"general use" needs to have, and I'm not sure anyone can guess.
There's the future possibility of parallel heap-scanning, but I'm
guessing a *lot* more needs to happen for that to work, so I'm not
sure how much it buys us to immediately start putting those two fields
in a special abstraction. The only other concrete use case mentioned
in this thread that I remember is bitmap heap scan, and I believe that
would never need to reset, only free the whole thing when finished.

I spent some more time studying parallel vacuum, and have some
thoughts. In HEAD, we have

-/*
- * VacDeadItems stores TIDs whose index tuples are deleted by index vacuuming.
- */
-typedef struct VacDeadItems
-{
- int max_items; /* # slots allocated in array */
- int num_items; /* current # of entries */
-
- /* Sorted array of TIDs to delete from indexes */
- ItemPointerData items[FLEXIBLE_ARRAY_MEMBER];
-} VacDeadItems;

...which has the tids, plus two fields that function _very similarly_
to the two extra fields in the tidstore control object. It's a bit
strange to me that the patch doesn't have this struct anymore.

I suspect if we keep it around (just change "items" to be the local
tidstore struct), the patch would have a bit less churn and look/work
more like the current code. I think it might be easier to read if the
v17 commits are suited to the current needs of vacuum, rather than try
to anticipate all uses. Richer abstractions can come later if needed.
Another stanza:

- /* Prepare the dead_items space */
- dead_items = (VacDeadItems *) shm_toc_allocate(pcxt->toc,
- est_dead_items_len);
- dead_items->max_items = max_items;
- dead_items->num_items = 0;
- MemSet(dead_items->items, 0, sizeof(ItemPointerData) * max_items);
- shm_toc_insert(pcxt->toc, PARALLEL_VACUUM_KEY_DEAD_ITEMS, dead_items);
- pvs->dead_items = dead_items;

With s/max_items/max_bytes/, I wonder if we can still use some of
this, and parallel workers would have no problem getting the necessary
info, as they do today. If not, I don't really understand why. I'm not
very familiar with working with shared memory, and I know the tree
itself needs some different setup, so it's quite possible I'm missing
something.

I find it difficult to kept straight these four things:

- radix tree
- radix tree control object
- tidstore
- tidstore control object

Even with the code in front of me, it's hard to reason about how these
concepts fit together. It'd be much more readable if this was
simplified.

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message John Naylor 2024-01-14 13:55:28 Re: Tidy fill hstv array (src/backend/access/heap/pruneheap.c)
Previous Message Cédric Villemain 2024-01-14 13:36:26 Re: Extension Enhancement: Buffer Invalidation in pg_buffercache