Re: [PoC] Improve dead tuple storage for lazy vacuum

From: John Naylor <johncnaylorls(at)gmail(dot)com>
To: Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Nathan Bossart <nathandbossart(at)gmail(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, Matthias van de Meent <boekewurm+postgres(at)gmail(dot)com>, Yura Sokolov <y(dot)sokolov(at)postgrespro(dot)ru>
Subject: Re: [PoC] Improve dead tuple storage for lazy vacuum
Date: 2024-01-08 11:35:22
Message-ID: CANWCAZbSp1txmTopBmn9Y7spQ0yM-wzh13OFbMTFFMswCpQL0w@mail.gmail.com

On Wed, Jan 3, 2024 at 9:10 PM John Naylor <johncnaylorls(at)gmail(dot)com> wrote:
>
> On Tue, Jan 2, 2024 at 8:01 PM Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> wrote:
>
> > I agree that we expose RT_LOCK_* functions and have tidstore use them,
> > but am not sure the if (TidStoreIsShared(ts) LWLockAcquire(..., ...)"
> > calls part. I think that even if we expose them, we will still need to
> > do something like "if (TidStoreIsShared(ts))
> > shared_rt_lock_share(ts->tree.shared)", no?
>
> I'll come back to this topic separately.

To answer your question, sure, but that "if (TidStoreIsShared(ts))"
part would be pushed down into a function so that only one place has
to care about it.
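
To illustrate, a rough sketch (the helper name is mine, not something
from the patch):

static inline void
tidstore_lock_share(TidStore *ts)
{
    /* a backend-local store needs no locking at all */
    if (TidStoreIsShared(ts))
        shared_rt_lock_share(ts->tree.shared);
}

...so callers would just do tidstore_lock_share(ts) and never repeat
the shared-vs-local test themselves.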

However, I'm starting to question whether we even need that, meaning
whether the tidstore needs to be locked separately at all. To "lock
the tidstore" means to take a lock, _separate_ from the radix tree's
internal lock, to control access to two fields in a separate "control
object":

+typedef struct TidStoreControl
+{
+ /* the number of tids in the store */
+ int64 num_tids;
+
+ /* the maximum bytes a TidStore can use */
+ size_t max_bytes;

I'm pretty sure max_bytes does not need to be in shared memory, and
certainly not under a lock. Thinking of a hypothetical
parallel-prune-phase scenario, one way would be for a leader process
to pass out ranges of blocks to workers and, when the limit is
exceeded, stop passing out blocks and wait for all the workers to
finish.
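
For illustration only (every name below is an assumption of mine, not
anything in the patch), the leader's loop could look roughly like:

BlockNumber next_block = 0;

while (next_block < nblocks)
{
    /* max_bytes stays leader-local, so no lock is needed to check it */
    if (dead_items_memory_usage(dead_items) > max_bytes)
        break;

    /* hand the next range of blocks to an available worker */
    dispatch_block_range(next_block, range_size);
    next_block += range_size;
}

/* stop passing out blocks and wait for all workers to finish */
wait_for_all_workers();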

As for num_tids, vacuum previously put a similar count in

@@ -176,7 +179,8 @@ struct ParallelVacuumState
PVIndStats *indstats;

/* Shared dead items space among parallel vacuum workers */
- VacDeadItems *dead_items;
+ TidStore *dead_items;

VacDeadItems contained "num_items". What was the reason to have new
infrastructure for that count? And it doesn't seem like access to it
was controlled by a lock -- can you confirm? If we did get parallel
pruning, maybe the count would belong inside PVShared?
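
If so, a minimal sketch of what I mean (the field and the update call
are assumptions of mine, not from the patch):

/* hypothetical counter in PVShared */
pg_atomic_uint64 num_dead_items;

/* ...updated e.g. after storing the offsets for one block: */
pg_atomic_add_fetch_u64(&pvs->shared->num_dead_items, num_offsets);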

The number of tids is not that tightly bound to the tidstore's job. I
believe tidbitmap.c (a possible future client) doesn't care about the
global number of tids -- not only that, but AND/OR operations can
change the number in a non-obvious way, so it would not be convenient
to keep an accurate number anyway. But the lock would still be
mandatory with this patch.

If we can make vacuum work a bit closer to how it does now, it'd be a
big step up in readability, I think. Namely, getting rid of all the
locking logic inside tidstore.c and letting the radix tree's locking
do the right thing. We'd need to make that work correctly when
receiving pointers to values upon lookup, and I already shared ideas
for that. But I want to see if there is any obstacle in the way of
removing the tidstore control object and its separate lock.
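
For lookups, one possible shape (just a sketch; apart from
shared_rt_lock_share, the names are assumed and the real RT_* API may
differ) is to hold the tree's own lock across the value access:

shared_rt_lock_share(ts->tree.shared);
vals = shared_rt_find(ts->tree.shared, (uint64) blkno); /* lookup name assumed */
if (vals != NULL)
    num_offsets = tidstore_decode(vals, offsets);       /* hypothetical helper */
shared_rt_unlock(ts->tree.shared);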
