Re: [PoC] Improve dead tuple storage for lazy vacuum

From: Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>
To: John Naylor <johncnaylorls(at)gmail(dot)com>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Nathan Bossart <nathandbossart(at)gmail(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, Matthias van de Meent <boekewurm+postgres(at)gmail(dot)com>, Yura Sokolov <y(dot)sokolov(at)postgrespro(dot)ru>
Subject: Re: [PoC] Improve dead tuple storage for lazy vacuum
Date: 2024-01-11 00:28:44
Message-ID: CAD21AoC+cSbacX6JAWj=JN8d5fe3XRE21WG1ROQ8XSuS-reyiQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Mon, Jan 8, 2024 at 8:35 PM John Naylor <johncnaylorls(at)gmail(dot)com> wrote:
>
> On Wed, Jan 3, 2024 at 9:10 PM John Naylor <johncnaylorls(at)gmail(dot)com> wrote:
> >
> > On Tue, Jan 2, 2024 at 8:01 PM Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> wrote:
> >
> > > I agree that we expose RT_LOCK_* functions and have tidstore use them,
> > > but am not sure the if (TidStoreIsShared(ts) LWLockAcquire(..., ...)"
> > > calls part. I think that even if we expose them, we will still need to
> > > do something like "if (TidStoreIsShared(ts))
> > > shared_rt_lock_share(ts->tree.shared)", no?
> >
> > I'll come back to this topic separately.
>
> To answer your question, sure, but that "if (TidStoreIsShared(ts))"
> part would be pushed down into a function so that only one place has
> to care about it.
>
> However, I'm starting to question whether we even need that. Meaning,
> lock the tidstore separately. To "lock the tidstore" means to take a
> lock, _separate_ from the radix tree's internal lock, to control
> access to two fields in a separate "control object":
>
> +typedef struct TidStoreControl
> +{
> + /* the number of tids in the store */
> + int64 num_tids;
> +
> + /* the maximum bytes a TidStore can use */
> + size_t max_bytes;
>
> I'm pretty sure max_bytes does not need to be in shared memory, and
> certainly not under a lock: Thinking of a hypothetical
> parallel-prune-phase scenario, one way would be for a leader process
> to pass out ranges of blocks to workers, and when the limit is
> exceeded, stop passing out blocks and wait for all the workers to
> finish.

True. I agreed that it doesn't need to be under a lock anyway, as it's
read-only.

>
> As for num_tids, vacuum previously put the similar count in
>
> @@ -176,7 +179,8 @@ struct ParallelVacuumState
> PVIndStats *indstats;
>
> /* Shared dead items space among parallel vacuum workers */
> - VacDeadItems *dead_items;
> + TidStore *dead_items;
>
> VacDeadItems contained "num_items". What was the reason to have new
> infrastructure for that count? And it doesn't seem like access to it
> was controlled by a lock -- can you confirm? If we did get parallel
> pruning, maybe the count would belong inside PVShared?

I thought that since the tidstore is a general-purpose data structure
the shared counter should be protected by a lock. One thing I'm
concerned about is that we might need to update both the radix tree
and the counter atomically in some cases. But that's true we don't
need it for lazy vacuum at least for now. Even given the parallel scan
phase, probably we won't need to have workers check the total number
of stored tuples during a parallel scan.

>
> The number of tids is not that tightly bound to the tidstore's job. I
> believe tidbitmap.c (a possible future client) doesn't care about the
> global number of tids -- not only that, but AND/OR operations can
> change the number in a non-obvious way, so it would not be convenient
> to keep an accurate number anyway. But the lock would still be
> mandatory with this patch.

Very good point.

>
> If we can make vacuum work a bit closer to how it does now, it'd be a
> big step up in readability, I think. Namely, getting rid of all the
> locking logic inside tidstore.c and let the radix tree's locking do
> the right thing. We'd need to make that work correctly when receiving
> pointers to values upon lookup, and I already shared ideas for that.
> But I want to see if there is any obstacle in the way of removing the
> tidstore control object and it's separate lock.

So I agree to remove both max_bytes and num_items from the control
object.Also, as you mentioned, we can remove the tidstore control
object itself. TidStoreGetHandle() returns a radix tree handle, and we
can pass it to TidStoreAttach(). I'll try it.

Regards,

--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Sutou Kouhei 2024-01-11 01:24:45 Re: Make COPY format extendable: Extract COPY TO format implementations
Previous Message Melanie Plageman 2024-01-11 00:24:50 Re: Show WAL write and fsync stats in pg_stat_io