Re: [PoC] Improve dead tuple storage for lazy vacuum

From: John Naylor <johncnaylorls(at)gmail(dot)com>
To: Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Nathan Bossart <nathandbossart(at)gmail(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, Matthias van de Meent <boekewurm+postgres(at)gmail(dot)com>, Yura Sokolov <y(dot)sokolov(at)postgrespro(dot)ru>
Subject: Re: [PoC] Improve dead tuple storage for lazy vacuum
Date: 2024-01-22 05:35:49
Message-ID: CANWCAZZ+O6Uv+fYH726eG_TYqZSLQ=-ZuiYk5TWCr4RNNTbsPg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Mon, Jan 22, 2024 at 10:28 AM Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> wrote:
>
> On further thought, as you pointed out before, "num_tids" should not
> be in tidstore in terms of integration with tidbitmap.c, because
> tidbitmap.c has "lossy pages". With lossy pages, "num_tids" is no
> longer accurate and useful. Similarly, looking at tidbitmap.c, it has
> npages and nchunks but they will not be necessary in lazy vacuum use
> case. Also, assuming that we support parallel heap pruning, probably
> we need to somehow lock the tidstore while adding tids to the tidstore
> concurrently by parallel vacuum worker. But in tidbitmap use case, we
> don't need to lock the tidstore since it doesn't have multiple
> writers.

Not currently, and it does seem bad to require locking where it's not required.

(That would be a prerequisite for parallel index scan. It's been tried
before with the hash table, but concurrency didn't scale well with the
hash table. I have no reason to think that the radix tree would scale
significantly better with the same global LW lock, but as you know
there are other locking schemes possible.)

> Given these facts, different statistics and different lock
> strategies are required by different use case. So I think there are 3
> options:
>
> 1. expose lock functions for tidstore and the caller manages the
> statistics in the outside of tidstore. For example, in lazyvacuum.c we
> would have a TidStore for tid storage as well as VacDeadItemsInfo that
> has num_tids and max_bytes. Both are in LVRelState. For parallel
> vacuum, we pass both to the workers via DSM and pass both to function
> where the statistics are required. As for the exposed lock functions,
> when adding tids to the tidstore, the caller would need to call
> something like TidStoreLockExclusive(ts) that further calls
> LWLockAcquire(ts->tree.shared->ctl.lock, LW_EXCLUSIVE) internally.

The advantage here is that vacuum can avoid locking entirely while
using shared memory, just like it does now, and has the option to add
it later.
IIUC, the radix tree struct would have a lock member, but wouldn't
take any locks internally? Maybe we still need one for
RT_MEMORY_USAGE? For that, I see dsa_get_total_size() takes its own
DSA_AREA_LOCK -- maybe that's enough?

That seems simplest, and is not very far from what we do now. If we do
this, then the lock functions should be where we branch for is_shared.

> 2. add callback functions to tidstore so that the caller can do its
> work while holding a lock on the tidstore. This is like the idea we
> just discussed for radix tree. The caller passes a callback function
> and user data to TidStoreSetBlockOffsets(), and the callback is called
> after setting tids. Similar to option 1, the statistics need to be
> stored in a different area.

I think we'll have to move to something like this eventually, but it
seems like overkill right now.

> 3. keep tidstore.c and tidbitmap.c separate implementations but use
> radix tree in tidbitmap.c. tidstore.c would have "num_tids" in its
> control object and doesn't have any lossy page support. On the other
> hand, in tidbitmap.c we replace simplehash with radix tree. This makes
> tidstore.c simple but we would end up having different data structures
> for similar usage.

They have so much in common that it's worth it to use the same
interface and (eventually) value type. They just need separate paths
for adding tids, as we've discussed.

> I think it's worth trying option 1. What do you think, John?

+1

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Peter Smith 2024-01-22 05:38:42 Re: Permute underscore separated components of columns before fuzzy matching
Previous Message Peter Smith 2024-01-22 05:24:13 Re: [PATCH] New [relation] option engine