Re: [PoC] Improve dead tuple storage for lazy vacuum

From: John Naylor <johncnaylorls(at)gmail(dot)com>
To: Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Nathan Bossart <nathandbossart(at)gmail(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, Matthias van de Meent <boekewurm+postgres(at)gmail(dot)com>, Yura Sokolov <y(dot)sokolov(at)postgrespro(dot)ru>
Subject: Re: [PoC] Improve dead tuple storage for lazy vacuum
Date: 2024-03-20 14:19:46
Message-ID: CANWCAZatsDg54wCjpF-GvL0TiOcvSQpDk9PD=-gsMEo8ariPEQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Wed, Mar 20, 2024 at 8:30 PM Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> wrote:
> I forgot to report the results. Yes, I did some tests where I inserted
> many TIDs to make the tidstore use several GB memory. I did two cases:
>
> 1. insert 100M blocks of TIDs with an offset of 100.
> 2. insert 10M blocks of TIDs with an offset of 2048.
>
> The tidstore used about 4.8GB and 5.2GB, respectively, and all lookup
> and iteration results were expected.

Thanks for confirming!

> While reviewing the codes again, the following two things caught my eyes:
>
> in check_set_block_offset() function, we don't take a lock on the
> tidstore while checking all possible TIDs. I'll add
> TidStoreLockShare() and TidStoreUnlock() as follows:
>
> + TidStoreLockShare(tidstore);
> if (TidStoreIsMember(tidstore, &tid))
> ItemPointerSet(&items.lookup_tids[num_lookup_tids++],
> blkno, offset);
> + TidStoreUnlock(tidstore);

In one sense, all locking in the test module is useless since there is
only a single process. On the other hand, it seems good to at least
run what we have written to run it trivially, and serve as an example
of usage. We should probably be consistent, and document at the top
that the locks are pro-forma only.

It's both a blessing and a curse that vacuum only has a single writer.
It makes development less of a hassle, but also means that tidstore
locking is done for API-completeness reasons, not (yet) as a practical
necessity. Even tidbitmap.c's hash table currently has a single
writer, and while using tidstore for that is still an engineering
challenge for other reasons, it wouldn't exercise locking
meaningfully, either, at least at first.

> Regarding TidStoreMemoryUsage(), IIUC the caller doesn't need to take
> a lock on the shared tidstore since dsa_get_total_size() (called by
> RT_MEMORY_USAGE()) does appropriate locking. I think we can mention it
> in the comment as follows:
>
> -/* Return the memory usage of TidStore */
> +/*
> + * Return the memory usage of TidStore.
> + *
> + * In shared TidStore cases, since shared_ts_memory_usage() does appropriate
> + * locking, the caller doesn't need to take a lock.
> + */
>
> What do you think?

That duplicates the underlying comment on the radix tree function that
this calls, so I'm inclined to leave it out. At this level it's
probably best to document when a caller _does_ need to take an action.

One thing I forgot to ask about earlier:

+-- Add tids in out of order.

Are they (the blocks to be precise) really out of order? The VALUES
statement is ordered, but after inserting it does not output that way.
I wondered if this is platform independent, but CI and our dev
machines haven't failed this test, and I haven't looked into what
determines the order. It's easy enough to hide the blocks if we ever
need to, as we do elsewhere...

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Amonson, Paul D 2024-03-20 14:23:55 RE: Popcount optimization using AVX512
Previous Message Jakub Wartak 2024-03-20 14:17:28 Re: pg_upgrade --copy-file-range