Re: [PoC] Improve dead tuple storage for lazy vacuum

From: John Naylor <johncnaylorls(at)gmail(dot)com>
To: Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Nathan Bossart <nathandbossart(at)gmail(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, Matthias van de Meent <boekewurm+postgres(at)gmail(dot)com>, Yura Sokolov <y(dot)sokolov(at)postgrespro(dot)ru>
Subject: Re: [PoC] Improve dead tuple storage for lazy vacuum
Date: 2024-03-19 09:40:06
Message-ID: CANWCAZbeQnXH2yr32iW=+9s+WJUi7+dbyr7pmJ8OJ51+ObeUyQ@mail.gmail.com
Lists: pgsql-hackers

On Tue, Mar 19, 2024 at 10:24 AM Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> wrote:
>
> On Tue, Mar 19, 2024 at 8:35 AM John Naylor <johncnaylorls(at)gmail(dot)com> wrote:
> >
> > On Mon, Mar 18, 2024 at 11:12 AM Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> wrote:
> > >
> > > On Sun, Mar 17, 2024 at 11:46 AM John Naylor <johncnaylorls(at)gmail(dot)com> wrote:

> > It might also be worth reducing the number of blocks in the random
> > test -- multiple runs will have different offsets anyway.
>
> Yes. If we reduce the number of blocks from 1000 to 100, the
> regression test takes the following time on my environment:
>
> 1000 blocks : 516 ms
> 100 blocks : 228 ms

Sounds good.

> Removed some unnecessary variables in 0002 patch.

Looks good.

> So the MaxBlocktableEntrySize calculation would be as follows?
>
> #define MaxBlocktableEntrySize \
>   offsetof(BlocktableEntry, words) + \
>     (sizeof(bitmapword) * \
>       WORDS_PER_PAGE(Min(MaxOffsetNumber, \
>                          BITS_PER_BITMAPWORD * PG_INT8_MAX - 1)))
>
> I've made this change in the 0003 patch.

This is okay, but one side effect is that we have both an assert and
an elog, for different limits. I think we'll need a separate #define
to help. But for now, I don't want to hold up tidstore further with
this because I believe almost everything else in v74 is in pretty good
shape. I'll save this for later as a part of the optimization I
proposed.
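
To jot down what I have in mind for that later work (just a sketch --
the macro name MAX_OFFSET_IN_BITMAP is made up here):

#define MAX_OFFSET_IN_BITMAP \
  Min(MaxOffsetNumber, BITS_PER_BITMAPWORD * PG_INT8_MAX - 1)

#define MaxBlocktableEntrySize \
  (offsetof(BlocktableEntry, words) + \
   sizeof(bitmapword) * WORDS_PER_PAGE(MAX_OFFSET_IN_BITMAP))

...so the assert and the elog could both check against the same macro
instead of each hard-coding its own limit.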

Remaining things I noticed:

+#define RT_PREFIX local_rt
+#define RT_PREFIX shared_rt

Prefixes for simplehash, for example, don't include "sh" -- maybe
"local_ts"/"shared_ts"?
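
To be concrete, RT_PREFIX becomes the prefix of the generated function
names, so something like this (a sketch -- I'm leaving out the other
template parameters):

#define RT_PREFIX local_ts
#define RT_SCOPE static
#define RT_DECLARE
#define RT_DEFINE
#define RT_VALUE_TYPE BlocktableEntry
#include "lib/radixtree.h"

...would give local_ts_create()/local_ts_set()/local_ts_free() rather
than local_rt_*.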

+ /* MemoryContext where the radix tree uses */

s/where/that/

+/*
+ * Lock support functions.
+ *
+ * We can use the radix tree's lock for shared TidStore as the data we
+ * need to protect is only the shared radix tree.
+ */
+void
+TidStoreLockExclusive(TidStore *ts)

The comment covers multiple functions, so maybe add a blank line after it.
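
That is, something like:

+/*
+ * Lock support functions.
+ *
+ * We can use the radix tree's lock for shared TidStore as the data we
+ * need to protect is only the shared radix tree.
+ */
+
+void
+TidStoreLockExclusive(TidStore *ts)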

With those, I think you can go ahead and squash all the tidstore
patches except for 0003 and commit it.

> While reviewing the vacuum patch, I realized that we always pass
> LWTRANCHE_SHARED_TIDSTORE to RT_CREATE(), and the wait event related
> to the tidstore is therefore always the same. I think it would be
> better to make the caller of TidStoreCreate() specify the tranche_id
> and pass it to RT_CREATE(). That way, the caller can specify their own
> wait event for the tidstore. The 0008 patch tries this idea; dshash.c
> does the same thing.

Sounds reasonable. I'll just note that src/include/storage/lwlock.h
still has an entry for LWTRANCHE_SHARED_TIDSTORE.
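
For illustration, this enables caller-side setup along these lines
(sketch only -- the exact TidStoreCreate() arguments are whatever the
0008 patch settled on):

/* caller registers its own wait event name (name made up here) */
int tranche_id = LWLockNewTrancheId();
LWLockRegisterTranche(tranche_id, "VacuumTidStore");

TidStore *ts = TidStoreCreate(max_bytes, dsa, tranche_id);

...analogous to how dshash callers supply a tranche_id via
dshash_parameters.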
