Re: [PoC] Improve dead tuple storage for lazy vacuum

From: John Naylor <johncnaylorls(at)gmail(dot)com>
To: Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Nathan Bossart <nathandbossart(at)gmail(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, Matthias van de Meent <boekewurm+postgres(at)gmail(dot)com>, Yura Sokolov <y(dot)sokolov(at)postgrespro(dot)ru>
Subject: Re: [PoC] Improve dead tuple storage for lazy vacuum
Date: 2024-03-12 10:34:30
Message-ID: CANWCAZYRMOD+vGtEqJKZDgdq=To_4314gwG66bAM0y7mcEibOg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Mon, Mar 11, 2024 at 3:13 PM Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> wrote:
>
> On Mon, Mar 11, 2024 at 12:20 PM John Naylor <johncnaylorls(at)gmail(dot)com> wrote:
> >
> > On Thu, Mar 7, 2024 at 10:35 PM Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> wrote:

> > + ts->context = CurrentMemoryContext;
> >
> > As far as I can tell, this member is never accessed again -- am I
> > missing something?
>
> You're right. It was used to re-create the tidstore in the same
> context again while resetting it, but we no longer support the reset
> API. Considering it again, would it be better to allocate the iterator
> struct in the same context as we store the tidstore struct?

That makes sense.

> > + /* DSA for tidstore will be detached at the end of session */
> >
> > No other test module pins the mapping, but that doesn't necessarily
> > mean it's wrong. Is there some advantage over explicitly detaching?
>
> One small benefit of not explicitly detaching dsa_area in
> tidstore_destroy() would be simplicity; IIUC if we want to do that, we
> need to remember the dsa_area using (for example) a static variable,
> and free it if it's non-NULL. I've implemented this idea in the
> attached patch.

Okay, I don't have a strong preference at this point.

> > +-- Add tids in random order.
> >
> > I don't see any randomization here. I do remember adding row_number to
> > remove whitespace in the output, but I don't remember a random order.
> > On that subject, the row_number was an easy trick to avoid extra
> > whitespace, but maybe we should just teach the setting function to
> > return blocknumber rather than null?
>
> Good idea, fixed.

+ test_set_block_offsets
+------------------------
+ 2147483647
+ 0
+ 4294967294
+ 1
+ 4294967295

Hmm, was the earlier comment about randomness referring to this? I'm
not sure what other regression tests do in these cases, or how
relibale this is. If this is a problem we could simply insert this
result into a temp table so it's not output.

> > +Datum
> > +tidstore_create(PG_FUNCTION_ARGS)
> > +{
> > ...
> > + tidstore = TidStoreCreate(max_bytes, dsa);
> >
> > +Datum
> > +tidstore_set_block_offsets(PG_FUNCTION_ARGS)
> > +{
> > ....
> > + TidStoreSetBlockOffsets(tidstore, blkno, offs, noffs);
> >
> > These names are too similar. Maybe the test module should do
> > s/tidstore_/test_/ or similar.
>
> Agreed.

Mostly okay, although a couple look a bit generic now. I'll leave it
up to you if you want to tweak things.

> > In general, the .sql file is still very hard-coded. Functions are
> > created that contain a VALUES statement. Maybe it's okay for now, but
> > wanted to mention it. Ideally, we'd have some randomized tests,
> > without having to display it. That could be in addition to (not
> > replacing) the small tests we have that display input. (see below)
> >
>
> Agreed to add randomized tests in addition to the existing tests.

I'll try something tomorrow.

> Sounds a good idea. In fact, if there are some bugs in tidstore, it's
> likely that even initdb would fail in practice. However, it's a very
> good idea that we can test the tidstore anyway with such a check
> without a debug-build array.
>
> Or as another idea, I wonder if we could keep the debug-build array in
> some form. For example, we use the array with the particular build
> flag and set a BF animal for that. That way, we can test the tidstore
> in more real cases.

I think the purpose of a debug flag is to help developers catch
mistakes. I don't think it's quite useful enough for that. For one, it
has the same 1GB limitation as vacuum's current array. For another,
it'd be a terrible way to debug moving tidbitmap.c from its hash table
to use TID store -- AND/OR operations and lossy pages are pretty much
undoable with a copy of vacuum's array. Last year, when I insisted on
trying a long term realistic load that compares the result with the
array, the encoding scheme was much harder to understand in code. I
think it's now easier, and there are better tests.

> In the latest (v69) patch:
>
> - squashed v68-0005 and v68-0006 patches.
> - removed most of the changes in v68-0007 patch.
> - addressed above review comments in v69-0002 patch.
> - v69-0003, 0004, and 0005 are miscellaneous updates.
>
> As for renaming TidStore to TIDStore, I dropped the patch for now
> since it seems we're using "Tid" in some function names and variable
> names. If we want to update it, we can do that later.

I think we're not consistent across the codebase, and it's fine to
drop that patch.

v70-0008:

@@ -489,7 +489,7 @@ parallel_vacuum_reset_dead_items(ParallelVacuumState *pvs)
/*
* Free the current tidstore and return allocated DSA segments to the
* operating system. Then we recreate the tidstore with the same max_bytes
- * limitation.
+ * limitation we just used.

Nowadays, max_bytes is now more like a hint for tidstore, and not a
limitation, right? Vacuum has the limitation. Maybe instead of "with",
we should say "passing the same limitation".

I wonder how "di_info" would look as "dead_items_info". I don't feel
too strongly about it, though.

I'm going to try additional regression tests, as mentioned, and try a
couple benchmarks. It should be only a couple more days.

One thing that occurred to me: The radix tree regression tests only
compile and run the local memory case. The tidstore commit would be
the first time the buildfarm has seen the shared memory case, so we
should look out for possible build failures of the same sort we saw
with the the radix tree tests. I see you've already removed the
problematic link_with command -- that's the kind of thing to
double-check for.

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Dilip Kumar 2024-03-12 10:39:01 Re: Streaming I/O, vectored I/O (WIP)
Previous Message Andrei Lepikhov 2024-03-12 10:22:15 Re: a wrong index choose when statistics is out of date