Re: [PoC] Improve dead tuple storage for lazy vacuum

From: Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>
To: John Naylor <johncnaylorls(at)gmail(dot)com>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Nathan Bossart <nathandbossart(at)gmail(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, Matthias van de Meent <boekewurm+postgres(at)gmail(dot)com>, Yura Sokolov <y(dot)sokolov(at)postgrespro(dot)ru>
Subject: Re: [PoC] Improve dead tuple storage for lazy vacuum
Date: 2024-03-13 01:38:43
Message-ID: CAD21AoCh5QRQ27b1UthHCMhetCgfRYr+Rs2yJHh7u_D3t33g6A@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Tue, Mar 12, 2024 at 7:34 PM John Naylor <johncnaylorls(at)gmail(dot)com> wrote:
>
> On Mon, Mar 11, 2024 at 3:13 PM Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> wrote:
> >
> > On Mon, Mar 11, 2024 at 12:20 PM John Naylor <johncnaylorls(at)gmail(dot)com> wrote:
> > >
> > > On Thu, Mar 7, 2024 at 10:35 PM Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> wrote:
>
> > > + ts->context = CurrentMemoryContext;
> > >
> > > As far as I can tell, this member is never accessed again -- am I
> > > missing something?
> >
> > You're right. It was used to re-create the tidstore in the same
> > context again while resetting it, but we no longer support the reset
> > API. Considering it again, would it be better to allocate the iterator
> > struct in the same context as we store the tidstore struct?
>
> That makes sense.
>
> > > + /* DSA for tidstore will be detached at the end of session */
> > >
> > > No other test module pins the mapping, but that doesn't necessarily
> > > mean it's wrong. Is there some advantage over explicitly detaching?
> >
> > One small benefit of not explicitly detaching dsa_area in
> > tidstore_destroy() would be simplicity; IIUC if we want to do that, we
> > need to remember the dsa_area using (for example) a static variable,
> > and free it if it's non-NULL. I've implemented this idea in the
> > attached patch.
>
> Okay, I don't have a strong preference at this point.

I'd keep the update on that.

>
> > > +-- Add tids in random order.
> > >
> > > I don't see any randomization here. I do remember adding row_number to
> > > remove whitespace in the output, but I don't remember a random order.
> > > On that subject, the row_number was an easy trick to avoid extra
> > > whitespace, but maybe we should just teach the setting function to
> > > return blocknumber rather than null?
> >
> > Good idea, fixed.
>
> + test_set_block_offsets
> +------------------------
> + 2147483647
> + 0
> + 4294967294
> + 1
> + 4294967295
>
> Hmm, was the earlier comment about randomness referring to this? I'm
> not sure what other regression tests do in these cases, or how
> relibale this is. If this is a problem we could simply insert this
> result into a temp table so it's not output.

I didn't address the comment about randomness.

I think that we will have both random TIDs tests and fixed TIDs tests
in test_tidstore as we discussed, and probably we can do both tests
with similar steps; insert TIDs into both a temp table and tidstore
and check if the tidstore returned the results as expected by
comparing the results to the temp table. Probably we can have a common
pl/pgsql function that checks that and raises a WARNING or an ERROR.
Given that this is very similar to what we did in test_radixtree, why
do we really want to implement it using a pl/pgsql function? When we
discussed it before, I found the current way makes sense. But given
that we're adding more tests and will add more tests in the future,
doing the tests in C will be more maintainable and faster. Also, I
think we can do the debug-build array stuff in the test_tidstore code
instead.

>
> > > +Datum
> > > +tidstore_create(PG_FUNCTION_ARGS)
> > > +{
> > > ...
> > > + tidstore = TidStoreCreate(max_bytes, dsa);
> > >
> > > +Datum
> > > +tidstore_set_block_offsets(PG_FUNCTION_ARGS)
> > > +{
> > > ....
> > > + TidStoreSetBlockOffsets(tidstore, blkno, offs, noffs);
> > >
> > > These names are too similar. Maybe the test module should do
> > > s/tidstore_/test_/ or similar.
> >
> > Agreed.
>
> Mostly okay, although a couple look a bit generic now. I'll leave it
> up to you if you want to tweak things.
>
> > > In general, the .sql file is still very hard-coded. Functions are
> > > created that contain a VALUES statement. Maybe it's okay for now, but
> > > wanted to mention it. Ideally, we'd have some randomized tests,
> > > without having to display it. That could be in addition to (not
> > > replacing) the small tests we have that display input. (see below)
> > >
> >
> > Agreed to add randomized tests in addition to the existing tests.
>
> I'll try something tomorrow.
>
> > Sounds a good idea. In fact, if there are some bugs in tidstore, it's
> > likely that even initdb would fail in practice. However, it's a very
> > good idea that we can test the tidstore anyway with such a check
> > without a debug-build array.
> >
> > Or as another idea, I wonder if we could keep the debug-build array in
> > some form. For example, we use the array with the particular build
> > flag and set a BF animal for that. That way, we can test the tidstore
> > in more real cases.
>
> I think the purpose of a debug flag is to help developers catch
> mistakes. I don't think it's quite useful enough for that. For one, it
> has the same 1GB limitation as vacuum's current array. For another,
> it'd be a terrible way to debug moving tidbitmap.c from its hash table
> to use TID store -- AND/OR operations and lossy pages are pretty much
> undoable with a copy of vacuum's array.

Valid points.

As I mentioned above, if we implement the test cases in C, we can use
the debug-build array in the test code. And we won't use it in AND/OR
operations tests in the future.

>
> > In the latest (v69) patch:
> >
> > - squashed v68-0005 and v68-0006 patches.
> > - removed most of the changes in v68-0007 patch.
> > - addressed above review comments in v69-0002 patch.
> > - v69-0003, 0004, and 0005 are miscellaneous updates.
> >
> > As for renaming TidStore to TIDStore, I dropped the patch for now
> > since it seems we're using "Tid" in some function names and variable
> > names. If we want to update it, we can do that later.
>
> I think we're not consistent across the codebase, and it's fine to
> drop that patch.
>
> v70-0008:
>
> @@ -489,7 +489,7 @@ parallel_vacuum_reset_dead_items(ParallelVacuumState *pvs)
> /*
> * Free the current tidstore and return allocated DSA segments to the
> * operating system. Then we recreate the tidstore with the same max_bytes
> - * limitation.
> + * limitation we just used.
>
> Nowadays, max_bytes is now more like a hint for tidstore, and not a
> limitation, right? Vacuum has the limitation.

Right.

> Maybe instead of "with",
> we should say "passing the same limitation".

Will fix.

>
> I wonder how "di_info" would look as "dead_items_info". I don't feel
> too strongly about it, though.

Agreed.

>
> I'm going to try additional regression tests, as mentioned, and try a
> couple benchmarks. It should be only a couple more days.

Thank you!

> One thing that occurred to me: The radix tree regression tests only
> compile and run the local memory case. The tidstore commit would be
> the first time the buildfarm has seen the shared memory case, so we
> should look out for possible build failures of the same sort we saw
> with the the radix tree tests. I see you've already removed the
> problematic link_with command -- that's the kind of thing to
> double-check for.

Good point, agreed. I'll double-check it again.

Regards,

--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Masahiko Sawada 2024-03-13 01:47:33 Re: Improve eviction algorithm in ReorderBuffer
Previous Message Peter Smith 2024-03-13 01:15:06 Re: Improve eviction algorithm in ReorderBuffer