Re: [PoC] Improve dead tuple storage for lazy vacuum

From: Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>
To: John Naylor <johncnaylorls(at)gmail(dot)com>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Nathan Bossart <nathandbossart(at)gmail(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, Matthias van de Meent <boekewurm+postgres(at)gmail(dot)com>, Yura Sokolov <y(dot)sokolov(at)postgrespro(dot)ru>
Subject: Re: [PoC] Improve dead tuple storage for lazy vacuum
Date: 2024-03-14 01:53:14
Message-ID: CAD21AoA_TtELXs_6Ks3oJiSdWehBkO5h4irrqi5xq8SPzWqJNA@mail.gmail.com
Lists: pgsql-hackers

On Thu, Mar 14, 2024 at 9:59 AM John Naylor <johncnaylorls(at)gmail(dot)com> wrote:
>
> On Wed, Mar 13, 2024 at 9:29 PM Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> wrote:
> >
> > On Wed, Mar 13, 2024 at 8:05 PM John Naylor <johncnaylorls(at)gmail(dot)com> wrote:
> > >
> > > On Wed, Mar 13, 2024 at 8:39 AM Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> wrote:
> > >
> > > > As I mentioned above, if we implement the test cases in C, we can use
> > > > the debug-build array in the test code. And we won't use it in AND/OR
> > > > operations tests in the future.
> > >
> > > That's a really interesting idea, so I went ahead and tried that for
> > > v71. This seems like a good basis for testing larger, randomized
> > > inputs, once we decide how best to hide that from the expected output.
> > > The tests use SQL functions do_set_block_offsets() and
> > > check_set_block_offsets(). The latter does two checks against a tid
> > > array, and replaces test_dump_tids().
> >
> > Great! I think that's a very good starter.
> >
> > The lookup_test() (and test_lookup_tids()) do also test that the
> > IsMember() function returns false as expected if the TID doesn't exist
> > in it, and probably we can do these tests in a C function too.
> >
> > BTW do we still want to test the tidstore by using a combination of
> > SQL functions? We might no longer need to input TIDs via a SQL
> > function.
>
> I'm not sure. I stopped short of doing that to get feedback on this
> much. One advantage with SQL functions is we can use generate_series
> to easily input lists of blocks with different numbers and strides,
> and array literals for offsets are a bit easier. What do you think?

While I'm not a fan of the following part, I agree that it makes sense
to use SQL functions for test data generation:

-- Constant values used in the tests.
\set maxblkno 4294967295
-- The maximum number of heap tuples (MaxHeapTuplesPerPage) in 8kB block is 291.
-- We use a higher number to test tidstore.
\set maxoffset 512

It would also make it easier for developers to test the tidstore with
their own data sets. So I agree with the current approach: use SQL
functions for data generation and do the actual tests inside C
functions. Would it be convenient for developers if we had functions
like generate_tids() and generate_random_tids() to generate TIDs that
they can pass to do_set_block_offsets()? They could then call
check_set_block_offsets() and others for the actual data lookup and
iteration tests.
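
To make the idea a bit more concrete, here is a rough standalone sketch
of what a generate_tids()-style helper plus a reference membership check
might look like. Everything in it is hypothetical and not part of the
patch: DemoTid is only a stand-in for ItemPointerData, the signature of
generate_tids() is an assumption, and tid_is_member() is a plain binary
search over a sorted array, not the tidstore lookup.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical stand-in for ItemPointerData (block number + offset). */
typedef struct DemoTid
{
    uint32_t    blkno;
    uint16_t    offset;
} DemoTid;

/*
 * Hypothetical generate_tids(): for blocks first_blk, first_blk + stride,
 * ... up to last_blk (inclusive), emit one TID per entry in offsets[].
 * Returns the number of TIDs written, or 0 if they do not fit in cap.
 * The output is sorted as long as offsets[] is ascending.
 */
static size_t
generate_tids(DemoTid *out, size_t cap,
              uint32_t first_blk, uint32_t last_blk, uint32_t stride,
              const uint16_t *offsets, size_t noffsets)
{
    size_t      n = 0;
    uint64_t    blk;        /* 64-bit so the loop ends even at maxblkno */

    assert(stride > 0 && first_blk <= last_blk);

    for (blk = first_blk; blk <= last_blk; blk += stride)
    {
        for (size_t i = 0; i < noffsets; i++)
        {
            if (n == cap)
                return 0;   /* caller's array is too small */
            out[n].blkno = (uint32_t) blk;
            out[n].offset = offsets[i];
            n++;
        }
    }
    return n;
}

/* Order TIDs by block number, then by offset. */
static int
tid_cmp(const void *a, const void *b)
{
    const DemoTid *x = a;
    const DemoTid *y = b;

    if (x->blkno != y->blkno)
        return x->blkno < y->blkno ? -1 : 1;
    if (x->offset != y->offset)
        return x->offset < y->offset ? -1 : 1;
    return 0;
}

/* Reference lookup over a sorted TID array, used to verify results. */
static bool
tid_is_member(const DemoTid *tids, size_t ntids,
              uint32_t blkno, uint16_t offset)
{
    DemoTid     key = {blkno, offset};

    return bsearch(&key, tids, ntids, sizeof(DemoTid), tid_cmp) != NULL;
}
```

A test could generate TIDs this way, feed them in, and then compare the
lookup results against tid_is_member() for both TIDs that were added and
TIDs that were not.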

Regards,

--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com
