Re: [PoC] Improve dead tuple storage for lazy vacuum

From: Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>
To: John Naylor <johncnaylorls(at)gmail(dot)com>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Nathan Bossart <nathandbossart(at)gmail(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, Matthias van de Meent <boekewurm+postgres(at)gmail(dot)com>, Yura Sokolov <y(dot)sokolov(at)postgrespro(dot)ru>
Subject: Re: [PoC] Improve dead tuple storage for lazy vacuum
Date: 2024-03-11 08:13:17
Message-ID: CAD21AoC2SXAvOcSTrk-tpnbEv_kpmv4bgJRbOtqr_mHBfBihxA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Mon, Mar 11, 2024 at 12:20 PM John Naylor <johncnaylorls(at)gmail(dot)com> wrote:
>
> On Thu, Mar 7, 2024 at 10:35 PM Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> wrote:
> >
> > I've attached the remaining patches for CI. I've made some minor
> > changes in separate patches and drafted the commit message for
> > tidstore patch.
> >
> > While reviewing the tidstore code, I thought that it would be more
> > appropriate to place tidstore.c under src/backend/lib instead of
> > src/backend/common/access since the tidstore is no longer implemented
> > only for heap or other access methods, and it might also be used by
> > executor nodes in the future. What do you think?
>
> That's a heck of a good question. I don't think src/backend/lib is
> right -- it seems that's for general-purpose data structures.
> Something like backend/utils is also too general.
> src/backend/access/common has things for tuple descriptors, toast,
> sessions, and I don't think tidstore is out of place here. I'm not
> sure there's a better place, but I could be convinced otherwise.

Yeah, I agreed that src/backend/lib seems not to be the place for
tidstore. Let's keep it in src/backend/access/common. If others think
differently, we can move it later.

>
> v68-0001:
>
> I'm not sure if commit messages are much a subject of review, and it's
> up to the committer, but I'll share a couple comments just as
> something to think about, not something I would ask you to change: I
> think it's a bit distracting that the commit message talks about the
> justification to use it for vacuum. Let's save that for the commit
> with actual vacuum changes. Also, I suspect saying there are a "wide
> range" of uses is over-selling it a bit, and that paragraph is a bit
> awkward aside from that.

Thank you for the comment, and I agreed. I've updated the commit message.

>
> + /* Collect TIDs extracted from the key-value pair */
> + result->num_offsets = 0;
> +
>
> This comment has nothing at all to do with this line. If the comment
> is for several lines following, some of which are separated by blank
> lines, there should be a blank line after the comment. Also, why isn't
> tidstore_iter_extract_tids() responsible for setting that to zero?

Agreed, fixed.

I also updated this part so we set result->blkno in
tidstore_iter_extract_tids() too, which seems more readable.

>
> + ts->context = CurrentMemoryContext;
>
> As far as I can tell, this member is never accessed again -- am I
> missing something?

You're right. It was used to re-create the tidstore in the same
context again while resetting it, but we no longer support the reset
API. Considering it again, would it be better to allocate the iterator
struct in the same context as we store the tidstore struct?

>
> + /* DSA for tidstore will be detached at the end of session */
>
> No other test module pins the mapping, but that doesn't necessarily
> mean it's wrong. Is there some advantage over explicitly detaching?

One small benefit of not explicitly detaching dsa_area in
tidstore_destroy() would be simplicity; IIUC if we want to do that, we
need to remember the dsa_area using (for example) a static variable,
and free it if it's non-NULL. I've implemented this idea in the
attached patch.

>
> +-- Add tids in random order.
>
> I don't see any randomization here. I do remember adding row_number to
> remove whitespace in the output, but I don't remember a random order.
> On that subject, the row_number was an easy trick to avoid extra
> whitespace, but maybe we should just teach the setting function to
> return blocknumber rather than null?

Good idea, fixed.

>
> +Datum
> +tidstore_create(PG_FUNCTION_ARGS)
> +{
> ...
> + tidstore = TidStoreCreate(max_bytes, dsa);
>
> +Datum
> +tidstore_set_block_offsets(PG_FUNCTION_ARGS)
> +{
> ....
> + TidStoreSetBlockOffsets(tidstore, blkno, offs, noffs);
>
> These names are too similar. Maybe the test module should do
> s/tidstore_/test_/ or similar.

Agreed.

>
> +/* Sanity check if we've called tidstore_create() */
> +static void
> +check_tidstore_available(void)
> +{
> + if (tidstore == NULL)
> + elog(ERROR, "tidstore is not initialized");
> +}
>
> I don't find this very helpful. If a developer wiped out the create
> call, wouldn't the test crash and burn pretty obviously?

Removed.

>
> In general, the .sql file is still very hard-coded. Functions are
> created that contain a VALUES statement. Maybe it's okay for now, but
> wanted to mention it. Ideally, we'd have some randomized tests,
> without having to display it. That could be in addition to (not
> replacing) the small tests we have that display input. (see below)
>

Agreed to add randomized tests in addition to the existing tests.

>
> v68-0002:
>
> @@ -329,6 +381,13 @@ TidStoreIsMember(TidStore *ts, ItemPointer tid)
>
> ret = (page->words[wordnum] & ((bitmapword) 1 << bitnum)) != 0;
>
> +#ifdef TIDSTORE_DEBUG
> + if (!TidStoreIsShared(ts))
> + {
> + bool ret_debug = ts_debug_is_member(ts, tid);;
> + Assert(ret == ret_debug);
> + }
> +#endif
>
> This only checking the case where we haven't returned already. In particular...
>
> + /* no entry for the blk */
> + if (page == NULL)
> + return false;
> +
> + wordnum = WORDNUM(off);
> + bitnum = BITNUM(off);
> +
> + /* no bitmap for the off */
> + if (wordnum >= page->nwords)
> + return false;
>
> ...these results are not checked.
>
> More broadly, it seems like the test module should be able to test
> everything that the debug-build array would complain about. Including
> ordered iteration. This may require first saving our test input to a
> table. We could create a cursor on a query that fetches the ordered
> input from the table and verifies that the tid store iterate produces
> the same ordered set, maybe with pl/pgSQL. Or something like that.
> Seems like not a whole lot of work. I can try later in the week, if
> you like.

Sounds a good idea. In fact, if there are some bugs in tidstore, it's
likely that even initdb would fail in practice. However, it's a very
good idea that we can test the tidstore anyway with such a check
without a debug-build array.

Or as another idea, I wonder if we could keep the debug-build array in
some form. For example, we use the array with the particular build
flag and set a BF animal for that. That way, we can test the tidstore
in more real cases.

>
> v68-0005/6 look ready to squash

Done.

>
> v68-0008 - I'm not a fan of captilizing short comment fragments. I use
> the style of either: short lower-case phrases, or full sentences
> including capitalization, correct grammar and period. I see these two
> styles all over the code base, as appropriate.

Agreed.

>
> + /* Remain attached until end of backend */
>
> We'll probably want this comment, if in fact we want this behavior.

Kept it.

>
> + /*
> + * Note that funcctx->call_cntr is incremented in SRF_RETURN_NEXT
> + * before return.
> + */
>
> I'm not sure what this is trying to say or why it's relevant, since
> it's been a while since I've written a SRF in C.

I wanted to say is that we cannot do like:

SRF_RETURN_NEXT(funcctx, PointerGetDatum(&(tids[funcctx->call_cntr])));

because funcctx->call_cntr is incremented *before* return and
therefore we will end up accessing the index out of range. I've took
some time to realize this fact before.

> That's all I have for now, and I haven't looked at the vacuum changes this time.

Thank you for the comments!

In the latest (v69) patch:

- squashed v68-0005 and v68-0006 patches.
- removed most of the changes in v68-0007 patch.
- addressed above review comments in v69-0002 patch.
- v69-0003, 0004, and 0005 are miscellaneous updates.

As for renaming TidStore to TIDStore, I dropped the patch for now
since it seems we're using "Tid" in some function names and variable
names. If we want to update it, we can do that later.

Regards,

--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com

Attachment Content-Type Size
v69-ART.tar.gz application/x-gzip 26.9 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Oleg Tselebrovskiy 2024-03-11 08:21:11 [PROPOSAL] Skip test citext_utf8 on Windows
Previous Message Etsuro Fujita 2024-03-11 07:56:58 Re: ExecAppendAsyncEventWait() in REL_14_STABLE can corrupt PG_exception_stack