Re: [PoC] Improve dead tuple storage for lazy vacuum

From: John Naylor <johncnaylorls(at)gmail(dot)com>
To: Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Nathan Bossart <nathandbossart(at)gmail(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, Matthias van de Meent <boekewurm+postgres(at)gmail(dot)com>, Yura Sokolov <y(dot)sokolov(at)postgrespro(dot)ru>
Subject: Re: [PoC] Improve dead tuple storage for lazy vacuum
Date: 2024-03-17 02:46:13
Message-ID: CANWCAZatuBSdkmbMrt=Pw1Xx1cKjstM7KVdUmdhpFeGWzY_OSw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Fri, Mar 15, 2024 at 9:17 PM Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> wrote:
>
> On Fri, Mar 15, 2024 at 4:36 PM John Naylor <johncnaylorls(at)gmail(dot)com> wrote:
> >
> > On Thu, Mar 14, 2024 at 7:04 PM Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> wrote:

> > > Given TidStoreSetBlockOffsets() is designed to always set (i.e.
> > > overwrite) the value, I think we should not expect that found is
> > > always false.
> >
> > I find that a puzzling statement, since 1) it was designed for
> > insert-only workloads, not actual overwrite IIRC and 2) the tests will
> > now fail if the same block is set twice, since we just switched the
> > tests to use a remnant of vacuum's old array. Having said that, I
> > don't object to removing artificial barriers to using it for purposes
> > not yet imagined, as long as test_tidstore.sql warns against that.
>
> I think that if it supports only insert-only workload and expects the
> same block is set only once, it should raise an error rather than an
> assertion. It's odd to me that the function fails only with an
> assertion build assertions even though it actually works fine even in
> that case.

After thinking some more, I think you're right -- it's too
heavy-handed to throw an error/assert and a public function shouldn't
make assumptions about the caller. It's probably just a matter of
documenting the function (and it's lack of generality), and the tests
(which are based on the thing we're replacing).

> As for test_tidstore you're right that the test code doesn't handle
> the case where setting the same block twice. I think that there is no
> problem in the fixed-TIDs tests, but we would need something for
> random-TIDs tests so that we don't set the same block twice. I guess
> it could be trivial since we can use SQL queries to generate TIDs. I'm
> not sure how the random-TIDs tests would be like, but I think we can
> use SELECT DISTINCT to eliminate the duplicates of block numbers to
> use.

Also, I don't think we need random blocks, since the radix tree tests
excercise that heavily already.

Random offsets is what I was thinking of (if made distinct and
ordered), but even there the code is fairy trivial, so I don't have a
strong feeling about it.

> > Given the above two things, I think this function's comment needs
> > stronger language about its limitations. Perhaps even mention that
> > it's intended for, and optimized for, vacuum. You and I have long
> > known that tidstore would need a separate, more complex, function to
> > add or remove individual tids from existing entries, but it might be
> > good to have that documented.
>
> Agreed.

How about this:

/*
- * Set the given TIDs on the blkno to TidStore.
+ * Create or replace an entry for the given block and array of offsets
*
- * NB: the offset numbers in offsets must be sorted in ascending order.
+ * NB: This function is designed and optimized for vacuum's heap scanning
+ * phase, so has some limitations:
+ * - The offset numbers in "offsets" must be sorted in ascending order.
+ * - If the block number already exists, the entry will be replaced --
+ * there is no way to add or remove offsets from an entry.
*/
void
TidStoreSetBlockOffsets(TidStore *ts, BlockNumber blkno, OffsetNumber *offsets,

I think we can stop including the debug-tid-store patch for CI now.
That would allow getting rid of some unnecessary variables. More
comments:

+ * Prepare to iterate through a TidStore. Since the radix tree is locked during
+ * the iteration, TidStoreEndIterate() needs to be called when finished.

+ * Concurrent updates during the iteration will be blocked when inserting a
+ * key-value to the radix tree.

This is outdated. Locking is optional. The remaining real reason now
is that TidStoreEndIterate needs to free memory. We probably need to
say something about locking, too, but not this.

+ * Scan the TidStore and return a pointer to TidStoreIterResult that has TIDs
+ * in one block. We return the block numbers in ascending order and the offset
+ * numbers in each result is also sorted in ascending order.
+ */
+TidStoreIterResult *
+TidStoreIterateNext(TidStoreIter *iter)

The wording is a bit awkward.

+/*
+ * Finish an iteration over TidStore. This needs to be called after finishing
+ * or when existing an iteration.
+ */

s/existing/exiting/ ?

It seems to say we need to finish after finishing. Maybe more precise wording.

+/* Extract TIDs from the given key-value pair */
+static void
+tidstore_iter_extract_tids(TidStoreIter *iter, uint64 key,
BlocktableEntry *page)

This is a leftover from the old encoding scheme. This should really
take a "BlockNumber blockno" not a "key", and the only call site
should probably cast the uint64 to BlockNumber.

+ * tidstore.h
+ * Tid storage.
+ *
+ *
+ * Portions Copyright (c) 1996-2023, PostgreSQL Global Development Group

Update year.

+typedef struct BlocktableEntry
+{
+ uint16 nwords;
+ bitmapword words[FLEXIBLE_ARRAY_MEMBER];
+} BlocktableEntry;

In my WIP for runtime-embeddable offsets, nwords needs to be one byte.
That doesn't have any real-world affect on the largest offset
encountered, and only in 32-bit builds with 32kB block size would the
theoretical max change at all. To be precise, we could use in the
MaxBlocktableEntrySize calculation:

Min(MaxOffsetNumber, BITS_PER_BITMAPWORD * PG_INT8_MAX - 1);

Tests: I never got rid of maxblkno and maxoffset, in case you wanted
to do that. And as discussed above, maybe

-- Note: The test code use an array of TIDs for verification similar
-- to vacuum's dead item array pre-PG17. To avoid adding duplicates,
-- each call to do_set_block_offsets() should use different block
-- numbers.

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message John Naylor 2024-03-17 02:47:33 Re: add AVX2 support to simd.h
Previous Message Tom Lane 2024-03-17 00:17:29 Re: REVOKE FROM warning on grantor