Re: [PoC] Improve dead tuple storage for lazy vacuum

From: Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>
To: John Naylor <johncnaylorls(at)gmail(dot)com>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Nathan Bossart <nathandbossart(at)gmail(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, Matthias van de Meent <boekewurm+postgres(at)gmail(dot)com>, Yura Sokolov <y(dot)sokolov(at)postgrespro(dot)ru>
Subject: Re: [PoC] Improve dead tuple storage for lazy vacuum
Date: 2024-03-25 13:06:46
Message-ID: CAD21AoAxXGf7FbuA9jvx+XUFrEJUz9JNOmkxF8SLppOOOvhBtg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Mon, Mar 25, 2024 at 3:25 PM John Naylor <johncnaylorls(at)gmail(dot)com> wrote:
>
> On Fri, Mar 22, 2024 at 12:20 PM Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> wrote:
> >
> > On Thu, Mar 21, 2024 at 7:48 PM John Naylor <johncnaylorls(at)gmail(dot)com> wrote:
>
> > > v77-0001
> > >
> > > - dead_items = (VacDeadItems *) palloc(vac_max_items_to_alloc_size(max_items));
> > > - dead_items->max_items = max_items;
> > > - dead_items->num_items = 0;
> > > + vacrel->dead_items = TidStoreCreate(vac_work_mem, NULL, 0);
> > > +
> > > + dead_items_info = (VacDeadItemsInfo *) palloc(sizeof(VacDeadItemsInfo));
> > > + dead_items_info->max_bytes = vac_work_mem * 1024L;
> > >
> > > This is confusing enough that it looks like a bug:
> > >
> > > [inside TidStoreCreate()]
> > > /* choose the maxBlockSize to be no larger than 1/16 of max_bytes */
> > > while (16 * maxBlockSize > max_bytes * 1024L)
> > > maxBlockSize >>= 1;
> > >
> > > This was copied from CreateWorkExprContext, which operates directly on
> > > work_mem -- if the parameter is actually bytes, we can't "* 1024"
> > > here. If we're passing something measured in kilobytes, the parameter
> > > is badly named. Let's use convert once and use bytes everywhere.
> >
> > True. The attached 0001 patch fixes it.
>
> v78-0001 and 02 are fine, but for 0003 there is a consequence that I
> didn't see mentioned:

I think that the fix done in 0001 patch can be merged into 0003 patch.

> vac_work_mem now refers to bytes, where before
> it referred to kilobytes. It seems pretty confusing to use a different
> convention from elsewhere, especially if it has the same name but
> different meaning across versions. Worse, this change is buried inside
> a moving-stuff-around diff, making it hard to see. Maybe "convert only
> once" is still possible, but I was actually thinking of
>
> + dead_items_info->max_bytes = vac_work_mem * 1024L;
> + vacrel->dead_items = TidStoreCreate(dead_items_info->max_bytes, NULL, 0);
>
> That way it's pretty obvious that it's correct. That may require a bit
> of duplication and moving around for shmem, but there is some of that
> already.

Agreed.

>
> More on 0003:
>
> - * The major space usage for vacuuming is storage for the array of dead TIDs
> + * The major space usage for vacuuming is TidStore, a storage for dead TIDs
>
> + * autovacuum_work_mem) memory space to keep track of dead TIDs. If the
> + * TidStore is full, we must call lazy_vacuum to vacuum indexes (and to vacuum
>
> I wonder if the comments here should refer to it using a more natural
> spelling, like "TID store".
>
> - * items in the dead_items array for later vacuuming, count live and
> + * items in the dead_items for later vacuuming, count live and
>
> Maybe "the dead_items area", or "the dead_items store" or "in dead_items"?
>
> - * remaining LP_DEAD line pointers on the page in the dead_items
> - * array. These dead items include those pruned by lazy_scan_prune()
> - * as well we line pointers previously marked LP_DEAD.
> + * remaining LP_DEAD line pointers on the page in the dead_items.
> + * These dead items include those pruned by lazy_scan_prune() as well
> + * we line pointers previously marked LP_DEAD.
>
> Here maybe "into dead_items".
>
> Also, "we line pointers" seems to be a pre-existing typo.
>
> - (errmsg("table \"%s\": removed %lld dead item identifiers in %u pages",
> - vacrel->relname, (long long) index, vacuumed_pages)));
> + (errmsg("table \"%s\": removed " INT64_FORMAT "dead item identifiers
> in %u pages",
> + vacrel->relname, vacrel->dead_items_info->num_items, vacuumed_pages)));
>
> This is a translated message, so let's keep the message the same.
>
> /*
> * Allocate dead_items (either using palloc, or in dynamic shared memory).
> * Sets dead_items in vacrel for caller.
> *
> * Also handles parallel initialization as part of allocating dead_items in
> * DSM when required.
> */
> static void
> dead_items_alloc(LVRelState *vacrel, int nworkers)
>
> This comment didn't change at all. It's not wrong, but let's consider
> updating the specifics.

Fixed above comments.

> v78-0005:
>
> "Although commit XXX
> allowed specifying the initial and maximum DSA segment sizes, callers
> still needed to clamp their own limits, which was not consistent and
> user-friendly."
>
> Perhaps s/still needed/would have needed/ ..., since we're preventing
> that necessity.
>
> > > Did you try it with 1MB m_w_m?
> >
> > I've incorporated the above comments and test results look good to me.
>
> Could you be more specific about what the test was?
> Does it work with 1MB m_w_m?

If m_w_m is 1MB, both the initial and maximum segment sizes are 256kB.

FYI other test cases I tested were:

* m_w_m = 2199023254528 (maximum value)
initial: 1MB
max: 128GB

* m_w_m = 64MB (default)
initial: 1MB
max: 8MB

>
> + /*
> + * Choose the initial and maximum DSA segment sizes to be no longer
> + * than 1/16 and 1/8 of max_bytes, respectively. If the initial
> + * segment size is low, we end up having many segments, which risks
> + * exceeding the total number of segments the platform can have.
>
> The second sentence is technically correct, but I'm not sure how it
> relates to the code that follows.
>
> + while (16 * dsa_init_size > max_bytes)
> + dsa_init_size >>= 1;
> + while (8 * dsa_max_size > max_bytes)
> + dsa_max_size >>= 1;
>
> I'm not sure we need a separate loop for "dsa_init_size". Can we just have :
>
> while (8 * dsa_max_size > max_bytes)
> dsa_max_size >>= 1;
>
> if (dsa_max_size < DSA_MIN_SEGMENT_SIZE)
> dsa_max_size = DSA_MIN_SEGMENT_SIZE;
>
> if (dsa_init_size > dsa_max_size)
> dsa_init_size = dsa_max_size;

Agreed.

>
> @@ -113,13 +113,10 @@ static void
> tidstore_iter_extract_tids(TidStoreIter *iter, BlockNumber blkno,
> * CurrentMemoryContext at the time of this call. The TID storage, backed
> * by a radix tree, will live in its child memory context, rt_context. The
> * TidStore will be limited to (approximately) max_bytes total memory
> - * consumption. If the 'area' is non-NULL, the radix tree is created in the
> - * DSA area.
> - *
> - * The returned object is allocated in backend-local memory.
> + * consumption.
>
> The existing comment slipped past my radar, but max_bytes is not a
> limit, it's a hint. Come to think of it, it never was a limit in the
> normal sense, but in earlier patches it was the criteria for reporting
> "I'm full" when asked.

Updated the comment.

>
> void
> TidStoreDestroy(TidStore *ts)
> {
> - /* Destroy underlying radix tree */
> if (TidStoreIsShared(ts))
> + {
> + /* Destroy underlying radix tree */
> shared_ts_free(ts->tree.shared);
> +
> + dsa_detach(ts->area);
> + }
> else
> local_ts_free(ts->tree.local);
>
> It's still destroyed in the local case, so not sure why this comment was moved?
>
> v78-0006:
>
> -#define PARALLEL_VACUUM_KEY_DEAD_ITEMS 2
> +/* 2 was PARALLEL_VACUUM_KEY_DEAD_ITEMS */
>
> I don't see any use in core outside this module -- maybe it's possible
> to renumber these?

Fixed the above points.

I've attached the latest patches. The 0004 and 0006 patches are
updates from the previous version.

Regards,

--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com

Attachment Content-Type Size
v79-0006-Address-review-comments-on-vacuum-integration.patch application/octet-stream 7.8 KB
v79-0004-Address-review-comments-on-tidstore.patch application/octet-stream 3.0 KB
v79-0005-Use-TidStore-for-dead-tuple-TIDs-storage-during-.patch application/octet-stream 43.7 KB
v79-0002-Allow-specifying-initial-and-maximum-segment-siz.patch application/octet-stream 10.2 KB
v79-0003-Rethink-create-and-attach-APIs-of-shared-TidStor.patch application/octet-stream 10.1 KB
v79-0001-Fix-an-inconsistent-function-prototype-with-the-.patch application/octet-stream 1.1 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Jelte Fennema-Nio 2024-03-25 13:14:34 Re: session username in default psql prompt?
Previous Message Robert Haas 2024-03-25 13:06:26 Re: session username in default psql prompt?