Re: New strategies for freezing, advancing relfrozenxid early

From: Nikita Malakhov <hukutoc(at)gmail(dot)com>
To: John Naylor <john(dot)naylor(at)enterprisedb(dot)com>
Cc: Peter Geoghegan <pg(at)bowt(dot)ie>, Jeff Davis <pgsql(at)j-davis(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, Justin Pryzby <pryzby(at)telsasoft(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: New strategies for freezing, advancing relfrozenxid early
Date: 2022-12-16 07:59:39
Message-ID: CAN-LCVNP5P=0VhnkLwty+xntPB2AOtN3jgzZKgdagJmfY29ScA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi!

I've found this discussion very interesting, in view of vacuuming
TOAST tables is always a problem because these tables tend to
bloat very quickly with dead data - just to remind, all TOAST-able
columns of the relation use the same TOAST table which is one
for the relation, and TOASTed data are not updated - there are
only insert and delete operations.

Have you tested it with large and constantly used TOAST tables?
How would it work with the current TOAST implementation?

We propose a different approach to the TOAST mechanics [1],
and a new vacuum would be very promising.

Thank you!

[1] https://commitfest.postgresql.org/41/3490/

On Fri, Dec 16, 2022 at 10:48 AM John Naylor <john(dot)naylor(at)enterprisedb(dot)com>
wrote:

>
> On Wed, Dec 14, 2022 at 6:07 AM Peter Geoghegan <pg(at)bowt(dot)ie> wrote:
> >
> > At the suggestion of Jeff, I wrote a Wiki page that shows motivating
> > examples for the patch series:
> >
> >
> https://wiki.postgresql.org/wiki/Freezing/skipping_strategies_patch:_motivating_examples
> >
> > These are all cases where VACUUM currently doesn't do the right thing
> > around freezing, in a way that is greatly ameliorated by the patch.
> > Perhaps this will help other hackers to understand the motivation
> > behind some of these mechanisms. There are plenty of details that only
> > make sense in the context of a certain kind of table, with certain
> > performance characteristics that the design is sensitive to, and seeks
> > to take advantage of in one way or another.
>
> Thanks for this. This is the kind of concrete, data-based evidence that I
> find much more convincing, or at least easy to reason about. I'd actually
> recommend in the future to open discussion with this kind of analysis --
> even before coding, it's possible to indicate what a design is *intended*
> to achieve. And reviewers can likewise bring up cases of their own in a
> concrete fashion.
>
> On Wed, Dec 14, 2022 at 12:16 AM Peter Geoghegan <pg(at)bowt(dot)ie> wrote:
>
> > At the very least, a given VACUUM operation has to choose its freezing
> > strategy based on how it expects the table will look when it's done
> > vacuuming the table, and how that will impact the next VACUUM against
> > the same table. Without that, then vacuuming an append-only table will
> > fall into a pattern of setting pages all-visible in one vacuum, and
> > then freezing those same pages all-frozen in the very next vacuum
> > because there are too many. Which makes little sense; we're far better
> > off freezing the pages at the earliest opportunity instead.
>
> That makes sense, but I wonder if we can actually be more specific: One
> motivating example mentioned is the append-only table. If we detected that
> case, which I assume we can because autovacuum_vacuum_insert_* GUCs exist,
> we could use that information as one way to drive eager freezing
> independently of size. At least in theory -- it's very possible size will
> be a necessary part of the decision, but it's less clear that it's as
> useful as a user-tunable knob.
>
> If we then ignored the append-only case when evaluating a freezing policy,
> maybe other ideas will fall out. I don't have a well-thought out idea about
> policy or knobs, but it's worth thinking about.
>
> Aside from that, I've only given the patches a brief reading. Having seen
> the VM snapshot in practice (under "Scanned pages, visibility map snapshot"
> in the wiki page), it's neat to see fewer pages being scanned. Prefetching
> not only seems superior to SKIP_PAGES_THRESHOLD, but anticipates
> asynchronous IO. Keeping only one VM snapshot page in memory makes perfect
> sense.
>
> I do have a cosmetic, but broad-reaching, nitpick about terms regarding
> "skipping strategy". That's phrased as a kind of negative -- what we're
> *not* doing. Many times I had to pause and compute in my head what we're
> *doing*, i.e. the "scanning strategy". For example, I wonder if the VM
> strategies would be easier to read as:
>
> VMSNAP_SKIP_ALL_VISIBLE -> VMSNAP_SCAN_LAZY
> VMSNAP_SKIP_ALL_FROZEN -> VMSNAP_SCAN_EAGER
> VMSNAP_SKIP_NONE -> VMSNAP_SCAN_ALL
>
> Notice here they're listed in order of increasing eagerness.
>
> --
> John Naylor
> EDB: http://www.enterprisedb.com
>

--
Regards,
Nikita Malakhov
Postgres Professional
https://postgrespro.ru/

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Pavel Stehule 2022-12-16 08:05:24 Re: [PATCH] psql: \dn+ to show size of each schema (and \dA+ for AMs)
Previous Message John Naylor 2022-12-16 07:48:17 Re: New strategies for freezing, advancing relfrozenxid early