Re: More aggressive vacuuming of temporary tables

From: Andres Freund <andres(at)anarazel(dot)de>
To: Stephen Frost <sfrost(at)snowman(dot)net>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, pgsql-hackers(at)lists(dot)postgresql(dot)org
Subject: Re: More aggressive vacuuming of temporary tables
Date: 2020-09-09 18:02:05
Message-ID: 20200909180205.k3pff6kmaim6hmvb@alap3.anarazel.de
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi,

On 2020-09-09 10:14:04 -0400, Stephen Frost wrote:
> > I've been toying with a patch that introduces more smarts about when a
> > row is removable, by looking more closely whether a specific row
> > versions are visible (e.g. in the common case of one old snapshot and
> > lots of newer rows). But that's orders of magnitude more complicated. So
> > going for something as simple as this seems like a good idea.
>
> I've wondered about this for a long time- very cool that you've found
> time to actually work on a patch. A couple of different ideas were
> discussed previously about how to do that kind of a check- mind talking
> about what method you're using, or perhaps just sharing that patch? :)

It's very very early, and it doesn't really work. I basically tried to
just plug a bit more intelligence into the dead tuple detection (which
now can easily store more and incrementally built state with the recent
changes for snapshot scalability). Detection that tuples newer than the
horizon are dead isn't that complicated - what's hard is not breaking
things due to ctid chains lacking intermediate versions. To avoid that
I had to restrict it to inserted (not updated) tuples that were
subsequently deleted. And my heuristic only supported only one old
snapshot.

Building a bsearchable list of ranges of valid (xmin-xmax] ranges isn't
that hard. Some care needs to be taken to make the list non-overlapping,
but that's easy enough by just merging entries.

Obviously lookup in such a more complicated structure isn't free. Nor is
building it. So we'd need some heuristics about when to do so. It'd
probably be OK to occasionally look at the age of the oldest in-progress
statement, to infer the time for old snapshots based on that. And then
we could have a GUC that says when it's worth doing more complicated
lookups.

I don't have a handle on how to deal with the ctid chaining for
intermediate row versions.

Greetings,

Andres Freund

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Tomas Vondra 2020-09-09 18:06:20 Re: WIP: BRIN multi-range indexes
Previous Message Fujii Masao 2020-09-09 17:29:23 Re: Global snapshots