Re: The Free Space Map: Problems and Opportunities

From: Peter Geoghegan <pg(at)bowt(dot)ie>
To: Hannu Krosing <hannuk(at)google(dot)com>
Cc: Bruce Momjian <bruce(at)momjian(dot)us>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Jan Wieck <jan(at)wi3ck(dot)info>, Gregory Smith <gregsmithpgsql(at)gmail(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, John Naylor <john(dot)naylor(at)enterprisedb(dot)com>, Heikki Linnakangas <hlinnaka(at)iki(dot)fi>
Subject: Re: The Free Space Map: Problems and Opportunities
Date: 2021-09-07 00:29:24
Message-ID: CAH2-Wz=H0=BFjjFFm0ee48VyNoATpQ-KQw4=6-Rw5w9aQh3iTw@mail.gmail.com
Lists: pgsql-hackers

On Mon, Sep 6, 2021 at 4:33 PM Hannu Krosing <hannuk(at)google(dot)com> wrote:
> When I have been thinking of this type of problem it seems that the
> latest -- and correct :) -- place which should do all kinds of
> cleanup like removing aborted tuples, freezing committed tuples and
> setting any needed hint bits would be background writer or CHECKPOINT.

I think it depends. There is no need to do the work in the background
here, with TPC-C. With my patch series, each backend can know that it
just had an aborted transaction that affected a page that it more or
less still owns -- a page it still has close at hand for further
inserts. It's very easy to piggy-back the cleanup work once you have
that sense of ownership of newly allocated heap pages by individual
backends/transactions.

> This would be more PostgreSQL-like, as it moves any work not
> immediately needed from the critical path, as an extension of how MVCC
> for PostgreSQL works in general.

I think that it also makes sense to have what I've called "eager
physical rollback" that runs in the background, as you suggest.

I'm thinking of a specialized form of VACUUM that targets a specific
aborted transaction's known-dirtied pages. That's my long-term goal,
actually. Originally I wanted to do this as a way of getting rid of
SLRUs and tuple freezing, by making it an implicit invariant that heap
pages only ever contain committed tuples. That seemed like a good
enough reason to split VACUUM into specialized "eager physical rollback
following abort" and "garbage collection" variants.

The insight that making abort-related cleanup special will help free
space management is totally new to me -- it emerged from working
directly on this benchmark. But it nicely complements some of my
existing ideas about improving VACUUM.

> But doing it as part of checkpoint probably ends up with less WAL
> writes in the end.

I don't think that checkpoints are special in any way. They largely
determine the total number of FPIs (full-page images) we'll generate,
and so matter a great deal today. But that seems accidental to me.

> There could be a possibility to do a small amount of cleanup -- enough
> for TPC-C-like workloads, but not larger ones -- while waiting for the
> next command to arrive from the client over the network. This of
> course assumes that we will not improve our feeder mechanism to have
> back-to-back incoming commands, which can already be done today, but
> which I have seen seldom used.

That's what I meant, really. Doing the work of cleaning up a heap page
that a transaction inserts into (say pruning away aborted tuples or
setting hint bits) should ideally happen right after commit or abort
-- at least for OLTP-like workloads, which are the common case for
Postgres. This cleanup doesn't have to be done by exactly the same
transactions (and can't be in most interesting cases). It should be
quite possible for the work to be done by approximately the same
transaction, though -- the current transaction cleans up inserts made
by the previous (now committed/aborted) transaction in the same
backend (for the same table).
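
The same trick works for the committed case, with hint bits. One more
toy C sketch (invented names once again) just to show where the work
would happen -- right at the start of the next transaction in the same
backend, using the same remembered block list as before:

#include <stdbool.h>
#include <stdint.h>

/* Stand-ins for the real commit-status check and hint-bit setting */
static bool
xid_did_commit(uint32_t xid)
{
    (void) xid;
    return true;
}

static void
mark_tuples_known_committed(uint32_t relid, uint32_t blk, uint32_t xid)
{
    (void) relid;
    (void) blk;
    (void) xid;
}

/*
 * If the previous transaction in this backend committed, revisit the
 * pages it inserted into and mark its tuples as known-committed, so
 * that later readers never have to look the XID up in the commit log.
 */
static void
maybe_set_hint_bits(uint32_t relid, uint32_t prev_xid,
                    const uint32_t *blocks, int nblocks)
{
    if (!xid_did_commit(prev_xid))
        return;

    for (int i = 0; i < nblocks; i++)
        mark_tuples_known_committed(relid, blocks[i], prev_xid);
}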

The work of setting hint bits and pruning away aborted heap tuples has
to be treated as a logical part of the cost of inserting heap tuples --
backends pay this cost directly. At least with workloads where
transactions naturally insert only a handful of rows each, which is
very much the common case.

--
Peter Geoghegan
