Re: POC: Cleaning up orphaned files using undo logs

From: Thomas Munro <thomas(dot)munro(at)gmail(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, Dilip Kumar <dilipbalaut(at)gmail(dot)com>, Dmitry Dolgov <9erthalion6(at)gmail(dot)com>, Kuntal Ghosh <kuntalghosh(dot)2007(at)gmail(dot)com>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: POC: Cleaning up orphaned files using undo logs
Date: 2019-07-01 07:53:26
Message-ID: CA+hUKGKni7EEU4FT71vZCCwPeaGb2PQOeKOFjQJavKnD577UMQ@mail.gmail.com
Lists: pgsql-hackers

On Fri, Jun 28, 2019 at 6:09 AM Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
> I happened to open up 0001 from this series, which is from Thomas, and
> I do not think that the pg_buffercache changes are correct. The idea
> here is that the customer might install version 1.3 or any prior
> version on an old release, then upgrade to PostgreSQL 13. When they
> do, they will be running with the old SQL definitions and the new
> binaries. At that point, it sure looks to me like the code in
> pg_buffercache_pages.c is going to do the Wrong Thing. [...]

Yep, that was completely wrong. Here's a new version. I tested that
I can install 1.3 in an older release, then pg_upgrade to master, then
look at the view without the new column, then update the extension to
1.4 (ALTER EXTENSION ... UPDATE), and then the new column appears.
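
To illustrate the compatibility problem, here is a toy model in Python
(not the patch itself). It assumes the usual approach for extension
functions: look at how many output columns the installed SQL definition
declares and fill in only that many. The column names and the
build_row() helper are placeholders for illustration, not the real
pg_buffercache schema or code.

```python
# Toy model of one C-level function serving both old and new SQL definitions.
# Column names below are illustrative placeholders only.
OLD_COLUMNS = ["bufferid", "relfilenode", "isdirty"]   # stands in for the 1.3 view
NEW_COLUMNS = OLD_COLUMNS + ["new_column"]             # stands in for the 1.4 view


def build_row(expected_natts, values):
    """Return a row with exactly as many columns as the SQL definition declares."""
    if not len(OLD_COLUMNS) <= expected_natts <= len(NEW_COLUMNS):
        raise ValueError("incorrect number of output arguments")
    # Old definitions simply never see the trailing new column.
    return tuple(values[name] for name in NEW_COLUMNS[:expected_natts])


values = {"bufferid": 1, "relfilenode": 16384, "isdirty": False, "new_column": "x"}
old_row = build_row(len(OLD_COLUMNS), values)   # after pg_upgrade, still on 1.3 SQL
new_row = build_row(len(NEW_COLUMNS), values)   # after updating the extension to 1.4
```

The point is only that the binary has to tolerate both widths, because
the SQL definitions and the shared library are upgraded at different
times.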

Other new stuff in this tarball (and also at
https://github.com/EnterpriseDB/zheap/tree/undo):

Based on hallway track discussions at PGCon, I have made a few
modifications to the undo log storage and record layer to support
"shared" record sets. They are groups of records that can be used for
temporary storage space for anything that needs to outlive a whole set
of transactions. The intended usage is extra transaction slots for
updaters and lockers when there isn't enough space on a zheap (or
other AM) page. The idea is to avoid the need to have in-heap
overflow pages for transient transaction management data, and instead
put that stuff on the conveyor belt of perfectly timed doom[1] along
with old tuple versions.

"Shared" undo records are never executed (that is, they don't really
represent rollback actions), they are just used for storage space that
is eventually discarded. (I experimented with a way to use these also
to perform rollback actions to clean up stuff like the junk left
behind by aborted CREATE INDEX CONCURRENTLY commands, which seemed
promising, but it turned out to be quite tricky so I abandoned that
for now).

Details:

1. Renamed UndoPersistence to UndoLogCategory everywhere, and added a
fourth category UNDO_SHARED where transactions can write 'out of band'
data that relates to more than one transaction.

2. Introduced a new RMGR callback rm_undo_status. It is used to
decide when record sets in the UNDO_SHARED category should be
discarded (instead of the usual single xid-based rules). The possible
answers are "discard me now!", "ask me again when a given XID is all
visible", and "ask me again when a given XID is no longer running".

3. Recognise UNDO_SHARED record set boundaries differently. Whereas
undolog.c recognises transaction boundaries automatically for the
other categories (UNDO_PERMANENT, UNDO_UNLOGGED, UNDO_TEMP), for
UNDO_SHARED the

4. Added some quick-and-dirty throw-away test stuff to demonstrate
that. SELECT test_multixact([1234, 2345]) will create a new record
set that will survive until the given array of transactions is no
longer running, and then it'll be discarded. You can see that with
SELECT * FROM undoinspect('shared'). Or look at the
pg_stat_undo_logs view. This test simply writes all the xids into its
payload, and then has an rm_undo_status function that returns the
first xid it finds in the list that is still running, or if none are
running returns UNDO_STATUS_DISCARD.

Currently you can only return UNDO_STATUS_WAIT_XMIN, i.e. wait for an xid
to be older than the oldest xmin; presumably it'd be useful to be able
to discard as soon as an xid is no longer active, which could be a bit
sooner.
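
As a rough sketch of the rm_undo_status logic the test uses, here is a
toy Python model (not the C callback from the patch). The
UNDO_STATUS_DISCARD and UNDO_STATUS_WAIT_XMIN names come from the text
above; the function name, the (status, xid) return shape and the
is_running predicate are assumptions made for illustration.

```python
from enum import Enum


class UndoStatus(Enum):
    DISCARD = "UNDO_STATUS_DISCARD"       # record set can be discarded now
    WAIT_XMIN = "UNDO_STATUS_WAIT_XMIN"   # ask again once the xid is older than the oldest xmin


def multixact_undo_status(payload_xids, is_running):
    """Hypothetical status callback for a shared record set.

    payload_xids: the xids the test wrote into the record set's payload.
    is_running:   assumed predicate telling whether an xid is still running.
    """
    for xid in payload_xids:
        if is_running(xid):
            # First still-running xid found: the record set must survive it.
            return UndoStatus.WAIT_XMIN, xid
    # Nobody in the list is running any more.
    return UndoStatus.DISCARD, None


running = {2345}
status_while_running = multixact_undo_status([1234, 2345], running.__contains__)
running.clear()
status_after_finish = multixact_undo_status([1234, 2345], running.__contains__)
```

A discard worker would keep calling this for each shared record set and
discard it only once it gets the "discard" answer.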

Another small change: several people commented that
UndoLogIsDiscarded(ptr) ought to have some kind of fast path that
doesn't acquire locks since it'll surely be hammered. Here's an
attempt at that: an inlined function that uses a per-backend
recent_discard to avoid doing more work in the (hopefully) common case
that you mostly encounter discarded undo pointers. I hope this change
will show up in profiles of some zheap workloads, but this hasn't been
tested yet.
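
The fast-path idea can be sketched as a toy Python model (again, not
the patch code). It assumes a single undo log, plain integer offsets
instead of real undo record pointers, and a discard pointer that only
ever moves forward; the class names and the lock counter are invented
for illustration.

```python
import threading


class SharedUndoLog:
    """Stand-in for the shared-memory state of one undo log (assumed layout)."""

    def __init__(self):
        self.lock = threading.Lock()
        self.discard = 0             # everything below this offset has been discarded
        self.lock_acquisitions = 0   # instrumentation for this sketch only


class Backend:
    def __init__(self, shared):
        self.shared = shared
        self.recent_discard = 0      # backend-local cache, may lag behind shared.discard

    def undo_log_is_discarded(self, ptr):
        # Fast path: the discard pointer only advances, so a stale cached
        # value can say "discarded" only for pointers that really are.
        if ptr < self.recent_discard:
            return True
        # Slow path: take the lock, refresh the cache, then decide.
        with self.shared.lock:
            self.shared.lock_acquisitions += 1
            self.recent_discard = self.shared.discard
        return ptr < self.recent_discard
```

In this model only pointers at or beyond the cached value pay for the
lock, so a workload that mostly sees already-discarded pointers stays on
the lock-free branch.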

Another small change/review: the function UndoLogGetNextInsertPtr()
previously took a transaction ID, but I'm not sure that made sense;
I need to think about it some more.

I pulled in the latest patches from the "undoprocessing" branch
as of late last week, and most of the above is implemented as fixup
commits on top of that.

Next I'm working on DBA facilities for forcing undo records to be
discarded (which consists mostly of sorting out the interlocking to
make that work safely). And also testing facilities for simulating
undo log switching (when you fill up each log and move to another one,
which are rarely run code paths, so we need a good way to make them not
rare).

[1] https://speakerdeck.com/macdice/transactions-in-postgresql-and-other-animals?slide=23

--
Thomas Munro
https://enterprisedb.com

Attachment Content-Type Size
undo-20190701.tgz application/x-gzip 186.1 KB
