Re: POC: Cleaning up orphaned files using undo logs

From: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
To: Antonin Houska <ah(at)cybertec(dot)at>
Cc: Dmitry Dolgov <9erthalion6(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: POC: Cleaning up orphaned files using undo logs
Date: 2021-09-17 06:18:16
Message-ID: CAA4eK1+x1G34+N2a_jw65-z+y5EXNJ=SrUJ=oQn4GewqjWZg2g@mail.gmail.com
Lists: pgsql-hackers

On Thu, Sep 9, 2021 at 8:33 PM Antonin Houska <ah(at)cybertec(dot)at> wrote:
>
> The cfbot complained that the patch series no longer applies, so I've rebased
> it and also tried to make sure that the other flags become green.
>
> One particular problem was that pg_upgrade complained that "live undo data"
> remains in the old cluster. I found out that the temporary undo log causes the
> problem, so I've adjusted the query in check_for_undo_data() accordingly until
> the problem gets fixed properly.
>
> The problem of the temporary undo log is that it's loaded into local buffers
> and that backend can exit w/o flushing local buffers to disk, and thus we are
> not guaranteed to find enough information when trying to discard the undo log
> the backend wrote. I'm thinking about the following solutions:
>
> 1. Let the backend manage temporary undo log on its own (even the slot
> metadata would stay outside the shared memory, and in particular the
> insertion pointer could start from 1 for each session) and remove the
> segment files at the same moment the temporary relations are removed.
>
> However, by moving the temporary undo slots away from the shared memory,
> computation of oldestFullXidHavingUndo (see the PROC_HDR structure) would
> be affected. It might seem that a transaction which only writes undo log
> for temporary relations does not need to affect oldestFullXidHavingUndo,
> but it needs to be analyzed thoroughly. Since oldestFullXidHavingUndo
> prevents transactions from being truncated from the CLOG too early, I wonder if
> the following is possible (This scenario is only applicable to the zheap
> storage engine [1], which is not included in this patch, but should already
> be considered.):
>
> A transaction creates a temporary table, does some (many) changes and then
> gets rolled back. The undo records are being applied and it takes some
> time. Since XID of the transaction did not affect oldestFullXidHavingUndo,
> the XID can disappear from the CLOG due to truncation.
>

By the above, do you mean that the zheap code doesn't consider XIDs that
operate on temp tables/undo when computing oldestFullXidHavingUndo?
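If I understand the scenario correctly, it can be modelled with the toy C sketch below. It is not code from the patch or from zheap: UndoSlotSketch, compute_oldest_xid_having_undo() and clog_status_available() are hypothetical names, and the sketch assumes every other constraint on CLOG truncation has already advanced, so that oldestFullXidHavingUndo alone decides how far the CLOG may be truncated.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical 64-bit full transaction ID, standing in for FullTransactionId. */
typedef uint64_t FullXid;

/*
 * Hypothetical, simplified undo slot: NOT the actual layout used by the
 * patch or by PROC_HDR.  Each slot remembers the oldest XID whose undo it
 * still holds and whether it belongs to a backend-local (temporary) undo log.
 */
typedef struct UndoSlotSketch
{
    FullXid oldest_xid;  /* oldest XID whose undo is not yet applied/discarded */
    bool    is_temp;     /* undo written for temporary relations only */
    bool    in_use;      /* slot currently allocated */
} UndoSlotSketch;

/*
 * Compute a stand-in for oldestFullXidHavingUndo.  When include_temp is
 * false, temporary undo slots are skipped, which models moving them out of
 * shared memory (option 1): the computation simply cannot see them.
 */
static FullXid
compute_oldest_xid_having_undo(const UndoSlotSketch *slots, int nslots,
                               FullXid next_xid, bool include_temp)
{
    FullXid oldest = next_xid;  /* no undo at all: nothing holds truncation back */

    assert(nslots >= 0);
    for (int i = 0; i < nslots; i++)
    {
        if (!slots[i].in_use)
            continue;
        if (slots[i].is_temp && !include_temp)
            continue;
        if (slots[i].oldest_xid < oldest)
            oldest = slots[i].oldest_xid;
    }
    return oldest;
}

/*
 * Under the sketch's assumption, the CLOG is truncated up to the horizon,
 * so an XID's status is only guaranteed to be available if it is not older
 * than that horizon.
 */
static bool
clog_status_available(FullXid xid, FullXid truncation_horizon)
{
    return xid >= truncation_horizon;
}
```

With a temporary-undo slot whose transaction (XID 100) is still being rolled back and a permanent slot holding XID 200, skipping temporary slots moves the horizon to 200, so the status of XID 100 is no longer guaranteed to be in the CLOG while its undo records are still being applied. Including the temporary slot keeps the horizon at 100.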

> However zundo.c in
> [1] indicates that the transaction status *is* checked during undo
> execution, so we might have a problem.
>

It would be easier to follow if you could tell us which exact code you
are referring to here.

--
With Regards,
Amit Kapila.
