Re: POC: Cleaning up orphaned files using undo logs

From: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Amit Khandekar <amitdkhan(dot)pg(at)gmail(dot)com>, Thomas Munro <thomas(dot)munro(at)gmail(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, Dilip Kumar <dilipbalaut(at)gmail(dot)com>, Dmitry Dolgov <9erthalion6(at)gmail(dot)com>, Kuntal Ghosh <kuntalghosh(dot)2007(at)gmail(dot)com>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: POC: Cleaning up orphaned files using undo logs
Date: 2019-07-30 02:45:09
Message-ID: CAA4eK1KhcJi22uu2dg6yXw8-THBVJaWHpSqPWXp-ija4NCF_pg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Tue, Jul 30, 2019 at 12:18 AM Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
>
> On Tue, Jul 23, 2019 at 10:42 AM Amit Khandekar <amitdkhan(dot)pg(at)gmail(dot)com> wrote:
>
> > Yes, I also think that the function would error out only because of
> > can't-happen cases, like "too many locks taken" or "out of binary heap
> > slots" or "out of memory" (this last one is not such a can't happen
> > case). These cases happen probably due to some bugs, I suppose. But I
> > was wondering : Generally when the code errors out with such
> > can't-happen elog() calls, worst thing that happens is that the
> > transaction gets aborted. Whereas, in this case, the worst thing that
> > could happen is : the undo action would never get executed, which
> > means selects for this tuple will keep on accessing the undo log ?
> > This does not sound like any data consistency issue, so we should be
> > fine after all ?
>
> I don't think so. Every XID present in undo has to be something we
> can look up in CLOG to figure out which transactions are aborted and
> which transactions are committed, so that we know which transactions
> need undo. If we forget to undo the transaction, we can't discard it,
> which means we can't advance the CLOG transaction horizon, which means
> we'll eventually start failing to assign XIDs, leading to a refusal of
> all write transactions. Oops.
>
> More generally, it's not OK for the generic undo layer to make
> assumptions about whether the operations performed by the undo
> handlers are essential or not. We don't want to impose a design
> constraint the undo can only be used for things that are not actually
> critical, because that will make it hard to write AMs that use it.
> And there's no reason to live with such a design constraint anyway,
> because, as noted above, CLOG truncation requires it.
>
> More generally still, some can't-happen situations should be checked
> via Assert() and others via elog(). For example, consider some code
> that looks up a syscache tuple and pulls data from the returned tuple.
> If the code that handles DDL is written in such a way that the tuple
> should always exist, then this is a can't-happen situation, but
> generally the code checks this via elog(), not Assert(), because it
> could also happen due to the catalog contents being corrupted. If
> Assert() were used, the checks would not run in production builds, and
> a corrupt catalog would lead to a seg fault. An elog() is much
> friendlier. As a general principle, when a certain thing ought to
> always be true, but it being true depends on a whole lot of
> assumptions elsewhere in the code, and especially if it also depends
> on assumptions like "the database is not corrupted," I think elog() is
> preferable. Assert() is better for things that are more localized and
> that really can't go wrong for any reason other than a bug. In this
> case, I think I would tend towards elog(PANIC), but it's arguable.
>

Agreed, elog(PANIC) seems like a better way for this as compared to Assert.

--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Amit Kapila 2019-07-30 03:12:06 Re: Is ParsePrepareRecord dead function
Previous Message Michael Paquier 2019-07-30 02:31:05 Re: TopoSort() fix