Re: POC: Cleaning up orphaned files using undo logs

From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
Cc: Andres Freund <andres(at)anarazel(dot)de>, Dilip Kumar <dilipbalaut(at)gmail(dot)com>, Thomas Munro <thomas(dot)munro(at)gmail(dot)com>, Dmitry Dolgov <9erthalion6(at)gmail(dot)com>, Kuntal Ghosh <kuntalghosh(dot)2007(at)gmail(dot)com>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: POC: Cleaning up orphaned files using undo logs
Date: 2019-06-18 18:07:17
Message-ID: CA+TgmoYHBkm7M8tNk6Z9G_aEOiw3Bjdux7v9+UzmdNTdFmFzjA@mail.gmail.com
Lists: pgsql-hackers

On Tue, Jun 18, 2019 at 7:31 AM Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
> [ new patches ]

I tried writing some code that throws an error from an undo log
handler and the results were not good. It appears that the code will
retry in a tight loop:

2019-06-18 13:58:53.262 EDT [42803] ERROR: robert_undo
2019-06-18 13:58:53.263 EDT [42803] ERROR: robert_undo
2019-06-18 13:58:53.263 EDT [42803] ERROR: robert_undo
2019-06-18 13:58:53.263 EDT [42803] ERROR: robert_undo
2019-06-18 13:58:53.263 EDT [42803] ERROR: robert_undo
2019-06-18 13:58:53.263 EDT [42803] ERROR: robert_undo
2019-06-18 13:58:53.263 EDT [42803] ERROR: robert_undo
2019-06-18 13:58:53.263 EDT [42803] ERROR: robert_undo
2019-06-18 13:58:53.263 EDT [42803] ERROR: robert_undo
2019-06-18 13:58:53.263 EDT [42803] ERROR: robert_undo
2019-06-18 13:58:53.263 EDT [42803] ERROR: robert_undo
2019-06-18 13:58:53.263 EDT [42803] ERROR: robert_undo
2019-06-18 13:58:53.263 EDT [42803] ERROR: robert_undo
2019-06-18 13:58:53.263 EDT [42803] ERROR: robert_undo
2019-06-18 13:58:53.263 EDT [42803] ERROR: robert_undo
2019-06-18 13:58:53.263 EDT [42803] ERROR: robert_undo
2019-06-18 13:58:53.263 EDT [42803] ERROR: robert_undo
2019-06-18 13:58:53.263 EDT [42803] ERROR: robert_undo
2019-06-18 13:58:53.263 EDT [42803] ERROR: robert_undo
2019-06-18 13:58:53.263 EDT [42803] ERROR: robert_undo
2019-06-18 13:58:53.263 EDT [42803] ERROR: robert_undo
2019-06-18 13:58:53.263 EDT [42803] ERROR: robert_undo
2019-06-18 13:58:53.263 EDT [42803] ERROR: robert_undo
2019-06-18 13:58:53.263 EDT [42803] ERROR: robert_undo
2019-06-18 13:58:53.263 EDT [42803] ERROR: robert_undo
2019-06-18 13:58:53.263 EDT [42803] ERROR: robert_undo
2019-06-18 13:58:53.263 EDT [42803] ERROR: robert_undo
2019-06-18 13:58:53.263 EDT [42803] ERROR: robert_undo
2019-06-18 13:58:53.263 EDT [42803] ERROR: robert_undo
2019-06-18 13:58:53.263 EDT [42803] ERROR: robert_undo
2019-06-18 13:58:53.263 EDT [42803] ERROR: robert_undo
2019-06-18 13:58:53.263 EDT [42803] ERROR: robert_undo
2019-06-18 13:58:53.263 EDT [42803] ERROR: robert_undo
2019-06-18 13:58:53.263 EDT [42803] ERROR: robert_undo
2019-06-18 13:58:53.263 EDT [42803] ERROR: robert_undo
2019-06-18 13:58:53.263 EDT [42803] ERROR: robert_undo
2019-06-18 13:58:53.263 EDT [42803] ERROR: robert_undo
2019-06-18 13:58:53.263 EDT [42803] ERROR: robert_undo
2019-06-18 13:58:53.263 EDT [42803] ERROR: robert_undo
2019-06-18 13:58:53.263 EDT [42803] ERROR: robert_undo
2019-06-18 13:58:53.263 EDT [42803] ERROR: robert_undo
2019-06-18 13:58:53.264 EDT [42803] ERROR: robert_undo

It seems clear that the error-handling aspect of this patch has not
been given enough thought. It's debatable what strategy should be
used when undo fails, but retrying 40 times per millisecond isn't the
right answer. I assume we want some kind of cool-down between retries.
10 seconds? A minute? Some kind of back-off algorithm that gradually
increases the retry time up to some maximum? Should there be one or
more GUCs?
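
To make the back-off option concrete, here is a rough sketch of a
capped exponential delay. Everything in it is an assumption for
illustration: the function name, the one-second starting delay and the
one-minute cap are not taken from the patch, and in practice the two
constants might instead be GUCs.

```c
#include <stdint.h>

/* Hypothetical settings, not from the patch; these could be GUCs. */
#define UNDO_RETRY_INITIAL_MS	1000	/* wait 1 second after the first failure */
#define UNDO_RETRY_MAX_MS		60000	/* never wait longer than 1 minute */

/*
 * Return how long to wait, in milliseconds, before the next attempt,
 * given the number of consecutive failures so far.  The delay doubles
 * with each failure and is clamped to UNDO_RETRY_MAX_MS.
 */
int64_t
undo_retry_delay_ms(int failures)
{
	int64_t		delay = UNDO_RETRY_INITIAL_MS;
	int			i;

	if (failures < 1)
		return 0;				/* nothing has failed yet, no need to wait */

	/* Stop doubling once the cap is reached, so the value cannot overflow. */
	for (i = 1; i < failures && delay < UNDO_RETRY_MAX_MS; i++)
		delay *= 2;

	return (delay > UNDO_RETRY_MAX_MS) ? UNDO_RETRY_MAX_MS : delay;
}
```

With these example values the waits would be 1s, 2s, 4s and so on,
reaching the one-minute cap at the seventh consecutive failure.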

Another thing that is not very nice is that when I tried to shut down
the server via 'pg_ctl stop' while the above was happening, it did not
shut down. I had to use an immediate shutdown. That's clearly not
OK.
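
As a standalone illustration of what I'd expect here (not code from
the patch), a retry loop can check for a pending shutdown request
before every attempt. The flag, the signal handler and all function
names below are made up for the example; inside the server one would
presumably use the existing interrupt and latch handling rather than a
hand-rolled flag.

```c
#include <signal.h>
#include <stdbool.h>

/* Set from the signal handler; checked by the retry loop. */
volatile sig_atomic_t shutdown_requested = 0;

void
handle_sigterm(int signo)
{
	(void) signo;
	shutdown_requested = 1;
}

void
install_shutdown_handler(void)
{
	signal(SIGTERM, handle_sigterm);
}

/* Hypothetical callback types: one undo attempt, and a wait between attempts. */
typedef bool (*undo_action_fn) (void *arg);
typedef void (*retry_wait_fn) (int failures);

/*
 * Call action until it succeeds or a shutdown is requested.  Returns
 * true on success, false if we gave up because of a shutdown request.
 * wait_fn is told how many consecutive failures have occurred and
 * should itself return early if a shutdown request arrives.
 */
bool
retry_until_done_or_shutdown(undo_action_fn action, void *arg,
							 retry_wait_fn wait_fn)
{
	int			failures = 0;

	while (!shutdown_requested)
	{
		if (action(arg))
			return true;
		failures++;
		wait_fn(failures);
	}
	return false;
}

/* Demo action: fails until the counter it is given reaches zero. */
bool
flaky_action(void *arg)
{
	int		   *remaining = (int *) arg;

	if (*remaining > 0)
	{
		(*remaining)--;
		return false;
	}
	return true;
}

/* Demo wait: returns immediately; a real one would sleep interruptibly. */
void
no_wait(int failures)
{
	(void) failures;
}
```

The point is only that the loop condition looks at the shutdown flag,
so a failing handler cannot keep the process from exiting.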

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
