Re: Sketch of a fix for that truncation data corruption issue

From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Sketch of a fix for that truncation data corruption issue
Date: 2018-12-11 05:28:11
Message-ID: CA+TgmoYEO8xEio7YVU1rwfzU3O2id6NhzO_gRpcvAPfLyXeHig@mail.gmail.com
Lists: pgsql-hackers

On Tue, Dec 11, 2018 at 5:39 AM Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> We got another report today [1] that seems to be due to the problem
> we've seen before with failed vacuum truncations leaving corrupt state
> on-disk [2]. Reflecting on that some more, it seems to me that we're
> never going to get to a solution that everybody finds acceptable without
> some rather significant restructuring at the buffer-access level.
> Since looking for a back-patchable solution has yielded no progress in
> eight years, what if we just accept that we will only fix this in HEAD,
> and think outside the box about how we could fix it if we're willing
> to change internal APIs as much as necessary?

+1.

> 9. If actual truncation boundary was different from plan, issue another
> WAL record saying "oh, we only managed to truncate to here, not there".

I don't entirely understand how this fix addresses the problems in
this area, but this step sounds particularly scary. Nothing
guarantees that the second WAL record ever gets replayed.

> * "Only managed to truncate to here" record: write out empty heap
> pages to fill the space from original truncation target to actual.
> This restores the on-disk situation to be equivalent to what it
> was in master, assuming all the dirty pages eventually got written.

This is equivalent only in a fairly loose sense, right?

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
