Re: Truncation failure in autovacuum results in data corruption (duplicate keys)

From: Michael Paquier <michael(at)paquier(dot)xyz>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: MauMau <maumau307(at)gmail(dot)com>, pgsql-hackers(at)lists(dot)postgresql(dot)org
Subject: Re: Truncation failure in autovacuum results in data corruption (duplicate keys)
Date: 2018-04-19 04:47:20
Lists: pgsql-hackers

On Wed, Apr 18, 2018 at 04:49:17PM -0400, Tom Lane wrote:
> Just to throw out a possibly-crazy idea: maybe we could fix this by
> PANIC'ing if truncation fails, so that we replay the row deletions from
> WAL. Obviously this would be intolerable if the case were frequent,
> but we've had only two such complaints in the last nine years, so maybe
> it's tolerable. It seems more attractive than taking a large performance
> hit on truncation speed in normal cases, anyway.

It can take some time to go through the whole thread...

And that was my first intuition when looking at those things.

So one case is where the truncation of the main relation happens first
and succeeds. After that potentially comes the truncation of index
pages, which can still refer to tuples on the heap pages that were just
truncated, and that second part fails. This leaves index references
broken, which is what the 2010 report was originally about.
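
To make that ordering concrete, here is a toy model in Python. It is not
PostgreSQL code: the list-of-pages heap, the dict-based index and both
function names are assumptions made up for this sketch. It only shows
that a successful heap truncation followed by a failed index cleanup
leaves index entries pointing past the end of the heap.

```python
# Toy model only: the heap is a list of pages, the index maps key -> (page, slot).
# None of these names exist in PostgreSQL; they are purely illustrative.

def truncate_heap(heap_pages, new_len):
    """Drop every heap page at position >= new_len."""
    del heap_pages[new_len:]

def dangling_entries(index, heap_pages):
    """Return the index keys whose target page no longer exists in the heap."""
    return sorted(k for k, (page, _slot) in index.items()
                  if page >= len(heap_pages))

heap = [["a", "b"], ["c"], ["d"]]
index = {"a": (0, 0), "b": (0, 1), "c": (1, 0), "d": (2, 0)}

truncate_heap(heap, 1)   # step 1: heap truncation succeeds
# step 2: the index cleanup is assumed to fail, so `index` is left untouched
broken = dangling_entries(index, heap)
```

In this model `broken` lists the keys whose index entries no longer
match any heap page.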

> A gotcha to be concerned about is what happens if we replay from WAL,
> come to the XLOG_SMGR_TRUNCATE WAL record, and get the same truncation
> failure again, which is surely not unlikely. PANIC'ing again will not
> do. I think we could probably handle that by having the replay code
> path zero out all the pages it was unable to delete; as long as that
> succeeds, we can call it good and move on.

The complaint we are discussing here is a Windows antivirus meddling
with PostgreSQL by randomly preventing access to the file to be
truncated. Would a PANIC in this single code path be sufficient though?
It seems to me that any error could cause an inconsistency, which could
justify using a critical section instead, to force WAL replay to
clean things up?
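
A rough sketch of the two ideas combined, again as a toy Python model
rather than backend code. `Panic`, `critical_section`, `replay_truncate`
and `failing_truncate` are hypothetical names for illustration. The
context manager mimics how an error raised inside a critical section is
escalated to PANIC, and the replay function falls back to zeroing the
pages it could not remove, as suggested in the quoted text.

```python
from contextlib import contextmanager

class Panic(Exception):
    """Stands in for a PANIC that forces crash recovery (hypothetical name)."""

@contextmanager
def critical_section():
    # Any ordinary error raised inside the block is escalated to Panic.
    try:
        yield
    except Panic:
        raise
    except Exception as exc:
        raise Panic(str(exc)) from exc

def replay_truncate(pages, new_len, truncate):
    """Replay a truncation; if it fails again, zero the tail pages instead."""
    try:
        truncate(pages, new_len)
    except OSError:
        for i in range(new_len, len(pages)):
            pages[i] = bytes(len(pages[i]))   # same size, all zero bytes
    return pages

def failing_truncate(pages, new_len):
    # Simulates the file being locked by another process (e.g. antivirus).
    raise OSError("file locked by another process")

pages = [b"live", b"dead", b"dead"]
panicked = False
try:
    with critical_section():
        failing_truncate(pages, 1)
except Panic:
    panicked = True

recovered = replay_truncate(pages, 1, failing_truncate)
```

The point of the sketch is that the escalation covers any error raised
inside the section, not only the truncation call, while replay never
raises a second time for the same failure.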
