Re: Truncation failure in autovacuum results in data corruption (duplicate keys)

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Alexander Korotkov <a(dot)korotkov(at)postgrespro(dot)ru>
Cc: maumau307(at)gmail(dot)com, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Truncation failure in autovacuum results in data corruption (duplicate keys)
Date: 2018-08-20 15:45:12
Message-ID: 19795.1534779912@sss.pgh.pa.us
Lists: pgsql-hackers

Alexander Korotkov <a(dot)korotkov(at)postgrespro(dot)ru> writes:
> On Wed, Apr 18, 2018 at 10:04 PM Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>> It's hard to see a way around this that isn't fairly catastrophic for
>> performance :-(. But in any case it's wrapped up in order-of-operations
>> issues. I've long since forgotten the details, but I seem to have thought
>> that there were additional order-of-operations hazards besides this one.

> Just for clarification. Do you mean zeroing of to-be-truncated blocks
> to be catastrophic for performance? Or something else?

It would be pretty terrible to have to do that in the normal code path.
The other idea that was in my mind was to force out dirty buffers, then
discard them, then truncate ... but that's awful too if there are a lot
of dirty buffers that we'd have to write only to throw the data away.

I think it's all right to be slow if the truncation fails, though; that
does not seem like a path that has to be fast, only correct.

One thing to be concerned about is that as soon as we've discarded any
page images from buffers, we have to be in a critical section all the way
through till we've either truncated or zeroed those pages on-disk. Any
failure in that has to result in a PANIC and recover-from-WAL, because
we don't know what state we lost by dropping the buffers. Ugh. It's
especially bad if the truncation fails because the file got marked
read-only, because then the zeroing is also going to fail, making that
a guaranteed PANIC case (with no clear path to recovery after the
panic, either ...)
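
For illustration only, the control flow described above can be modeled in a few lines of Python. This is a toy sketch, not PostgreSQL code: `Panic`, `critical_section`, and `discard_then_truncate` are hypothetical names, and the three callables stand in for dropping the buffers, truncating the file, and zeroing the pages on disk. The point it tries to show is that once the buffers are gone, every failure has to escalate, and that a read-only file makes both the truncate and the fallback fail.

```python
from contextlib import contextmanager


class Panic(Exception):
    """Stands in for a PANIC: restart and recover from WAL (toy model)."""


@contextmanager
def critical_section():
    # Inside this block no error may be handled as an ordinary failure;
    # anything that escapes is escalated to Panic.
    try:
        yield
    except Exception as exc:
        raise Panic(f"failure inside critical section: {exc}") from exc


def discard_then_truncate(discard, truncate, zero_on_disk):
    """Hypothetical ordering: discard buffers, truncate, else zero on disk."""
    with critical_section():
        discard()              # from here on, in-memory page state is lost
        try:
            truncate()
            return "truncated"
        except OSError:
            zero_on_disk()     # slow fallback; may itself fail
            return "zeroed"


def ok():
    return None


def read_only_failure():
    raise OSError("file is read-only")
```

With `truncate=read_only_failure` and `zero_on_disk=ok` the function returns `"zeroed"`; with both failing it raises `Panic`, which corresponds to the guaranteed-PANIC case above.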

I wonder if it could help to do something like zeroing the buffers in
memory, then truncating, then discarding buffers. This is just a
half-baked idea and I don't have time to think more right now, but maybe
making two passes over the shared buffers could lead to a better solution.
It's the useless I/O that we need to avoid, IMO.
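
One possible reading of the two-pass idea, again as a toy Python model rather than anything resembling the real buffer manager: pass 1 overwrites the cached images of the to-be-truncated blocks with zeros and leaves them dirty, the file is then truncated, and pass 2 drops those buffers only if the truncate succeeded. `ToyRelation`, `naive_truncate`, `two_pass_truncate`, and `make_rel` are all hypothetical names, and the 8-byte "pages" are an assumption made for brevity. The sketch ignores locking, WAL, and concurrent access entirely.

```python
from dataclasses import dataclass

ZERO_PAGE = b"\x00" * 8  # toy page size, not PostgreSQL's block size


@dataclass
class Buffer:
    data: bytes
    dirty: bool = False


class ToyRelation:
    """A list of on-disk pages plus a dict standing in for shared buffers."""

    def __init__(self, disk_pages):
        self.disk = list(disk_pages)
        self.buffers = {}            # block number -> Buffer
        self.fail_truncate = False   # set True to simulate a failed truncate

    def read(self, blkno):
        # Serve from the cache if present, otherwise load the on-disk image.
        if blkno not in self.buffers:
            self.buffers[blkno] = Buffer(self.disk[blkno])
        return self.buffers[blkno].data

    def modify(self, blkno, data):
        # Change a page in memory only; nothing is written to disk.
        self.buffers[blkno] = Buffer(data, dirty=True)

    def truncate_file(self, nblocks):
        if self.fail_truncate:
            raise OSError("simulated truncate failure")
        del self.disk[nblocks:]


def naive_truncate(rel, nblocks):
    """Discard buffers first, then truncate: state is lost if truncate fails."""
    for blkno in [b for b in rel.buffers if b >= nblocks]:
        del rel.buffers[blkno]
    rel.truncate_file(nblocks)


def two_pass_truncate(rel, nblocks):
    """Zero in memory, truncate, and only then discard the buffers."""
    doomed = [b for b in rel.buffers if b >= nblocks]
    for blkno in doomed:                  # pass 1: memory only, no I/O
        rel.buffers[blkno] = Buffer(ZERO_PAGE, dirty=True)
    try:
        rel.truncate_file(nblocks)
    except OSError:
        return False                      # zeroed buffers stay cached and dirty
    for blkno in doomed:                  # pass 2: safe to drop now
        del rel.buffers[blkno]
    return True


def make_rel():
    # Block 2 still holds an old row on disk, but was emptied in memory only.
    rel = ToyRelation([b"live-row", b"live-row", b"dead-row"])
    rel.modify(2, b"emptied!")
    rel.fail_truncate = True
    return rel
```

In this model, a failed `naive_truncate(make_rel(), 2)` leaves block 2 readable from disk with its old row, which is the kind of resurrection this thread is about. A failed `two_pass_truncate` leaves a zeroed, dirty buffer for block 2 instead, and the success path writes no pages at all. Whether that ordering is actually safe in the real system is exactly the open question.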

regards, tom lane
