From: Alexander Korotkov <a(dot)korotkov(at)postgrespro(dot)ru>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: maumau307(at)gmail(dot)com, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Truncation failure in autovacuum results in data corruption (duplicate keys)
Date: 2018-08-20 15:00:09
Message-ID: CAPpHfdvqWECmi6SWt8K3p16GtObpRgyAGuKzan4w2HGRoFiK=Q@mail.gmail.com
Lists: pgsql-hackers
On Wed, Apr 18, 2018 at 11:49 PM Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> I wrote:
> > Relation truncation throws away the page image in memory without ever
> > writing it to disk. Then, if the subsequent file truncate step fails,
> > we have a problem, because anyone who goes looking for that page will
> > fetch it afresh from disk and see the tuples as live.
>
> > There are WAL entries recording the row deletions, but that doesn't
> > help unless we crash and replay the WAL.
>
> > It's hard to see a way around this that isn't fairly catastrophic for
> > performance :-(.
>
> Just to throw out a possibly-crazy idea: maybe we could fix this by
> PANIC'ing if truncation fails, so that we replay the row deletions from
> WAL. Obviously this would be intolerable if the case were frequent,
> but we've had only two such complaints in the last nine years, so maybe
> it's tolerable. It seems more attractive than taking a large performance
> hit on truncation speed in normal cases, anyway.
We have only two complaints of data corruption in nine years. But I
suspect that in the vast majority of cases the truncation error either
didn't cause corruption or the corruption wasn't noticed. So, once we
introduce a PANIC here, we would get far more complaints.
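
For concreteness, here is a minimal standalone sketch of the escalation
being discussed. This is not the actual md.c code path; the file name,
block counts, and the panic_on_truncate_failure() helper are made up for
illustration only.

#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

#define BLCKSZ 8192             /* PostgreSQL's default block size */

/* Stand-in for ereport(PANIC, ...): crash so that WAL replay runs. */
static void
panic_on_truncate_failure(const char *path)
{
    fprintf(stderr, "PANIC: could not truncate \"%s\": %s\n",
            path, strerror(errno));
    abort();
}

int
main(void)
{
    const char *path = "relation_segment.demo";    /* made-up file name */
    off_t       keep_blocks = 10;       /* blocks surviving the truncation */
    int         fd = open(path, O_RDWR | O_CREAT, 0600);

    if (fd < 0)
    {
        perror("open");
        return 1;
    }

    /*
     * The dangerous window: the in-memory copies of the tail pages have
     * already been discarded, so if this ftruncate() fails we cannot just
     * report an ERROR and carry on -- readers would fetch the stale pages
     * from disk and see the deleted tuples as live.  PANIC instead, so
     * that recovery replays the row deletions from WAL.
     */
    if (ftruncate(fd, keep_blocks * BLCKSZ) != 0)
        panic_on_truncate_failure(path);

    close(fd);
    return 0;
}
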
> A gotcha to be concerned about is what happens if we replay from WAL,
> come to the XLOG_SMGR_TRUNCATE WAL record, and get the same truncation
> failure again, which is surely not unlikely. PANIC'ing again will not
> do. I think we could probably handle that by having the replay code
> path zero out all the pages it was unable to delete; as long as that
> succeeds, we can call it good and move on.
>
> Or maybe just do that in the mainline case too? That is, if ftruncate
> fails, handle it by zeroing the undeletable pages and pressing on?
I've only just started really digging into this set of problems, but
even at this early stage that idea looks good to me.
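
As a rough illustration of that zero-out fallback, here is a standalone
sketch. Again, this is not PostgreSQL source; the helper name, file name,
and block counts are hypothetical, and in the real code the zeroing would
have to go through the buffer manager and be fsync'd appropriately.

#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

#define BLCKSZ 8192             /* PostgreSQL's default block size */

/*
 * Try to shrink the file to new_blocks blocks.  If ftruncate() fails,
 * fall back to overwriting every block past the intended new EOF with
 * zeros, so a reader who fetches those blocks sees empty pages rather
 * than stale tuples.  Returns 0 on success, -1 if even that fails.
 */
static int
truncate_or_zero_tail(int fd, off_t new_blocks, off_t old_blocks)
{
    static const char zero_page[BLCKSZ];    /* an all-zeros page image */

    if (ftruncate(fd, new_blocks * BLCKSZ) == 0)
        return 0;

    /* Truncation failed: neutralize the undeletable tail pages instead. */
    for (off_t blk = new_blocks; blk < old_blocks; blk++)
    {
        if (pwrite(fd, zero_page, BLCKSZ, blk * BLCKSZ) != BLCKSZ)
            return -1;          /* still failing; the caller must escalate */
    }

    /* Make the zeroing durable before declaring victory. */
    return fsync(fd);
}

int
main(void)
{
    int fd = open("relation_segment.demo", O_RDWR | O_CREAT, 0600);

    if (fd < 0 || truncate_or_zero_tail(fd, 10, 20) != 0)
    {
        perror("truncate_or_zero_tail");
        return 1;
    }
    close(fd);
    return 0;
}
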
------
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company