Re: Truncation failure in autovacuum results in data corruption (duplicate keys)

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: "MauMau" <maumau307(at)gmail(dot)com>
Cc: pgsql-hackers(at)lists(dot)postgresql(dot)org
Subject: Re: Truncation failure in autovacuum results in data corruption (duplicate keys)
Date: 2018-04-18 19:03:57
Message-ID: 10264.1524078237@sss.pgh.pa.us
Lists: pgsql-hackers

"MauMau" <maumau307(at)gmail(dot)com> writes:
> However, I have a question. How does the truncation failure in
> autovacuum lead to duplicate keys? The failed-to-be-truncated pages
> should only contain dead tuples, so pg_dump's table scan should ignore
> dead tuples in those pages.

[ re-reads thread... ] The extra assumption you need in order to have
trouble is that the blocks in question are dirty in shared buffers and
have never been written to disk since their rows were deleted. Then
the situation is that the page image on disk shows the rows as live,
while the up-to-date page image in memory correctly shows them as dead.
Relation truncation throws away the page image in memory without ever
writing it to disk. Then, if the subsequent file truncate step fails,
we have a problem, because anyone who goes looking for that page will
fetch it afresh from disk and see the tuples as live.
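To make that sequence concrete, here is a minimal standalone C sketch of the
same order of operations.  This is emphatically not PostgreSQL code; the
"buffer pool", "disk page", and the simulated truncate failure are all toy
stand-ins, just enough to show how the only correct copy of the page gets
thrown away:

/*
 * Toy model of the hazard: a dirty in-memory page records the deletions,
 * the truncation path discards that page without writing it back, and a
 * failed file truncate leaves the stale on-disk image as the only copy.
 * All names are illustrative, not PostgreSQL internals.
 */
#include <stdbool.h>
#include <stdio.h>

typedef struct { bool tuples_live; } Page;

static Page disk_page    = { .tuples_live = true };   /* on-disk image */
static Page buffer_page;                              /* "shared buffers" copy */
static bool buffer_valid = false;
static bool buffer_dirty = false;

/* Reader: use the buffered page if present, else fetch it from disk. */
static Page read_page(void)
{
    if (!buffer_valid)
    {
        buffer_page  = disk_page;
        buffer_valid = true;
        buffer_dirty = false;
    }
    return buffer_page;
}

int main(void)
{
    /* 1. DELETE marks the tuples dead, but only in the buffered copy. */
    (void) read_page();
    buffer_page.tuples_live = false;
    buffer_dirty = true;                 /* never written back to disk */

    /* 2. Truncation drops the buffer without flushing it... */
    buffer_valid = false;
    buffer_dirty = false;

    /* 3. ...and then the file truncate fails (simulated here), so the
     *    stale on-disk page survives. */
    bool truncate_failed = true;
    if (truncate_failed)
    {
        /* 4. The next reader re-fetches the page from disk and sees the
         *    deleted tuples as live again. */
        Page p = read_page();
        printf("tuples appear %s\n",
               p.tuples_live ? "LIVE (corruption)" : "dead");
    }
    return 0;
}

Run as written, the sketch reports the tuples as live again, which is the
state a later scan (pg_dump included) would see.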

There are WAL entries recording the row deletions, but that doesn't
help unless we crash and replay the WAL.
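For completeness, the same toy model extended with a one-record log shows why
replay is the only path that re-applies the logged deletion to the stale
on-disk image (again just a sketch, nothing here resembles real WAL-redo
code):

#include <stdbool.h>
#include <stdio.h>

typedef struct { bool tuples_live; } Page;
typedef enum { WAL_DELETE } WalRecord;

/* Redo a logged deletion against the on-disk page image. */
static void redo(WalRecord rec, Page *page)
{
    if (rec == WAL_DELETE)
        page->tuples_live = false;       /* the deletion becomes durable here */
}

int main(void)
{
    Page disk_page  = { .tuples_live = true };   /* stale image left behind */
    WalRecord wal[] = { WAL_DELETE };            /* the deletion was logged */

    /* Only crash recovery walks the log; in normal running nothing does. */
    for (size_t i = 0; i < sizeof(wal) / sizeof(wal[0]); i++)
        redo(wal[i], &disk_page);

    printf("after replay, tuples are %s\n",
           disk_page.tuples_live ? "live" : "dead");
    return 0;
}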

It's hard to see a way around this that isn't fairly catastrophic for
performance :-(. But in any case it's wrapped up in order-of-operations
issues. I've long since forgotten the details, but I seem to have thought
that there were additional order-of-operations hazards besides this one.

regards, tom lane
