Re: [HACKERS] [COMMITTERS] pgsql: Fix freezing of a dead HOT-updated tuple

From: Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>
To: Andres Freund <andres(at)anarazel(dot)de>
Cc: "Wood, Dan" <hexpert(at)amazon(dot)com>, Peter Geoghegan <pg(at)bowt(dot)ie>, pgsql-committers(at)postgresql(dot)org, "Wong, Yi Wen" <yiwong(at)amazon(dot)com>, Michael Paquier <michael(dot)paquier(at)gmail(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: [HACKERS] [COMMITTERS] pgsql: Fix freezing of a dead HOT-updated tuple
Date: 2017-12-06 20:23:55
Message-ID: 20171206202355.byfh6g3bezy4rrcr@alvherre.pgsql
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-committers pgsql-hackers

Andres Freund wrote:

> I've played around quite some with the attached patch. So far, after
> applying the second patch, neither VACUUM nor VACUUM FULL / CLUSTER make
> the situation worse for already existing corruption. HOT pruning can
> change the exact appearance of existing corruption a bit, but I don't
> think it can make the corruption meaningfully worse. It's a bit
> annoying and scary to add so many checks to backbranches but it kinda
> seems required. The error message texts aren't perfect, but these are
> "should never be hit" type elog()s so I'm not too worried about that.

Looking at 0002: I agree with the stuff being done here. I think a
couple of these checks could be moved one block outerwards in term of
scope; I don't see any reason why the check should not apply in that
case. I didn't catch any place missing additional checks.

Despite these being "shouldn't happen" conditions, I think we should
turn these up all the way to ereports with an errcode and all, and also
report the XIDs being complained about. No translation required,
though. Other than those changes and minor copy editing a commit
(attached), 0002 looks good to me.

I started thinking it'd be good to report block number whenever anything
happened while scanning the relation. The best way to go about this
seems to be to add an errcontext callback to lazy_scan_heap, so I attach
a WIP untested patch to add that. (I'm not proposing this for
back-patch for now, mostly because I don't have the time/energy to push
for it right now.)

It appears that you got all the places that seem to reasonably need
additional checks.

--
Álvaro Herrera https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Attachment Content-Type Size
0001-tweaks-for-0002.patch text/plain 6.5 KB
0002-blkno-rel-context-info-for-lazy_scan_heap.patch text/plain 3.5 KB

In response to

Responses

Browse pgsql-committers by date

  From Date Subject
Next Message Robert Haas 2017-12-06 20:28:12 Re: pgsql: When VACUUM or ANALYZE skips a concurrently dropped table, log i
Previous Message Bossart, Nathan 2017-12-06 19:23:57 Re: pgsql: When VACUUM or ANALYZE skips a concurrently dropped table, log i

Browse pgsql-hackers by date

  From Date Subject
Next Message Walter Cai 2017-12-06 20:26:26 Accessing base table relational names via RelOptInfo
Previous Message Bossart, Nathan 2017-12-06 19:23:57 Re: pgsql: When VACUUM or ANALYZE skips a concurrently dropped table, log i