Re: [HACKERS] [COMMITTERS] pgsql: Fix freezing of a dead HOT-updated tuple

From: Andres Freund <andres(at)anarazel(dot)de>
To: Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>
Cc: "Wood, Dan" <hexpert(at)amazon(dot)com>, Peter Geoghegan <pg(at)bowt(dot)ie>, pgsql-committers(at)postgresql(dot)org, "Wong, Yi Wen" <yiwong(at)amazon(dot)com>, Michael Paquier <michael(dot)paquier(at)gmail(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: [HACKERS] [COMMITTERS] pgsql: Fix freezing of a dead HOT-updated tuple
Date: 2017-12-07 20:22:43
Message-ID: 20171207202243.pmxwjm5wvz5lp6j6@alap3.anarazel.de
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-committers pgsql-hackers

Hi,

On 2017-12-06 17:23:55 -0300, Alvaro Herrera wrote:
> > I've played around quite some with the attached patch. So far, after
> > applying the second patch, neither VACUUM nor VACUUM FULL / CLUSTER make
> > the situation worse for already existing corruption. HOT pruning can
> > change the exact appearance of existing corruption a bit, but I don't
> > think it can make the corruption meaningfully worse. It's a bit
> > annoying and scary to add so many checks to backbranches but it kinda
> > seems required. The error message texts aren't perfect, but these are
> > "should never be hit" type elog()s so I'm not too worried about that.
>
> Looking at 0002: I agree with the stuff being done here. I think a
> couple of these checks could be moved one block outerwards in term of
> scope; I don't see any reason why the check should not apply in that
> case. I didn't catch any place missing additional checks.

I think I largely put them into the inner blocks because they were
guaranteed to be reached in those case (the horizon has to be before the
cutoff etc), and that way additional branches are avoided.

> Despite these being "shouldn't happen" conditions, I think we should
> turn these up all the way to ereports with an errcode and all, and also
> report the XIDs being complained about. No translation required,
> though. Other than those changes and minor copy editing a commit
> (attached), 0002 looks good to me.

Hm, I don't really care one way or another. I do see that you used
errmsg() in some places, errmsg_internal() in others. Was that
intentional?

> I started thinking it'd be good to report block number whenever anything
> happened while scanning the relation. The best way to go about this
> seems to be to add an errcontext callback to lazy_scan_heap, so I attach
> a WIP untested patch to add that. (I'm not proposing this for
> back-patch for now, mostly because I don't have the time/energy to push
> for it right now.)

That seems like a good idea. There's some cases where that could
increase log spam noticeably (unitialized blocks), but that seems
acceptable.

> +static void
> +lazy_scan_heap_cb(void *arg)
> +{
> + LazyScanHeapInfo *info = (LazyScanHeapInfo *) arg;
> +
> + if (info->blkno != InvalidBlockNumber)
> + errcontext("while scanning page %u of relation %s",
> + info->blkno, RelationGetRelationName(info->relation));
> + else
> + errcontext("while vacuuming relation %s",
> + RelationGetRelationName(info->relation));
> +}

Hm, perhaps rephrase so both messages refer to vacuuming? E.g. just by
replacing scanning with vacuuming?

Greetings,

Andres Freund

In response to

Responses

Browse pgsql-committers by date

  From Date Subject
Next Message Andres Freund 2017-12-07 20:41:28 Re: [HACKERS] [COMMITTERS] pgsql: Fix freezing of a dead HOT-updated tuple
Previous Message Andres Freund 2017-12-07 20:17:06 Re: [HACKERS] [COMMITTERS] pgsql: Fix freezing of a dead HOT-updated tuple

Browse pgsql-hackers by date

  From Date Subject
Next Message Andres Freund 2017-12-07 20:39:36 Re: Logical replication without a Primary Key
Previous Message Pavel Stehule 2017-12-07 20:21:22 Re: plpgsql test layout