Re: HOT chain validation in verify_heapam()

From: Andres Freund <andres(at)anarazel(dot)de>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Aleksander Alekseev <aleksander(at)timescale(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>, Himanshu Upadhyaya <upadhyaya(dot)himanshu(at)gmail(dot)com>, Mark Dilger <mark(dot)dilger(at)enterprisedb(dot)com>
Subject: Re: HOT chain validation in verify_heapam()
Date: 2023-03-23 20:36:56
Message-ID: 20230323203656.le7thulot4zrzi6v@awork3.anarazel.de
Lists: pgsql-hackers

Hi,

On 2023-03-23 15:37:15 -0400, Robert Haas wrote:
> On Wed, Mar 22, 2023 at 8:38 PM Andres Freund <andres(at)anarazel(dot)de> wrote:
> > skink / valgrind reported in a while back and found another issue:
> >
> > https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=skink&dt=2023-03-22%2021%3A53%3A41
> >
> > ==2490364== VALGRINDERROR-BEGIN
> > ==2490364== Conditional jump or move depends on uninitialised value(s)
> > ==2490364== at 0x11D459F2: check_tuple_visibility (verify_heapam.c:1379)
> ...
> > ==2490364== Uninitialised value was created by a stack allocation
> > ==2490364== at 0x11D45325: check_tuple_visibility (verify_heapam.c:994)
>
> OK, so this is an interesting one. It's complaining about switch
> (xmax_status), because the get_xid_status(xmax, ctx, &xmax_status)
> used in the previous switch might not actually initialize xmax_status,
> and apparently didn't in this case. get_xid_status() does not set
> xmax_status except when it returns XID_BOUNDS_OK, and the previous
> switch falls through both in that case and also when get_xid_status()
> returns XID_INVALID. That seems like it must be the issue here. As far
> as I can see, this isn't related to any of the recent changes but has
> been like this since this code was introduced, so I'm a little
> confused about why it's only causing a problem now.

Could it be that the tests didn't exercise the path before?
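
For reference, a minimal standalone sketch of the pattern being described.
The toy_* names, the trimmed-down enums and the even/odd rule are made up for
illustration and are not the actual verify_heapam.c code; only the shape (an
out parameter that is written solely for XID_BOUNDS_OK, then a switch on it)
follows the description above.

```c
#include <stdbool.h>

/* Simplified stand-ins for the enums in verify_heapam.c (assumed shapes). */
typedef enum { XID_INVALID, XID_IN_FUTURE, XID_BOUNDS_OK } XidBoundsViolation;
typedef enum { XID_COMMITTED, XID_ABORTED } XidCommitStatus;

/* Like the function described above: writes *status only for XID_BOUNDS_OK. */
static XidBoundsViolation
toy_get_xid_status(unsigned xid, XidCommitStatus *status)
{
    if (xid == 0)
        return XID_INVALID;     /* *status is left untouched here */
    if (xid > 1000)
        return XID_IN_FUTURE;   /* ... and here */

    /* Arbitrary rule for the sketch: even xids committed, odd ones aborted. */
    *status = (xid % 2 == 0) ? XID_COMMITTED : XID_ABORTED;
    return XID_BOUNDS_OK;
}

/*
 * Returns true and stores the status in *seen if it was safe to look at it;
 * returns false if the caller had to bail out before reading xmax_status.
 */
static bool
toy_check_xmax(unsigned xmax, XidCommitStatus *seen)
{
    XidCommitStatus xmax_status;

    switch (toy_get_xid_status(xmax, &xmax_status))
    {
        case XID_IN_FUTURE:
            return false;
        case XID_INVALID:
            /*
             * Falling through here (a plain "break") would let the code
             * below read an uninitialised xmax_status, which is the kind
             * of read valgrind flags. Returning early avoids that.
             */
            return false;
        case XID_BOUNDS_OK:
            break;
    }

    *seen = xmax_status;        /* only reached when the status was set */
    return true;
}
```

With a "break" in the XID_INVALID case instead of the early return, the final
read would depend on whatever happened to be on the stack, so a test only
trips over it once it actually reaches that case.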

> Nonetheless, here's a patch. I notice that there's a similar problem
> in another place, too. get_xid_status() is called a total of five
> times and it looks like only three of them got it right. I suppose
> that if this is correct we should back-patch it.

Yea, I think you're right.

> + report_corruption(ctx,
> + pstrdup("xmin is invalid"));

Not a correctness issue: nearly all callers of report_corruption() pass it the
result of psprintf(), and the remaining ones pass a pstrdup(), as here. Seems
like it'd be cleaner to just make report_corruption() accept a format string?
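
A rough standalone sketch of what a format-string variant could look like.
ToyContext and toy_report_corruption() are hypothetical stand-ins, and plain
malloc()/vsnprintf() are used here only so the example compiles outside the
backend; in the tree this would presumably go through the usual palloc-based
formatting helpers instead.

```c
#include <stdarg.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical stand-in for the check context: remembers the last message. */
typedef struct ToyContext
{
    char *last_msg;
} ToyContext;

/*
 * Hypothetical variadic reporter: does the formatting itself, so callers
 * need neither psprintf() nor pstrdup() at the call site.
 */
static void
toy_report_corruption(ToyContext *ctx, const char *fmt, ...)
{
    va_list ap;
    int     len;
    char   *msg;

    /* First pass: measure how much space the formatted message needs. */
    va_start(ap, fmt);
    len = vsnprintf(NULL, 0, fmt, ap);
    va_end(ap);
    if (len < 0)
        return;

    msg = malloc((size_t) len + 1);
    if (msg == NULL)
        return;

    /* Second pass: actually format into the buffer. */
    va_start(ap, fmt);
    vsnprintf(msg, (size_t) len + 1, fmt, ap);
    va_end(ap);

    /* Replace any previously stored message. */
    free(ctx->last_msg);
    ctx->last_msg = msg;
}
```

A constant message and a formatted one would then look the same at the call
site, e.g. toy_report_corruption(&ctx, "xmin is invalid") and
toy_report_corruption(&ctx, "xmax %u is out of range", xmax). A real version
would also want a printf-style format attribute on the declaration so the
compiler can check the arguments.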

Greetings,

Andres Freund
