Re: BUG #17462: Invalid memory access in heapam_tuple_lock

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: anisimow(dot)d(at)gmail(dot)com
Cc: pgsql-bugs(at)lists(dot)postgresql(dot)org
Subject: Re: BUG #17462: Invalid memory access in heapam_tuple_lock
Date: 2022-04-11 15:55:19
Message-ID: 286748.1649692519@sss.pgh.pa.us
Lists: pgsql-bugs

PG Bug reporting form <noreply(at)postgresql(dot)org> writes:
> When running parallel queries using pgbench with valgrind-enabled server:
> pgbench -i -s 1
> pgbench -t 1000 -c 10 -j 10
> I get:
> ==00:00:03:09.642 456530== Invalid read of size 2

Reproduced here. It's surprising that nobody noticed this before,
because AFAICS the bug is pretty old: it dates to somebody foolishly
deciding that heap_fetch didn't need its keep_buf argument, which
evidently happened in v12 (didn't track down the exact commit yet).
As you say, valgrind would not have caught this problem before
1e0dfd166, but that's not so new anymore either.

In principle, this is showing an actual bug, because once we drop
the buffer pin somebody could replace the page before we get done
examining the tuple. I'm not sure what the odds are of that happening
in the field, but they're probably mighty low because a just-accessed
buffer should not be high priority for replacement.

My inclination for a fix is to revert the removal of the keep_buf argument
and go back to having heapam_tuple_lock and other callers release the
buffer after they are done. However, that's problematic in released
branches, because it seems likely that there are outside callers of
heap_fetch. Can we get away with only fixing this in HEAD?
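
As a rough illustration of the hazard and of the caller-releases pattern
described above, here is a toy model in C. It is only a sketch: ToyBuffer,
toy_pin, toy_unpin, toy_try_replace and toy_fetch are hypothetical names
invented for this example, not PostgreSQL's buffer manager or the real
heap_fetch signature. The only idea it borrows from the discussion is a
keep_buf flag that decides whether the pin outlives the fetch.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy stand-in for a shared buffer; NOT PostgreSQL's real structures. */
typedef struct ToyBuffer
{
    int pin_count;   /* > 0 means the page must not be replaced */
    int page_data;   /* stand-in for the tuple bytes on the page */
} ToyBuffer;

static void
toy_pin(ToyBuffer *buf)
{
    buf->pin_count++;
}

static void
toy_unpin(ToyBuffer *buf)
{
    assert(buf->pin_count > 0);
    buf->pin_count--;
}

/* Page replacement succeeds only when nobody holds a pin. */
static bool
toy_try_replace(ToyBuffer *buf, int new_data)
{
    if (buf->pin_count > 0)
        return false;
    buf->page_data = new_data;
    return true;
}

/*
 * Hypothetical fetch: returns a pointer into the page.  With keep_buf
 * set, the pin stays held and the caller must call toy_unpin() once it
 * is done examining the tuple.  Without it, the pin is dropped before
 * returning, so the pointer may later refer to replaced contents.
 */
static const int *
toy_fetch(ToyBuffer *buf, bool keep_buf)
{
    const int *tuple;

    toy_pin(buf);
    tuple = &buf->page_data;
    if (!keep_buf)
        toy_unpin(buf);
    return tuple;
}
```

In this model, a caller that passes keep_buf = true can keep reading the
tuple safely, because toy_try_replace refuses to touch the page until the
caller unpins it. A caller that passes false holds a pointer whose contents
can change underneath it, which is the toy analogue of the invalid read
valgrind reported.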

regards, tom lane
