Re: BUG #16329: Valgrind detects an invalid read when building a gist index with buffering

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Alexander Lakhin <exclusion(at)gmail(dot)com>
Cc: pgsql-bugs(at)lists(dot)postgresql(dot)org
Subject: Re: BUG #16329: Valgrind detects an invalid read when building a gist index with buffering
Date: 2020-05-05 18:59:17
Message-ID: 10261.1588705157@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-bugs

Alexander Lakhin <exclusion(at)gmail(dot)com> writes:
>> Please look at the patch that modifies the gist regression test to make
>> the issue visible and fixes it by avoiding access to the memory context
>> that can be reset in gistProcessEmptyingQueue().

> Could you please let me know if the fix is incorrect (or not elaborated
> enough to be included in the upcoming releases) or this issue is false
> positive?

I took a look at this. I see the hazard, but I'm not sure that
I like your proposed quick-fix. Essentially, the problem is that
gistBuildCallback is supposing that the tuple it creates will survive
the execution of its subroutines, which in fact it won't always.
But that means that at some point the tuple pointer it's passed down to
those subroutines becomes a dangling pointer. What guarantee would we
have that the subroutines don't access the tuple after that point?

I think the real issue here is confusion about which level of function
"owns" the tempCxt and gets to decide when to reset it. We can't have
both gistBuildCallback and gistProcessEmptyingQueue doing that. Maybe
we need more than one temporary context, or maybe we could just skip
the context reset in gistProcessEmptyingQueue and let the storage
accumulate till we get back to gistBuildCallback. But in any case
it's going to require more analysis.

Or, if we do just hack it as you suggest, there had better be a couple
of fat comments in gistBufferingBuildInsert warning about the hazards.

I was somewhat dismayed to realize from looking at the code coverage
report that we have exactly zero test coverage of the gist buffering
build code paths today. That's just awful; how did the feature get
committed with no test coverage? Your proposed test additions raise the
coverage in gistbuild.c and gistbuildbuffers.c to something respectable,
at least. But it looks like there are still some potentially-delicate
paths that aren't tested, notably the "have to stop buffering" business.
Can we do better at reasonable cost, or are those paths just never
reached without huge data volume? (Could we make them more reachable
by turning down maintenance_work_mem or some other setting?)

regards, tom lane

In response to

Responses

Browse pgsql-bugs by date

  From Date Subject
Next Message Andreas Seltenreich 2020-05-05 20:44:52 Re: BUG #16393: PANIC: cannot abort transaction, it was already committed
Previous Message Daniel Gustafsson 2020-05-05 08:37:50 Re: BUG #16416: unable to start the server with pg_CTL