Re: BUG #16329: Valgrind detects an invalid read when building a gist index with buffering

From: Pavel Borisov <pashkin(dot)elfe(at)gmail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Alexander Lakhin <exclusion(at)gmail(dot)com>, pgsql-bugs(at)lists(dot)postgresql(dot)org
Subject: Re: BUG #16329: Valgrind detects an invalid read when building a gist index with buffering
Date: 2020-11-02 16:55:14
Message-ID: CALT9ZEEJ1pqAnkcO4H7AWOBhLN0KJDa9A=pDn9NC7pbqkwEAeQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-bugs

пн, 2 нояб. 2020 г. в 20:05, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>:

> Alexander Lakhin <exclusion(at)gmail(dot)com> writes:
> >> Please look at the patch that modifies the gist regression test to make
> >> the issue visible and fixes it by avoiding access to the memory context
> >> that can be reset in gistProcessEmptyingQueue().
>
> > Could you please let me know if the fix is incorrect (or not elaborated
> > enough to be included in the upcoming releases) or this issue is false
> > positive?
>
> I took a look at this. I see the hazard, but I'm not sure that
> I like your proposed quick-fix. Essentially, the problem is that
> gistBuildCallback is supposing that the tuple it creates will survive
> the execution of its subroutines, which in fact it won't always.
> But that means that at some point the tuple pointer it's passed down to
> those subroutines becomes a dangling pointer. What guarantee would we
> have that the subroutines don't access the tuple after that point?
>
> I think the real issue here is confusion about which level of function
> "owns" the tempCxt and gets to decide when to reset it. We can't have
> both gistBuildCallback and gistProcessEmptyingQueue doing that. Maybe
> we need more than one temporary context, or maybe we could just skip
> the context reset in gistProcessEmptyingQueue and let the storage
> accumulate till we get back to gistBuildCallback. But in any case
> it's going to require more analysis.
>

I investigated the code and realized that there are two ways to call the
problematic gistProcessEmptyingQueue() function:

1. gistEmptyAllBuffers /tempCxt owner/ -> gistProcessEmptyingQueue
A potentially dangerous operation of resetting memory context inside
sub-function calls of a 'context owner' function is done at the very ending
of gistProcessEmptyingQueue(). Just after it we return back
to gistEmptyAllBuffers and immediately have a switch to old context. So
resetting tempCxt has no danger in this case as no pointers are used after.

2. gistBuildCallback /tempCxt owner/ -> gistBufferingBuildInsert ->
gistProcessEmptyingQueue
gistProcessEmptyingQueue() is at the very ending of
gistBufferingBuildInsert(). After the reset no pointers from tempCxt
context are used up to the 'owner' level of gistBuildCallback. So moving
the only itup pointer operation before gistBufferingBuildInsert() call will
fix the bug.

(Of course, the alternative of not re-setting tempCxt
inside gistProcessEmptyingQueue() and doing this level up will also fix
the issue)

So I think that the patch by Alexander will do the thing. I also added some
comments to the code and removed extra context reset in gistBuildCallback
(which is already done level down) to make things clear.

--
Best regards,
Pavel Borisov

Postgres Professional: http://postgrespro.com <http://www.postgrespro.com>

Attachment Content-Type Size
v2-0001-Fix-for-GiST-buffered-build-bug-related-to-contex.patch application/octet-stream 2.0 KB

In response to

Responses

Browse pgsql-bugs by date

  From Date Subject
Next Message PG Bug reporting form 2020-11-02 21:01:00 BUG #16695: pg_hba_file_rules NULL address and netmask
Previous Message David G. Johnston 2020-11-02 16:22:00 Re: Calling variadic function with default value in named notation