Re: Broken resetting of subplan hash tables

From: Andres Freund <andres(at)anarazel(dot)de>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Andreas Karlsson <andreas(at)proxel(dot)se>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Ranier Vilela <ranier(dot)vf(at)gmail(dot)com>, Justin Pryzby <pryzby(at)telsasoft(dot)com>
Subject: Re: Broken resetting of subplan hash tables
Date: 2020-02-29 19:35:53
Message-ID: 20200229193553.pxxcfyydhgs6wobh@alap3.anarazel.de
Lists: pgsql-hackers

Hi,

On 2020-02-29 14:02:59 -0500, Tom Lane wrote:
> > 3. Additionally since the memory context used by the hash tables is
> > reset in buildSubPlanHash() if we start resetting hash tables we will
> > get a use after free bug.
>
> Nope, not right. The hash table metadata is now allocated in the
> es_query_cxt; what is in the hashtablecxt is just tuples, and that
> does need to be cleared, per the comment for ResetTupleHashTable.
> Your patch as given results in a nasty memory leak, which is easily
> demonstrated with a small mod to the regression test case I added:

> select sum(ss.tst::int) from
>   generate_series(1,10000000) o cross join lateral (
>     select i.ten in (select f1 from int4_tbl where f1 <= o) as tst,
>            random() as r
>     from onek i where i.unique1 = o%1000 ) ss;
>
> > Since the current behavior of the code in HEAD is not actually broken,
> > it is just an optimization which is not used, this fix does not have to
> > be backpatched.
>
> Unfortunately ... this test case also leaks memory like mad in
> HEAD, v12, and v11, because all of them are rebuilding the hash
> table from scratch without clearing the old one. So this is
> indeed broken and a back-patch is necessary.

Yea :(. Thanks for doing that.
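
For anyone skimming the thread, the pattern buildSubPlanHash() ends up
needing is roughly the following (a sketch from memory using SubPlanState
field names, with ncols/nbuckets computed earlier; not the exact committed
diff):

    /* clear the stored tuples and empty the table, instead of rebuilding */
    MemoryContextReset(node->hashtablecxt);
    node->havehashrows = false;
    node->havenullrows = false;

    if (node->hashtable)
        ResetTupleHashTable(node->hashtable);   /* keeps the metadata, which
                                                 * now lives in es_query_cxt */
    else
        node->hashtable = BuildTupleHashTableExt(node->parent,
                                                 node->descRight,
                                                 ncols, node->keyColIdx,
                                                 node->tab_eq_funcoids,
                                                 node->tab_hash_funcs,
                                                 node->tab_collations,
                                                 nbuckets, 0,
                                                 node->planstate->state->es_query_cxt,
                                                 node->hashtablecxt,
                                                 node->hashtempcxt,
                                                 false);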

> I noted while looking at this that most of the calls of
> ResetTupleHashTable are actually never reached in the regression
> tests (per code coverage report) so I made up some test cases
> that do reach 'em, and included those in the commit.

Cool.

> TBH, I think that this tuple table API is seriously misdesigned;
> it is confusing and very error-prone to have the callers need to
> reset the tuple context separately from calling ResetTupleHashTable.

Do you have an alternative proposal? Before committing the patch that
added it this way, I'd waited for quite a while asking for input... In
several cases (nodeAgg.c, nodeSetOp.c) there's memory from outside
execGrouping.c that's also allocated in the same context as the table
contents (via TupleHashTable->additional), so just resetting the context
passed to BuildTupleHashTableExt() as part of ResetTupleHashTable()
seems problematic too.
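
To illustrate, e.g. nodeSetOp.c does roughly this when filling the table
(sketch, not an exact quote) - the per-group counters end up in the same
context that holds the hashed tuples:

    TupleHashEntry entry;
    bool        isnew;

    entry = LookupTupleHashEntry(setopstate->hashtable, outerslot, &isnew);
    if (isnew)
    {
        /* caller-owned per-group state, allocated in the table's context */
        entry->additional = (SetOpStatePerGroup)
            MemoryContextAlloc(setopstate->tableContext,
                               sizeof(SetOpStatePerGroupData));
        initialize_counts((SetOpStatePerGroup) entry->additional);
    }

So the context passed as tablecxt doesn't just contain execGrouping.c's
own tuples, which is why execGrouping.c resetting it unilaterally doesn't
seem great.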

We could change it so more of the metadata for execGrouping.c is
computed outside of BuildTupleHashTableExt(), and continue to destroy
the entire hashtable. But we'd still have to reallocate the hashtable,
the slot, etc. So having a reset interface seems like the right thing.

I guess we could set it up so that BuildTupleHashTableExt() registers a
memory context reset callback on tablecxt, which'd reinitialize the
hashtable. But that seems like it'd be at least as confusing?
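
To sketch what I mean (hypothetical, nothing like this exists today):

    static void
    tuplehash_reset_callback(void *arg)
    {
        TupleHashTable hashtable = (TupleHashTable) arg;

        /* throw away the entries, keep the metadata */
        tuplehash_reset(hashtable->hashtab);
    }

    /* ... and in BuildTupleHashTableExt(), on the tablecxt argument: */
    MemoryContextCallback *cb;

    cb = (MemoryContextCallback *) MemoryContextAlloc(metacxt, sizeof(*cb));
    cb->func = tuplehash_reset_callback;
    cb->arg = hashtable;
    MemoryContextRegisterResetCallback(tablecxt, cb);

And IIRC reset callbacks are discarded once they've fired, so this would
also have to re-register itself on every reset - not obviously an
improvement in clarity.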

> Also, the callers all look like their resets are intended to destroy
> the whole hashtable not just its contents (as indeed they were doing,
> before the faulty commit). But I didn't attempt to fix that today.

Hm? nodeAgg.c, nodeSetOp.c, nodeRecursiveUnion.c don't at all look like
that to me? Why would they want to drop the hashtable metadata when
resetting? What am I missing?

Greetings,

Andres Freund
