Re: BUG #15847: Running out of memory when planning full outer joins involving many partitions

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: feikesteenbergen(at)gmail(dot)com
Cc: pgsql-bugs(at)lists(dot)postgresql(dot)org
Subject: Re: BUG #15847: Running out of memory when planning full outer joins involving many partitions
Date: 2019-06-13 17:14:16
Message-ID: 8168.1560446056@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-bugs

PG Bug reporting form <noreply(at)postgresql(dot)org> writes:
> We've had a few reports recently that had a backend consume a lot of
> memory causing either an OOM-kill or kubernetes rescheduling their
> PostgreSQL pod.
> For this specific bug report there were two things that clearly stood out:
> - a FULL OUTER JOIN is done
> - many partitions (thousands) are involved

I poked into this and found the cause. For the sample query, we have
an EquivalenceClass containing the expression
COALESCE(COALESCE(Var_1_1, Var_2_1), Var_3_1)
where each of the Vars belongs to an appendrel parent.
add_child_rel_equivalences() needs to add expressions representing the
transform of that to each child relation. That is, if the children
of table 1 are A1 and A2, of table 2 are B1 and B2, and of table 3
are C1 and C2, what we'd like to add are the expressions
COALESCE(COALESCE(Var_A1_1, Var_2_1), Var_3_1)
COALESCE(COALESCE(Var_A2_1, Var_2_1), Var_3_1)
COALESCE(COALESCE(Var_1_1, Var_B1_1), Var_3_1)
COALESCE(COALESCE(Var_1_1, Var_B2_1), Var_3_1)
COALESCE(COALESCE(Var_1_1, Var_2_1), Var_C1_1)
COALESCE(COALESCE(Var_1_1, Var_2_1), Var_C2_1)
However, what it's actually producing is additional combinations for
each appendrel after the first, because each call also mutates the
previously-added child expressions. So in this example we also get
COALESCE(COALESCE(Var_A1_1, Var_B1_1), Var_3_1)
COALESCE(COALESCE(Var_A2_1, Var_B2_1), Var_3_1)
COALESCE(COALESCE(Var_A1_1, Var_2_1), Var_C1_1)
COALESCE(COALESCE(Var_A2_1, Var_2_1), Var_C2_1)
COALESCE(COALESCE(Var_A1_1, Var_B1_1), Var_C1_1)
COALESCE(COALESCE(Var_A2_1, Var_B2_1), Var_C2_1)
With two appendrels involved, that's O(N^2) expressions; with
three appendrels, more like O(N^3).

This is by no means specific to FULL JOINs; you could get the same
behavior with join clauses like "WHERE t1.a + t2.b + t3.c = t4.d".

These extra expressions don't have any use, since we're not
going to join the children directly to each other. So we need
to fix add_child_rel_equivalences() to not do that. The
simplest way seems to be to make it ignore em_is_child EC members,
which requires that we use the adjust_appendrel_attrs_multilevel
machinery if we're trying to convert one of the original expressions
for a grandchild relation. As attached. This patch fixes the
described performance problem and still passes check-world.

While I don't have any hesitation about pushing this patch into HEAD,
I do feel a bit nervous about back-patching it, particularly right
before a set of minor releases. I don't think that we consider large
numbers of partitions to be a well-supported case in v11 or before,
so for the released branches I'd rather just say "if it hurts, don't
do that".

As an aside, adjust_appendrel_attrs_multilevel() makes me positively ill
(and it's not the head cold I have today...). It's unbelievably
brute-force, which might be okay if it were something we'd execute only
once per query, but examples like this can require it to be executed
thousands of times. Still, right now is probably not a good time to blow
it up and rewrite it.

regards, tom lane

Attachment Content-Type Size
avoid-extra-derived-expressions-in-eclasses.patch text/x-diff 3.3 KB

In response to

Responses

Browse pgsql-bugs by date

  From Date Subject
Next Message PG Bug reporting form 2019-06-13 19:35:21 BUG #15851: Concurrent Refresh of Materialized views not preserving the order of the underlying query
Previous Message Tom Lane 2019-06-13 14:07:15 Re: BUG #15844: MIPS: remove .set mips2 in s_lock.h to fix r6 build