From: | Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> |
---|---|
To: | pgsql-hackers(at)lists(dot)postgresql(dot)org |
Subject: | Partitionwise join fails under GEQO |
Date: | 2020-10-05 01:17:33 |
Message-ID: | 1683100.1601860653@sss.pgh.pa.us |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
If you run the core regression tests with geqo_threshold = 2
(injected, say, via ALTER SYSTEM SET), you will notice the
earlier tests showing some join ordering differences, which
is unsurprising ... but when it gets to partition_join, that
test just dumps core.
Investigation shows that the crash is due to a corrupt EquivalenceClass
member. It gets that way because add_child_join_rel_equivalences
is unaware of the planner's memory allocation policies, and feels
free to mess with the EC data structures during join planning.
When GEQO's transient memory context is thrown away after one round
of GEQO planning, some still-referenced EC data goes with it,
resulting in a crash during the next round.
I band-aided around that with the attached patch, which is enough
to prevent the crash, but it's entirely unsatisfactory as a production
solution. The problem is that each GEQO round will re-add the same
child EC members, since add_child_join_rel_equivalences has absolutely
no concept that the members it wants might be there already. Since
GEQO tends to use a lot of rounds, this can run to significant memory
bloat, not to mention a pretty serious speed hit while EC operations
thumb through all of those duplicate expressions.
Now that I've seen this, I wonder whether add_child_join_rel_equivalences
might not be making duplicate EC entries even without GEQO. Is there any
guarantee that it's not called repeatedly on related join-rel sets?
So I think that in addition to this, we need some check to see if the
derived EC child member is already there. It's likely that checking
for an em_relids match would be sufficient to eliminate irrelevant
members quickly.
regards, tom lane
Attachment | Content-Type | Size |
---|---|---|
bandaid-for-geqo-crash.patch | text/x-diff | 1.6 KB |
From | Date | Subject | |
---|---|---|---|
Next Message | Tom Lane | 2020-10-05 01:18:24 | Re: A modest proposal: let's add PID to assertion failure messages |
Previous Message | Michael Paquier | 2020-10-05 01:05:54 | Re: Add header support to text format and matching feature |