Re: Buglets in equivclass.c

From: David Rowley <dgrowleyml(at)gmail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: PostgreSQL Developers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Buglets in equivclass.c
Date: 2020-10-04 21:11:04
Message-ID: CAApHDvonTzJBux+kH2k=SqatYkh8q1b5kUkOu6zKf8bwKxNpQA@mail.gmail.com
Lists: pgsql-hackers

On Mon, 5 Oct 2020 at 06:29, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>
> While hacking on a patch that touches equivclass.c, I came across
> a couple of things that seemed wrong, and are fixed by the attached
> proposed patch.
>
> First, get_eclass_for_sort_expr() computes expr_relids and nullable_relids
> too soon. This is a waste of a not-really-trivial number of cycles in
> the common cases where it finds an existing eclass or is told not to
> make a new one. More subtly, the bitmapsets are computed in the caller's
> context. If we do use them, they will be attached to an EquivalenceClass
> that lives in the potentially-longer-lived root->planner_cxt, allowing
> the EC's pointers to them to become dangling. This would be a live bug
> if get_eclass_for_sort_expr() could be called with create_it = true during
> GEQO planning. So far as I can find, it is not; but both its API spec and
> its internal comments certainly give the impression that that's allowed.

Hmm, yeah, it seems fairly low-risk to move that computation down to
after we switch the memory context. It seems like the sort of thing
that could easily become an actual bug one day.
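
To make the ordering hazard concrete, here is a toy sketch in plain C.
It is not PostgreSQL code: Arena, current_arena, arena_switch_to(),
ToyEClass and the two make_eclass_*() functions are hypothetical
stand-ins for memory contexts, CurrentMemoryContext,
MemoryContextSwitchTo() and an EquivalenceClass. It only shows that a
set computed before the switch lands in the caller's arena, while one
computed after the switch lands in the long-lived arena.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define ARENA_SLOTS 64

/* Toy stand-in for a memory context: a bump arena of ints. */
typedef struct Arena
{
    int     slots[ARENA_SLOTS];
    size_t  used;
} Arena;

/* Plays the role of CurrentMemoryContext in this sketch. */
static Arena *current_arena;

/* Make "next" the current arena and return the previous one. */
static Arena *
arena_switch_to(Arena *next)
{
    Arena *old = current_arena;

    current_arena = next;
    return old;
}

/* Hand out n ints from whichever arena is current. */
static int *
arena_alloc_ints(size_t n)
{
    Arena  *a = current_arena;
    int    *p;

    assert(a != NULL && n <= ARENA_SLOTS - a->used);
    p = a->slots + a->used;
    a->used += n;
    return p;
}

/* Does p point into arena a's storage? */
static int
arena_owns(const Arena *a, const int *p)
{
    uintptr_t lo = (uintptr_t) a->slots;
    uintptr_t hi = (uintptr_t) (a->slots + ARENA_SLOTS);
    uintptr_t x = (uintptr_t) p;

    return x >= lo && x < hi;
}

/* Hypothetical miniature of an equivalence class holding a relid set. */
typedef struct ToyEClass
{
    const int *relids;
    size_t     nrelids;
} ToyEClass;

/* Build a relid array in the current arena. */
static int *
compute_relids(size_t n)
{
    int *r = arena_alloc_ints(n);

    for (size_t i = 0; i < n; i++)
        r[i] = (int) i + 1;
    return r;
}

/* Risky order: compute first, switch afterwards. */
static ToyEClass
make_eclass_early(Arena *long_lived, size_t n)
{
    ToyEClass ec;
    int      *relids = compute_relids(n);   /* lands in the caller's arena */
    Arena    *old = arena_switch_to(long_lived);

    ec.relids = relids;
    ec.nrelids = n;
    arena_switch_to(old);
    return ec;
}

/* Safer order: switch first, then compute. */
static ToyEClass
make_eclass_late(Arena *long_lived, size_t n)
{
    ToyEClass ec;
    Arena    *old = arena_switch_to(long_lived);

    ec.relids = compute_relids(n);          /* lands in the long-lived arena */
    ec.nrelids = n;
    arena_switch_to(old);
    return ec;
}
```

With a short-lived arena current, make_eclass_early() returns a
ToyEClass whose relids live in that short-lived arena, so resetting it
would leave the pointer dangling. make_eclass_late() keeps the data in
the arena that owns the ToyEClass.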

> Second, generate_join_implied_equalities() uses inner_rel->relids to
> look up relevant eclasses, but given the surrounding code it seems like
> it ought to be using nominal_inner_relids. The code accidentally works
> because a child RelOptInfo will always have exactly the same
> eclass_indexes as its top parent; but if it did not, we'd risk either
> missing some relevant eclasses or hitting the assertion that claims
> that all the eclasses we find overlap nominal_join_relids. (I noticed
> this while speculating that maybe we needn't bother maintaining
> eclass_indexes for child RelOptInfos. This code is one place that
> would fail if we didn't.)

Oops. That certainly should be using nominal_inner_relids rather than
inner_rel->relids. I agree it's not a live bug, since child ECs are
currently just a copy of their parents'.
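
A toy model of why the lookup key matters, again in plain C rather than
PostgreSQL code. ToyPlanner, relid_bit(), eclasses_for_relids() and
common_eclasses() are hypothetical names, and the lookup is only
loosely modeled on the idea of per-relation eclass_indexes. If the
child's entry were ever left empty, looking up by the child's own
relids would find nothing, while looking up by the nominal (parent)
relids would still find the shared eclass.

```c
#include <assert.h>
#include <stdint.h>

#define MAX_RELS 8

/* Bit i set means relation i (or eclass i) is a member of the set. */
typedef uint32_t RelidSet;
typedef uint32_t EclassSet;

/* Hypothetical planner state: the eclasses that mention each relation. */
typedef struct ToyPlanner
{
    EclassSet eclass_indexes[MAX_RELS];
} ToyPlanner;

/* Single-member set containing relation (or eclass) number i. */
static uint32_t
relid_bit(int i)
{
    return UINT32_C(1) << i;
}

/* Union of the eclass sets of every relation in relids. */
static EclassSet
eclasses_for_relids(const ToyPlanner *p, RelidSet relids)
{
    EclassSet result = 0;

    for (int relid = 0; relid < MAX_RELS; relid++)
    {
        if (relids & relid_bit(relid))
            result |= p->eclass_indexes[relid];
    }
    return result;
}

/* Eclasses that mention at least one relation on each side. */
static EclassSet
common_eclasses(const ToyPlanner *p, RelidSet a, RelidSet b)
{
    return eclasses_for_relids(p, a) & eclasses_for_relids(p, b);
}
```

As long as the child's entry is a copy of its parent's, both keys give
the same answer, which matches the "accidentally works" behaviour
described above.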

> I'm unsure whether to back-patch either of these. They both seem to be
> just latent bugs so far as the core code is concerned, but the first one
> in particular seems like something that could bite extension code.
> Thoughts?

That's a good question. I'm leaning towards backpatching both of them.
The second is new as of v13, so it does not seem unreasonable that
someone has just not yet stumbled on it with some extension that adds
extra child ECs. I can imagine a use case for that: getting rid of
needless equality quals that duplicate the partition constraint. The
fix for the first just seems neater, faster and more correct, so I
can't really see any reason not to backpatch it.

David
