Re: Path question

From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Hans-Jürgen Schönig <hs(at)cybertec(dot)at>, Greg Stark <gsstark(at)mit(dot)edu>, Boszormenyi Zoltan <zb(at)cybertec(dot)at>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Path question
Date: 2010-09-29 15:01:35
Message-ID: AANLkTikGXLquVY8HKU27Vb7v8OxboA66+A+bf==3NrrO@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Tue, Sep 28, 2010 at 11:13 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> Robert Haas <robertmhaas(at)gmail(dot)com> writes:
>> ...what makes the pathkeys potentially useful is that they match the
>> root pathkeys for the query.  However, without Greg's hacks, the
>> transposed child equivalence classes don't show up in the equivalence
>> member lists, so get_eclass_for_sort_expr() generates new equivalence
>> classes for them.  Subsequently, when truncate_useless_pathkeys() is
>> called, those new equivalence classes are found not to be equal to the
>> overall ordering desired for the query, so it truncates them away.
>
>> I'm not presently sure quite what the best way to fix this is... any ideas?
>
> Probably what's needed is to extend the "child eclass member" stuff to
> cover sort-key eclasses.  Per relation.h:
>
>  * em_is_child signifies that this element was built by transposing a member
>  * for an inheritance parent relation to represent the corresponding expression
>  * on an inheritance child.  The element should be ignored for all purposes
>  * except constructing inner-indexscan paths for the child relation.
>
> Likely these need to be ignored a bit less.  A couple of Greg's
> undocumented hacks seem to be related to this point, but not all of
> them.  In any case you'd need some careful thought about when to
> ignore child EMs and when not.

Makes sense to me. It seems that there are actually two halves to
this problem: getting the child EMs to be generated in the first
place, and then getting them to be examined at the appropriate time.

The FIXME in set_append_rel_pathlist() is there to ensure that
add_child_rel_equivalences() always gets called, and the hack in
add_child_rel_equivalences() is there to ensure that child EMs are
generated unconditionally rather than only when they're useful for
merging. That doesn't seem entirely off-base. In
set_append_rel_pathlist(), I think that we shouldn't set
has_eclass_joins on the child relation unless it's set on the parent,
but calling add_child_rel_equivalences() unconditionally might be
reasonable. Similarly, in add_child_rel_equivalences(), I think that
the cur_ec->ec_has_const test should be retained, but the test on
list_length(cur_ec->ec_members) needs to be relaxed or eliminated. As
a matter of correctness, it seems like it should be OK to do this
always - the worst thing that can happen is you end up with some extra
EMs that don't (or shouldn't) do anything. However, it's possible
that, for performance, we might want to avoid generating EMs that
won't actually be needed. I'm not entirely clear on whether there is
an inexpensive way for us to know that. Perhaps we could replace the
list_length() test with a call to has_useful_pathkeys() and just
accept that there will be some false positives.

The two FIXMEs in make_sort_from_pathkeys() are there to ensure that
we find the child EMs when we go to generate a sort. There's no
reason that I can see to remove the em_has_const test, but it looks
like we need to remove the em_is_child test. We need to match each
pathkey to an element of the child's target list, which means we must
consider an EM that appears in the child's target list, which is means
a child EM or nothing. Testing reveals that if you keep Greg's hacks
to add the child EMs but remove the hacks from
make_sort_from_pathkeys(), it still works IF every child can use an
index scan, but if you need to do an index scan on some children and
sort others then you get:

ERROR: could not find pathkey item to sort

...which seems consistent with the above analysis. If we accept that
it's necessary, that still leaves the question of whether it's safe.
I'm a little less certain on this point, but it seems like it ought to
be OK. As far as I'm aware, we currently never sort append children.
Indeed, unless equivalence classes for those append children were
getting created by some mechanism other than
add_child_rel_equivalences(), it seems like it ought to fail with the
above error messages. And on the flip side it should be impossible
for us to pick up a child EM when we meant to get some other one
because they shouldn't be equal().

Thoughts?

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Robert Haas 2010-09-29 15:02:47 Re: recovery.conf location
Previous Message Tom Lane 2010-09-29 14:56:13 Re: operator dependency of commutator and negator