Re: Performance issue in foreign-key-aware join estimation

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: David Rowley <david(dot)rowley(at)2ndquadrant(dot)com>
Cc: PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>
Subject: Re: Performance issue in foreign-key-aware join estimation
Date: 2019-03-17 21:08:12
Message-ID: 15112.1552856892@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

David Rowley <david(dot)rowley(at)2ndquadrant(dot)com> writes:
> [ eclass_indexes_v3.patch ]

I looked at this for a little bit. I'm on board with the basic idea
of assigning integer indexes to the ECs and keeping bitmapsets of
relevant EC indexes in RelOptInfos. However ... man, is that
delete_eclass() thing ugly. And slow, and fragile-feeling.

I think that maybe we could improve that situation by not trying to
maintain the RelOptInfo.eclass_indexes sets at all during the initial
construction of the ECs, but only setting them up after we've completed
doing that. We already have an assumption that we know when EC merging
is done and it's safe to make pathkeys that reference the "canonical"
(merged) ECs, so it seems like that would be a point where we could
build the indexing bitmapsets safely. This hinges on not needing the
bitmapsets till after that point, but it'd be easy to arrange some
Asserts verifying that we don't try to use them before that.

If that doesn't work (because we need the index info sooner), maybe we
could consider never removing ECs from eq_classes, so that their indices
never change, then teaching most/all of the loops over eq_classes to
ignore entries with non-null ec_merged. But we probably want the indexing
bitmapsets to reflect canonical EC numbers, so we'd still have to do some
updating I fear -- or else be prepared to chase up the ec_merged links
when using the index bitmaps.

Stepping back a little bit, I wonder whether now is the time to rethink
the overall EC data structure, as foreseen in this old comment:

* Note: constructing merged EquivalenceClasses is a standard UNION-FIND
* problem, for which there exist better data structures than simple lists.
* If this code ever proves to be a bottleneck then it could be sped up ---
* but for now, simple is beautiful.

I'm not very sure what a better data structure would really look like ---
after perusing Sedgewick's section on UNION-FIND, it seems like the
ec_merged links are more or less what he's recommending anyway, and the
expensive part of this is probably all the equal() tests to find whether
two proposed EC member expressions are already in the set of known
expressions. So I'm just handwaving here. But my point is that we
needn't feel wedded to the idea of keeping the ECs in a list, if we can
think of some better data structure for them.

regards, tom lane

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Chapman Flack 2019-03-17 22:31:00 Re: Fix XML handling with DOCTYPE
Previous Message Andrew Gierth 2019-03-17 20:38:53 Re: CREATE OR REPLACE AGGREGATE?