Properly pathify the union planner

From: David Rowley <dgrowleyml(at)gmail(dot)com>
To: PostgreSQL Developers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Properly pathify the union planner
Date: 2023-11-01 23:42:51
Message-ID: CAApHDvpb_63XQodmxKUF8vb9M7CxyUyT4sWvEgqeQU-GB7QFoQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

The upper planner was pathified many years ago now. That was a large
chunk of work and because of that, the union planner was not properly
pathified in that effort. A small note was left in
recurse_set_operations() to mention about future work.

You can see this lack of pathification in make_union_unique() and
choose_hashed_setop(). There are heuristics in there to decide the
method to use instead of creating paths and letting add_path() decide
what's faster.

I've been working on improving this for the past few weeks and I'm not
quite as far along as I'd hoped, but what I have is perhaps worthy of
sharing. For now, I've only improved UNIONs.

A UNION plan can now look like:

# explain (costs off) select * from a union select * from a;
QUERY PLAN
---------------------------------------------------
Unique
-> Merge Append
Sort Key: a.a
-> Index Only Scan using a_pkey on a
-> Index Only Scan using a_pkey on a a_1

Previously we'd have only considered Append -> Hash Aggregate or via
Append -> Sort -> Unique

To make this work, the query_planner() needs to know about setops, so
I've passed those down via the standard_qp_extra struct so that we can
choose pathkeys for the setops.

One part that still needs work is the EquivalanceClass building.
Because we only build the final targetlist for the Append after
planning all the append child queries, I ended up having to populate
the EquivalanceClasses backwards, i.e children first. add_eq_member()
determines if you're passing a child member by checking if parent !=
NULL. Since I don't have a parent EquivalenceMember to pass,
em_is_child gets set wrongly, and that causes problems because
ec_has_const can get set to true when it shouldn't. This is a problem
as it can make a PathKey redundant when it isn't. I wonder if I'll
need to change the signature of add_eq_member() and add an "is_child"
bool to force the EM to be a child em... Needs more thought...

I've not worked on the creation of Incremental Sort paths yet, or done
any path plumbing work to have UNION consider Gather Merge -> Unique
on already sorted paths. I think to make similar improvements to
EXCEPT and INTERSECT we'd need a node executor node. Perhaps
nodeMergeAppendSetops.c which can be configured to do EXCEPT or
INTERSECT. It could also perhaps handle UNION too then we can use
that instead of a Merge Append -> Unique. That might save doing some
slot copying and improve performance further. I'm not planning on
doing that for the first stage. I only intend to improve UNION for
that and we have all the executor nodes to make that work already.

Anyway, I've attached my WIP patch for this.

David

Attachment Content-Type Size
better_union_planner_v0.patch text/plain 29.4 KB

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Xing Guo 2023-11-01 23:45:33 Re: Don't pass NULL pointer to strcmp().
Previous Message Bruce Momjian 2023-11-01 23:40:17 Re: document deviation from standard on REVOKE ROLE