Implement hook for self-join simplification

From: Leif Harald Karlsen <leifhka(at)ifi(dot)uio(dot)no>
To: "pgsql-hackers(at)lists(dot)postgresql(dot)org" <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Implement hook for self-join simplification
Date: 2022-06-24 13:58:43
Message-ID: 9753d819f85b45d9943a0df37723ae3a@ifi.uio.no
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi,

I have a made a small deductive database on top of PostgreSQL for educational/research purposes. In this setting, due to certain VIEW-constructions, queries often end up being self-joins on primary keys, e.g.:

SELECT t1.id, t2.val

FROM t AS t1 JOIN t AS t2 USING (id);

where t(id) is a primary key. This query is equivalent to the much more efficient:

SELECT id, val

FROM t AS t1;

However, PostgreSQL currently does not seem to implement this simplification. Therefore, I have looked into writing an extension that performs this, but I am struggling a bit with finding out when this simplification should be done, i.e. which hook I should implement.

The simplification is not too different from those done in prep/prepjoin.c, i.e. doing the simplification on the query-tree directly. However, I think I then would need to implement a planner_hook, as it is the only hook giving me direct access to the query-tree. But I need to perform my simplification after view-definitions have been expanded into the query, and after the transformations in prepjoin.c (but before the rest of planning). But there seems to be no easy way to inject a function there, as this is buried deep in the middle of the planner-function.

I therefore looked into using a set_join_pathlist_hook, and try to do the simplification at path-level. I.e., doing something like:

static void self_join_optimize_hook(PlannerInfo *root, RelOptInfo* joinrel, RelOptInfo* outerrel, RelOptInfo* innerrel, JoinType jointype, JoinPathExtraData* extra)

{

if (is_selfjoin_on_pk(root, joinrel, extra)) {

ListCell *p;

foreach(p, innerrel->pathlist) {

add_path(joinrel, (Path *) p);

}

}

}

That is, if joinrel is a (binary) self-join on a primary key, the paths for evaluating the join is the same as the paths for evaluating the innerrel, However, this does not work, as the rest of the query may require values from the other table (e.g. t2 in the example above). I therefore need to replace all mentions of t2 with t1, but is this possible at a path-level?

If not, does anyone have a an idea on how this can be done in a different way? Thanks!

Kind regards,

Leif Harald Karlsen

Senior Lecturer

Department of Informatics

University of Oslo

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Tom Lane 2022-06-24 14:06:53 Re: array_cat anycompatible change is breaking xversion upgrade tests
Previous Message Justin Pryzby 2022-06-24 13:30:53 Re: array_cat anycompatible change is breaking xversion upgrade tests