Re: Self-join optimisation

From: Hywel Carver <hywel(at)skillerwhale(dot)com>
To: Matthias van de Meent <boekewurm+postgres(at)gmail(dot)com>
Cc: PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Self-join optimisation
Date: 2021-03-11 14:39:55
Message-ID: CAFcA2FYVkYUT2tQd-5-SZXmj8mgzMW0JtiHLo+NK0RPAw20kfQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Great! It looks like it's been in commitfest status for a few years. Is
there anything someone like me (outside the pgsql-hackers community) can do
to help it get reviewed this time around?

On Thu, Mar 11, 2021 at 2:32 PM Matthias van de Meent <
boekewurm+postgres(at)gmail(dot)com> wrote:

> On Thu, 11 Mar 2021 at 15:15, Hywel Carver <hywel(at)skillerwhale(dot)com> wrote:
> >
> > Hi,
> >
> > I asked this question in the Postgres Slack, and was recommended to ask
> here instead.
> >
> > A few times, I've been in a situation where I want to join a table to
> itself on its primary key. That typically happens because I have some kind
> of summary view, which I then want to join to the original table (using its
> primary key) to flesh out the summary data with other columns. That's
> executed as a join, which surprised me. But in this case, I could extend
> the view to have all of the columns of the original table to avoid the join.
> >
> > But there's another case that's harder to solve this way: combining
> views together. Here's a trivial example:
> >
> > CREATE TABLE users (id BIGINT PRIMARY KEY, varchar name);
> > CREATE VIEW only_some_users AS (SELECT * FROM users WHERE id < 10);
> > CREATE VIEW some_other_users AS (SELECT * FROM users WHERE id > 3);
> >
> > EXPLAIN SELECT * FROM only_some_users
> > INNER JOIN some_other_users ON only_some_users.id = some_other_users.id;
> >
> > Hash Join (cost=29.23..43.32 rows=90 width=144)
> > Hash Cond: (users.id = users_1.id)
> > -> Bitmap Heap Scan on users (cost=6.24..19.62 rows=270 width=72)
> > Recheck Cond: (id < 10)
> > -> Bitmap Index Scan on users_pkey (cost=0.00..6.18 rows=270
> width=0)
> > Index Cond: (id < 10)
> > -> Hash (cost=19.62..19.62 rows=270 width=72)
> > -> Bitmap Heap Scan on users users_1 (cost=6.24..19.62
> rows=270 width=72)
> > Recheck Cond: (id > 3)
> > -> Bitmap Index Scan on users_pkey (cost=0.00..6.18
> rows=270 width=0)
> > Index Cond: (id > 3)
> >
> > Is there a reason why Postgres doesn't have an optimisation built in to
> optimise this JOIN? What I'm imagining is that a join between two aliases
> for the same table on its primary key could be optimised by treating them
> as the same table. I think the same would be true for self-joins on any
> non-null columns covered by a uniqueness constraint.
> >
> > If this is considered a desirable change, I'd be keen to work on it
> (with some guidance).
>
> There's currently a patch registered in the commitfest that could fix
> this for you, called "Remove self join on a unique column" [0].
>
>
> With regards,
>
> Matthias van de Meent
>
> [0] https://commitfest.postgresql.org/31/1712/, thread at
>
> https://www.postgresql.org/message-id/flat/64486b0b-0404-e39e-322d-0801154901f3(at)postgrespro(dot)ru
>

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Bharath Rupireddy 2021-03-11 14:56:05 Re: Logical Replication - improve error message while adding tables to the publication in check_publication_add_relation
Previous Message Fujii Masao 2021-03-11 14:33:40 Re: About to add WAL write/fsync statistics to pg_stat_wal view