Re: Removing unneeded self joins

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Alexander Kuzmenkov <a(dot)kuzmenkov(at)postgrespro(dot)ru>
Cc: David Rowley <david(dot)rowley(at)2ndquadrant(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Removing unneeded self joins
Date: 2019-02-22 00:25:12
Message-ID: 22996.1550795112@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Alexander Kuzmenkov <a(dot)kuzmenkov(at)postgrespro(dot)ru> writes:
> Here is a rebased version with some bugfixes.

I noticed this had bit-rotted again. I've not really reviewed it, but
I rebased it up to HEAD, and fixed a couple small things:

* My compiler was bitching about misplaced declarations, so I moved
some variable declarations accordingly. I couldn't help noticing
that many of those wouldn't have been a problem in the first place
if you were following project style for loops around list_delete_cell
calls, which usually look more like this:

prev = NULL;
for (cell = list_head(root->rowMarks); cell; cell = next)
{
PlanRowMark *rc = (PlanRowMark *) lfirst(cell);

next = lnext(cell);
if (rt_fetch(rc->rti, root->parse->rtable)->rtekind == RTE_RESULT)
root->rowMarks = list_delete_cell(root->rowMarks, cell, prev);
else
prev = cell;
}

* I saw you had a problem with an existing test in join.sql that
was being optimized away because it used an ill-advised self-join.
I've pushed a fix for that, so it's not a problem as of HEAD.

I notice though that there's one unexplained plan change remaining
in join.out:

@@ -4365,11 +4365,13 @@ explain (costs off)
select p.* from
(parent p left join child c on (p.k = c.k)) join parent x on p.k = x.k
where p.k = 1 and p.k = 2;
- QUERY PLAN
---------------------------
+ QUERY PLAN
+------------------------------------------------
Result
One-Time Filter: false
-(2 rows)
+ -> Index Scan using parent_pkey on parent x
+ Index Cond: (k = 1)
+(4 rows)

-- bug 5255: this is not optimizable by join removal
begin;

That sure looks like a bug. I don't have time to look for the
cause right now.

I also noticed that the test results show that when a table
is successfully optimized away, the remaining reference seems
to have the alias of the second reference not the first one.
That seems a little ... weird. It's just cosmetic of course, but
why is that?

Also, I did notice that you'd stuck a declaration for
"struct UniqueIndexInfo" into paths.h, which then compelled you
to include that header in planmain.h. This seems like poor style;
I'd have been inclined to put the struct in pathnodes.h instead.
That's assuming you need it at all -- in those two usages, seems
like it'd be just about as easy to return two separate Lists.
On the other hand, given

+ * unique_for_rels - list of (Relids, UniqueIndexInfo*) lists, where Relids
+ * is a set of other rels for which this one has been proven
+ * unique, and UniqueIndexInfo* stores information about the
+ * index that makes it unique, if any.

I wonder why you didn't include the Relids into UniqueIndexInfo as well
... and maybe make it a proper Node so that unique_for_rels could be
printed by outfuncs.c. So any way I slice it, it seems like this data
structure could use more careful contemplation.

Anyway, updated patch attached.

regards, tom lane

Attachment Content-Type Size
0001-Remove-self-joins-v10.patch text/x-diff 58.1 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Jamison, Kirk 2019-02-22 00:30:37 RE: Timeout parameters
Previous Message Bruce Momjian 2019-02-22 00:13:13 Re: boolean and bool in documentation