Re: Removing unneeded self joins

From: Andres Freund <andres(at)anarazel(dot)de>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: David Rowley <david(dot)rowley(at)2ndquadrant(dot)com>, Simon Riggs <simon(at)2ndquadrant(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Alexander Kuzmenkov <a(dot)kuzmenkov(at)postgrespro(dot)ru>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Removing unneeded self joins
Date: 2018-05-16 23:00:53
Message-ID: 20180516230053.upq62j47n6i543tf@alap3.anarazel.de
Lists: pgsql-hackers

On 2018-05-16 18:55:41 -0400, Tom Lane wrote:
> David Rowley <david(dot)rowley(at)2ndquadrant(dot)com> writes:
> > On 17 May 2018 at 10:13, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> >> Yeah. It'd have to be a very heuristic thing that doesn't account
> >> for much beyond the number of relations in the query, and maybe their
> >> sizes --- although I don't think we even know the latter at the
> >> point where join removal would be desirable. (And note that one of
> >> the desirable benefits of join removal is not having to find out the
> >> sizes of removed rels ... so just swapping that around doesn't appeal.)
>
> > There's probably some argument for delaying obtaining the relation
> > size until after join removal and probably partition pruning too, but
> > it's currently done well before that in build_simple_rel, where the
> > RelOptInfo is built.
>
> Yeah, but that's something we ought to fix someday; IMO it's an artifact
> of having wedged in remove_useless_joins without doing the extensive
> refactoring that'd be needed to do it at a more desirable time. I don't
> want to build user-visible behavior that's dependent on doing that wrong.

My patch that introduced a radix tree buffer mapping also keeps an
accurate relation size in memory, making it far cheaper to use. While I
deprioritized the patchset for the moment (I'll post what I'm working on
first soon), that should address some of the cost till then.

Wonder if we shouldn't just cache an estimated relation size in the
relcache entry till then. For planning purposes we don't need to be
accurate, and usually activity that drastically expands relation size
will trigger relcache activity before long. Currently there are plenty of
workloads where the lseek(SEEK_END) calls show up pretty prominently.

Greetings,

Andres Freund
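
Editorial sketch (not part of the original message, and not PostgreSQL code): the idea of caching an estimated relation size instead of issuing lseek(SEEK_END) on every lookup can be illustrated roughly as below. The class CachedRelSize, its methods, the demo() helper, and the use of a temporary file as a stand-in for a relation segment are all hypothetical names chosen for illustration. The 8192-byte block size is PostgreSQL's default and is assumed here.

```python
import os
import tempfile

BLOCK_SIZE = 8192  # assumed: PostgreSQL's default block size


class CachedRelSize:
    """Hypothetical stand-in for a relcache entry holding a size estimate."""

    def __init__(self, fd):
        self.fd = fd
        self.cached_blocks = None  # None means no estimate is cached yet
        self.lseek_calls = 0       # counts how often the kernel is asked

    def _probe_blocks(self):
        # Expensive path: ask the kernel for the current file length.
        self.lseek_calls += 1
        nbytes = os.lseek(self.fd, 0, os.SEEK_END)
        return nbytes // BLOCK_SIZE

    def estimated_blocks(self):
        # Planner-style lookup: reuse the cached estimate when present.
        if self.cached_blocks is None:
            self.cached_blocks = self._probe_blocks()
        return self.cached_blocks

    def invalidate(self):
        # Stand-in for a cache invalidation that drops the estimate.
        self.cached_blocks = None


def demo():
    with tempfile.TemporaryFile() as f:
        fd = f.fileno()
        os.write(fd, b"\0" * (3 * BLOCK_SIZE))
        rel = CachedRelSize(fd)
        first = rel.estimated_blocks()    # probes the file once
        os.write(fd, b"\0" * BLOCK_SIZE)  # file grows; estimate is now stale
        stale = rel.estimated_blocks()    # served from cache, no lseek
        rel.invalidate()
        fresh = rel.estimated_blocks()    # probes again after invalidation
        return first, stale, fresh, rel.lseek_calls
```

The stale read in the middle is the trade-off the message accepts: an estimate that lags behind the real size until some invalidation refreshes it, in exchange for fewer system calls during planning.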
