Re: [HACKERS] Re: Improve OR conditions on joined columns (common star schema problem)

From: Andres Freund <andres(at)anarazel(dot)de>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Jim Nasby <jim(dot)nasby(at)openscg(dot)com>, pgsql-hackers(at)lists(dot)postgresql(dot)org
Subject: Re: [HACKERS] Re: Improve OR conditions on joined columns (common star schema problem)
Date: 2018-03-30 02:05:27
Message-ID: 20180330020527.hllsqse2vwqldfa5@alap3.anarazel.de
Lists: pgsql-hackers

Hi,

I've only skimmed the thread, looking at the patch on its own.

On 2018-01-04 17:50:48 -0500, Tom Lane wrote:
> diff --git a/src/backend/optimizer/plan/plaindex ...dd11e72 .
> --- a/src/backend/optimizer/plan/planunionor.c
> +++ b/src/backend/optimizer/plan/planunionor.c
> @@ -0,0 +1,667 @@
> +/*-------------------------------------------------------------------------
> + *
> + * planunionor.c
> + * Consider whether join OR clauses can be converted to UNION queries.
> + *
> + * The current implementation of the UNION step is to de-duplicate using
> + * row CTIDs.

Could we skip using the ctid if there's a DISTINCT (or something to that
effect) above? There's no need to avoid removing identical rows if
they are going to be removed anyway.
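
To make the reasoning concrete, here is a minimal standalone sketch in C. It is not PostgreSQL code: `DemoRow`, `demo_dedup_by_rowid` and `demo_count_distinct_vals` are hypothetical names, `rowid` stands in for the ctid, and the sketch assumes the DISTINCT covers every output column (here just `val`). Under that assumption, de-duplicating by row identity first does not change the final result, because two copies of the same row necessarily carry the same values.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical row: "rowid" stands in for the ctid, "val" for the output column. */
typedef struct DemoRow
{
    int rowid;
    int val;
} DemoRow;

/*
 * Remove rows whose rowid was already seen, keeping the first occurrence.
 * Compacts the array in place and returns the new row count.
 * Quadratic on purpose; this is an illustration, not an executor node.
 */
static size_t
demo_dedup_by_rowid(DemoRow *rows, size_t n)
{
    size_t kept = 0;
    size_t i, j;

    assert(rows != NULL || n == 0);
    for (i = 0; i < n; i++)
    {
        int seen = 0;

        for (j = 0; j < kept; j++)
        {
            if (rows[j].rowid == rows[i].rowid)
            {
                seen = 1;
                break;
            }
        }
        if (!seen)
            rows[kept++] = rows[i];
    }
    return kept;
}

/* Count distinct "val" values, i.e. what a DISTINCT above would emit. */
static size_t
demo_count_distinct_vals(const DemoRow *rows, size_t n)
{
    size_t count = 0;
    size_t i, j;

    assert(rows != NULL || n == 0);
    for (i = 0; i < n; i++)
    {
        int seen = 0;

        for (j = 0; j < i; j++)
        {
            if (rows[j].val == rows[i].val)
            {
                seen = 1;
                break;
            }
        }
        if (!seen)
            count++;
    }
    return count;
}
```

Calling `demo_count_distinct_vals` on the concatenated output of the UNION arms gives the same count whether or not `demo_dedup_by_rowid` ran first, which is the case where the ctid step could be skipped.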

> A big limitation is that this only works on plain relations,
> + * and not for instance on foreign tables. Another problem is that we can
> + * only de-duplicate by sort/unique, not hashing; but that could be fixed
> + * if we write a hash opclass for TID.
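
For readers unfamiliar with what hashing a TID would involve, the following is a rough sketch and not the PostgreSQL implementation. `DemoTid`, `demo_tid_hash` and `demo_tid_equal` are hypothetical names, the struct only mimics the (block number, offset) shape of a tuple identifier, and FNV-1a is picked here purely for brevity. A hash opclass needs exactly this pairing: a hash function plus an equality test that agrees with it.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a tuple identifier. */
typedef struct DemoTid
{
    uint32_t block;   /* heap block number */
    uint16_t offset;  /* line pointer offset within the block */
} DemoTid;

/*
 * FNV-1a over the six logical bytes of the identifier.  The fields are
 * serialized by hand so that struct padding never reaches the hash.
 */
static uint32_t
demo_tid_hash(const DemoTid *tid)
{
    uint32_t h = 2166136261u;
    uint8_t bytes[6];
    size_t i;

    assert(tid != NULL);
    bytes[0] = (uint8_t) (tid->block >> 24);
    bytes[1] = (uint8_t) (tid->block >> 16);
    bytes[2] = (uint8_t) (tid->block >> 8);
    bytes[3] = (uint8_t) tid->block;
    bytes[4] = (uint8_t) (tid->offset >> 8);
    bytes[5] = (uint8_t) tid->offset;

    for (i = 0; i < sizeof(bytes); i++)
    {
        h ^= bytes[i];
        h *= 16777619u;
    }
    return h;
}

/* Equality that is consistent with demo_tid_hash: equal ids hash equally. */
static int
demo_tid_equal(const DemoTid *a, const DemoTid *b)
{
    assert(a != NULL && b != NULL);
    return a->block == b->block && a->offset == b->offset;
}
```

With such a pair available, duplicates could be detected by probing a hash table keyed on the identifier instead of sorting the whole UNION output.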

I wonder if an alternative could be some sort of rowid that we invent.
It'd not be that hard to introduce an executor node (or do it in
projection) that simply counts rows and returns that count as a
column. Together with e.g. range table id that'd be unique. But for that
we would need to guarantee that foreign tables / subqueries /
... returned the same result in two scans. We could do so by pushing
the data gathering into a CTE, but that'd make this exercise moot.

Why can't we ask at least FDWs to return something ctid like?
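
The invented-rowid idea can be sketched as a toy model in plain C. Everything here is hypothetical (`DemoRowId`, `demo_scan`, `demo_union`, the sample quals), and the sketch bakes in two assumptions: the counter advances for every row scanned, before the arm's qual is applied, and both scans return rows in the same order. Without the second assumption the ids from different arms would not be comparable, which is the guarantee discussed above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical row identifier: range table index plus a per-scan row counter. */
typedef struct DemoRowId
{
    uint32_t rti;
    uint64_t rownum;
} DemoRowId;

typedef int (*demo_qual) (int value);

/*
 * Scan values[0..n) and emit an id for each row passing the qual.  The
 * counter is bumped for every scanned row, so a given row gets the same id
 * in every arm as long as the scan order is stable.
 * "out" must have room for n entries.
 */
static size_t
demo_scan(uint32_t rti, const int *values, size_t n,
          demo_qual qual, DemoRowId *out)
{
    uint64_t counter = 0;
    size_t nout = 0;
    size_t i;

    assert(values != NULL && out != NULL && qual != NULL);
    for (i = 0; i < n; i++)
    {
        counter++;
        if (qual(values[i]))
        {
            out[nout].rti = rti;
            out[nout].rownum = counter;
            nout++;
        }
    }
    return nout;
}

/*
 * Append to dst every id in src that is not already present.
 * dst must have room for ndst + nsrc entries.  Returns the new length.
 */
static size_t
demo_union(DemoRowId *dst, size_t ndst, const DemoRowId *src, size_t nsrc)
{
    size_t i, j;

    assert(dst != NULL && src != NULL);
    for (i = 0; i < nsrc; i++)
    {
        int found = 0;

        for (j = 0; j < ndst; j++)
        {
            if (dst[j].rti == src[i].rti && dst[j].rownum == src[i].rownum)
            {
                found = 1;
                break;
            }
        }
        if (!found)
            dst[ndst++] = src[i];
    }
    return ndst;
}

/* Two sample quals standing in for the arms of the OR clause. */
static int demo_is_even(int v) { return v % 2 == 0; }
static int demo_gt_three(int v) { return v > 3; }
```

Scanning the values 1..6 once with `demo_is_even` and once with `demo_gt_three`, then merging with `demo_union`, yields four ids: rows 4 and 6 satisfy both quals but appear only once.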

> + * To allow join removal to happen, we can't reference the CTID column
> + * of an otherwise-removable relation.

A brief hint as to why wouldn't hurt.

> +/*
> + * Is query as a whole safe to apply union OR transformation to?
> + * This checks relatively-expensive conditions that we don't want to
> + * worry about until we've found a candidate OR clause.
> + */
> +static bool
> +is_query_safe_for_union_or_transform(PlannerInfo *root)
> +{
> +    Query *parse = root->parse;
> +    Relids allbaserels;
> +    ListCell *lc;
> +    int relid;
> +
> +    /*
> +     * Must not have any volatile functions in FROM or WHERE (see notes at
> +     * head of file).
> +     */
> +    if (contain_volatile_functions((Node *) parse->jointree))
> +        return false;

Hm, are there any SRFs that could be in relevant places? I think we
reject them everywhere where they'd be problematic (as the targetlist is
processed above)?

Do you have any plans for this patch at this moment?

Greetings,

Andres Freund
