Re: [HACKERS] Re: Improve OR conditions on joined columns (common star schema problem)

From: Noah Misch <noah(at)leadboat(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Jim Nasby <jim(dot)nasby(at)openscg(dot)com>, pgsql-hackers(at)lists(dot)postgresql(dot)org
Subject: Re: [HACKERS] Re: Improve OR conditions on joined columns (common star schema problem)
Date: 2018-09-04 01:59:10
Views: Raw Message | Whole Thread | Download mbox
Lists: pgsql-hackers

On Sun, Feb 12, 2017 at 09:32:36AM -0800, Tom Lane wrote:
> It's not so much poor choices as the cost of the optimization attempt ---
> if there's a K-relation OR clause, this will increase the cost of planning
> by a factor approaching K+1, whether or not you get a better plan out of
> it. I ran the regression tests with some instrumentation and determined
> that this logic fires a dozen or two times, and fails to produce a plan
> that looks cheaper than the standard plan in any of those cases. So if we
> go down this road, not only do we need a GUC but I suspect it had better
> default to off; only people using star schemas are really likely to get a
> win out of it.

I benchmarked using the attached script. Highlights:

$ perl --dimensions=11 --tests=const | psql -X
# TRAP: FailedAssertion("!(uniq_exprs != ((List *) ((void *)0)))", File: "planunionor.c", Line: 335)

$ perl --dimensions=11 --tests=var --exhaustive --values-per-dimension=0 | psql -X
# TRAP: FailedAssertion("!(list_length(dest_tlist) == list_length(src_tlist))", File: "tlist.c", Line: 344)

$ perl --dimensions=7 --or-clauses=20 --exhaustive --tests=const,rowmark | psql -X
... (with planunionor.c plan)
Planning Time: 1902.013 ms
Execution Time: 291.629 ms
... (without planunionor.c plan)
Planning Time: 11.100 ms
Execution Time: 64.452 ms
# planning 170x slower, chosen plan 4.5x slower

I agree this warrants a GUC, but I propose a goal of making it fitting to
enable by default. The is_suitable_join_or_clause() test is a good indicator
of promising queries, and I suspect it's cheap enough to run all the time. As
a DBA, I'd struggle to predict when is_suitable_join_or_clause() will pass
while the optimization as a whole will lose; I would resort to testing each
important query both ways. In other words, the GUC would boil down to a
planner hint, not to a declaration about the table schema.

On Thu, Aug 23, 2018 at 02:10:46PM -0400, Tom Lane wrote:
> Rebased up to HEAD, per cfbot nagging. Still no substantive change from
> v2.

> + * is retty mechanical, but we can't do it until we have a RelOptInfo for the

Jim Nasby had pointed out this s/retty/pretty/ typo.

> + void
> + finish_union_or_paths(PlannerInfo *root, RelOptInfo *joinrel,
> + List *union_or_subpaths)
> + {
> + pathnode = makeNode(UniquePath);
> + /* Estimate number of output rows */
> + pathnode->path.rows = appendpath->rows;

If you're going to keep this highly-simplified estimate, please expand the
comment to say why it doesn't matter or what makes it hard to do better. The
non-planunionor.c path for the same query computes its own estimate of the
same underlying quantity. Though it may be too difficult in today's system,
one could copy the competing path's row count estimate here. Perhaps it
doesn't matter because higher-level processing already assumes equal row count
among competitors?

Attachment Content-Type Size application/x-perl 2.9 KB

In response to


Browse pgsql-hackers by date

  From Date Subject
Next Message Amit Kapila 2018-09-04 03:42:26 Re: pg_verify_checksums failure with hash indexes
Previous Message Amit Langote 2018-09-04 01:23:19 Re: pointless check in RelationBuildPartitionDesc