Re: Parameter for planner estimate of recursive queries

From: Kenaniah Cerny <kenaniah(at)gmail(dot)com>
To: Simon Riggs <simon(dot)riggs(at)enterprisedb(dot)com>
Cc: PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Parameter for planner estimate of recursive queries
Date: 2022-01-22 22:33:59
Message-ID: CA+r_aq86dwdPSm740BN_8i2e2oTLYQb1uUTpnTr1JrQX7cApfA@mail.gmail.com
Lists: pgsql-hackers

> The factor 10 should not be hardcoded in the planner, but should be
> settable, just as cursor_tuple_fraction is.

I feel considerably out of my depth here, but I like the idea of a working
table size multiplier GUC, given the challenges of predicting the number of
iterations (and any adjustments to cardinality per iteration). An
exponential cost function may lead to increasingly pathological outcomes,
especially when estimates for cte_rows are off.

In the EXPLAINs, it looked like the estimates for knows_pkey were off by an
order of magnitude or so. It's possible the planner would have chosen the
Nested Loop plan if knows_pkey had been estimated at rows=87 (as the
WindowAgg would have been estimated at roughly the same size as in the
second plan anyway, even with rel->tuples = 10 * cte_rows).

I also wonder if there's a better default than cte_rows * 10, but
implementing a new GUC sounds like a reasonable solution to this as well.
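A rough sketch of how such a multiplier GUC could sit in front of the hardcoded factor (a minimal model only; the variable name recursive_worktable_factor and its wiring are assumptions for illustration, not an existing setting, and the real planner logic in costsize.c is more involved):

```c
#include <assert.h>

/* Hypothetical GUC, analogous in spirit to cursor_tuple_fraction.
 * The name and default are assumptions; the planner currently
 * hardcodes this factor as 10. */
static double recursive_worktable_factor = 10.0;

/* Estimated average worktable size for a recursive CTE: the
 * non-recursive term's row count scaled by the multiplier,
 * clamped below at one row. */
static double
estimate_worktable_rows(double nonrecursive_rows)
{
    double est = recursive_worktable_factor * nonrecursive_rows;
    return est >= 1.0 ? est : 1.0;
}
```

A shortest-path style query whose non-recursive term yields a single row could then set the factor near 1.0 rather than inheriting the 10x assumption, while bushy graph-analytics workloads could leave the default in place.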

--

Kenaniah

On Sat, Jan 22, 2022 at 1:58 PM Simon Riggs <simon(dot)riggs(at)enterprisedb(dot)com>
wrote:

> On Wed, 27 Oct 2021 at 15:58, Simon Riggs <simon(dot)riggs(at)enterprisedb(dot)com>
> wrote:
>
> > The poor performance is traced to the planner cost estimates for
> > recursive queries. Specifically, the cost of the recursive arm of the
> > query is evaluated based upon both of these hardcoded assumptions:
> >
> > 1. The recursion will last for 10 loops
> > 2. The average size of the worktable will be 10x the size of the
> > initial query (non-recursive term).
> >
> > Taken together these assumptions lead to a very poor estimate of the
> > worktable activity (in this case), which leads to the plan changing as
> > a result.
> >
> > The factor 10 is a reasonably safe assumption and helps avoid worst
> > case behavior in bigger graph queries. However, the factor 10 is way
> > too large for many types of graph query, such as where the path
> > through the data is tight, and/or the query is written to prune bushy
> > graphs, e.g. shortest path queries. The factor 10 should not be
> > hardcoded in the planner, but should be settable, just as
> > cursor_tuple_fraction is.
>
> If you think this should be derived without parameters, then we would
> want a function that starts at 1 for 1 input row and gets much larger
> for larger input. The thinking here is that Graph OLTP is often a
> shortest path between two nodes, whereas Graph Analytics typically
> traverses much more of the graph, and so the worktable will get much
> bigger.
>
> So I'm thinking we use
>
> rel->tuples = Max(1, cte_rows * cte_rows / 2);
>
> --
> Simon Riggs http://www.EnterpriseDB.com/
>
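A minimal sketch of the proposed parameter-free estimate, clamped below at one row so that a single input row yields a single-row estimate (matching the stated goal of starting at 1 and growing for larger input):

```c
#include <assert.h>

/* Sketch of the proposed parameter-free worktable estimate:
 * quadratic growth in the non-recursive term's row count,
 * clamped below at one row. */
static double
proposed_worktable_rows(double cte_rows)
{
    double est = cte_rows * cte_rows / 2.0;
    return est >= 1.0 ? est : 1.0;
}
```

For cte_rows = 1 this yields 1 (the shortest-path OLTP case); for cte_rows = 100 it yields 5000, versus 1000 under the hardcoded 10x. The two estimates coincide at cte_rows = 20, below which the quadratic form is the smaller of the two.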
