From: | Etsuro Fujita <etsuro(dot)fujita(at)gmail(dot)com> |
---|---|
To: | Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> |
Cc: | Jeff Janes <jeff(dot)janes(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: estimation problems for DISTINCT ON with FDW |
Date: | 2020-07-01 11:06:04 |
Message-ID: | CAPmGK15_kkoqKaT_va5WX1q41F=6xGOZGQdAivYg1QF7nJ7uhw@mail.gmail.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
On Wed, Jul 1, 2020 at 7:21 AM Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> Attached are a couple of quick-hack patches along each of those lines.
> Either one resolves the crazy number-of-groups estimate for Jeff's
> example; neither changes any existing regression test results.
>
> On the whole I'm not sure I like 0001 (ie, changing estimate_num_groups).
> Sure, it makes that function "more robust", but it does so at the cost
> of believing what might be a default or otherwise pretty insane
> reldistinct estimate. We put in the clamping behavior for a reason,
> and I'm not sure we should disable it just because reltuples = 0.
>
> 0002 seems like a better answer on the whole, but it has a pretty
> significant issue as well: it's changing the API for FDW
> GetForeignRelSize functions, because now we're expecting them to set
> both rows and tuples to something sane, contrary to the existing docs.
postgres_fdw already sets both rows and tuples if use_remote_estimate
is false, and we have pages=0 and tuples=0, so the contrary seems OK
to me.
In the 0002 patch:
+ /*
+ * plancat.c copied baserel->pages and baserel->tuples from pg_class.
+ * If the foreign table has never been ANALYZEd, or if its stats are
+ * out of date, baserel->tuples might now be less than baserel->rows,
+ * which will confuse assorted logic. Hack it to appear minimally
+ * sensible. (Do we need to hack baserel->pages too?)
+ */
+ baserel->tuples = Max(baserel->tuples, baserel->rows);
for consistency, this should be
baserel->tuples = clamp_row_est(baserel->rows / sel);
where sel is the selectivity of the baserestrictinfo clauses?
> What I'm sort of inclined to do is neither of these exactly, but
> instead put the
>
> baserel->tuples = Max(baserel->tuples, baserel->rows);
>
> clamping behavior into the core code, immediately after the call to
> GetForeignRelSize. This'd still let the FDW set baserel->tuples if
> it has a mind to, while not requiring that; and it prevents the
> situation where the rows and tuples estimates are inconsistent.
I'm not sure this would address the inconsistency. Consider the
postgres_fdw case where use_remote_estimate is true, and the stats are
out of date, eg, baserel->tuples copied from pg_class is much larger
than the actual tuples and hence baserel->rows (I assume here that
postgres_fdw doesn't do anything about baserel->tuples). In such a
case the inconsistency would make the estimate_num_groups() estimate
more inaccurate. I think the consistency is the responsibility of the
FDW rather than the core, so I would vote for the 0002 patch. Maybe
I'm missing something.
Thanks for working on this!
Best regards,
Etsuro Fujita
From | Date | Subject | |
---|---|---|---|
Next Message | Daniel Gustafsson | 2020-07-01 11:19:40 | Re: Additional improvements to extended statistics |
Previous Message | Bharath Rupireddy | 2020-07-01 10:59:34 | Re: Cleanup - adjust the code crossing 80-column window limit |